Lately the internet is becoming more and more polluted with content farms. Some people consider sites like eHow to be content farms, but at least their content is still created by humans. I’m not saying that eHow is a good site by any means, but personally I’m mostly concerned with auto-generated content farms.
These “farms” usually consist of several hundred pages of meaningless text and images that some bot created. The bots scrape content from other sites on the web and mash it all together to form a new page. The pages usually contain every possible combination or phrase associated with a certain keyword. Basically their only purpose is to rank high enough in the search engines to get a lot of traffic, which translates into ad revenue for the owner of the content farm.
These content farmers have a complete disrespect for anyone else on the internet. Usually every image on their pages is hotlinked from another site. While it is possible to stop image hotlinking, most people are unaware of the problem.
What’s even worse is that these content farmers will often hack another web server and set up shop without the victim even knowing it. Then they let their bots loose and create tons of pages in subdirectories off of the main domain. Unless you regularly review the logs on your web server, you probably won’t even know that it is hosting one of these farms.
Recently Google made a big change to their search algorithm in an effort to stop these sites from showing up in search engine results. I think the Google update has helped, but it hasn’t cured the problem yet, and the algorithm change doesn’t fix the root problem, which is unsecured web servers.
How do you stop them?
Technically there is nothing wrong with creating a web site full of crap content as long as it doesn’t violate the terms of service of the hosting provider. The trick is to use the tactics of the content farmers to your own advantage in order to get them shut down by their web host.
Method #1 – Report them for hotlinking
Since they have a bad habit of hotlinking, you can report them if they hotlink to an image on your site. If you monitor the logs of your web server it isn’t hard to find them. When someone hotlinks to your site, their domain will be listed as the referrer.
Below is an excerpt from my logs. I’ve highlighted some of the content farms I found hotlinking to my content. The URLs usually contain several keywords and won’t make any sense at all. I use AWStats almost every day; it’s a great web-based log analyzer.
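If you would rather dig through the raw log than use a log analyzer, the referrer check can be sketched in a few lines of Python. This is only a rough sketch. It assumes your server writes the common Apache/Nginx "combined" log format, and the `MY_HOSTS` set and the function name are placeholders I made up for illustration.

```python
import re
import sys
from collections import Counter
from urllib.parse import urlparse

# Assumption: the log uses the "combined" format (request, status, size, referrer, agent).
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif")
MY_HOSTS = {"example.com", "www.example.com"}  # hypothetical: replace with your own hostnames


def hotlink_referrers(lines, my_hosts=MY_HOSTS):
    """Count image requests per referring host that is not one of your own."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that are not in the assumed format
        path = match.group("path").split("?", 1)[0].lower()
        if not path.endswith(IMAGE_EXTENSIONS):
            continue  # only image requests are interesting here
        host = urlparse(match.group("referrer")).hostname
        if host and host not in my_hosts:
            counts[host] += 1
    return counts


if __name__ == "__main__":
    # Usage: python hotlinks.py /path/to/access.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for host, hits in hotlink_referrers(log).most_common(20):
            print(f"{hits:6d}  {host}")
```

Requests with an empty or "-" referrer are ignored, so direct visits and privacy-conscious browsers won’t show up as false positives. Any host near the top of the list that you don’t recognize is worth a closer look.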
Method #2 – Report a hacked server
Almost all of the content farms I have discovered lately are on a legitimate web site that has been compromised. The easiest way to determine if you’re dealing with a hacked server is to visit the home page of the domain.
For an example, let’s look at one of the domains in my log. The referring URL is http://topsys.co.in/processed-stephen-a-kearns-montreal. A quick visit to topsys.co.in shows that the domain belongs to a software development company and it looks legitimate. Usually if something doesn’t look right it probably isn’t.
Another thing you can do is run an inurl search on Google. I typed in inurl:http://topsys.co.in and quickly found the other subdirectories created by the bot.
Sending in the report
If someone’s web server has been hacked you should do them (and the internet) a favor and submit a report to the web host. The hosting provider can take the necessary action of contacting the owner to resolve the problem or shutting down the site.
Almost all hosting companies have an email address set up for accepting abuse complaints. To find out where to send the reports, you first need to find the IP address of the domain hosting the content. This can be done by simply pinging the domain name. Once you have the IP address, visit arin.net to find out which company the IP address is registered to. On their website you can do a WHOIS search on the IP to find the owner.
All WHOIS reports should list a point of contact. Look for the contact listed for abuse complaints.
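The lookup can also be scripted. The sketch below is one way to do it, not the only one. It resolves the domain with a DNS lookup instead of ping, sends a plain WHOIS query to whois.arin.net on port 43, and pulls out any email address on a line that mentions "abuse". The function names are my own, and I’m assuming the abuse contact appears on a line such as `OrgAbuseEmail:`, which is how ARIN usually labels it.

```python
import re
import socket

EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def whois_lookup(query, server="whois.arin.net", port=43, timeout=10):
    """Send one WHOIS query and return the raw text response."""
    with socket.create_connection((server, port), timeout=timeout) as conn:
        conn.sendall((query + "\r\n").encode("ascii"))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # the server closes the connection when it is done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def abuse_contacts(whois_text):
    """Return email addresses found on lines that mention 'abuse'."""
    contacts = []
    for line in whois_text.splitlines():
        if "abuse" not in line.lower():
            continue
        for email in EMAIL.findall(line):
            if email not in contacts:
                contacts.append(email)
    return contacts


if __name__ == "__main__":
    domain = "topsys.co.in"  # the domain from the log example
    ip = socket.gethostbyname(domain)  # DNS lookup in place of ping
    report = whois_lookup(ip)
    print(ip)
    print(abuse_contacts(report) or "No abuse contact found; read the full report.")
```

Keep in mind that ARIN only covers its own region. For an IP address registered elsewhere, the response will point you to another registry (RIPE, APNIC, and so on), and you would repeat the query against that registry’s WHOIS server.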
What do you say in the report?
I have an email template saved that I use when I need to send in a new report.
The key details you need to include are the IP address, the domain in question, and any logs or evidence of the activity taking place. Since you already know the domain and IP, that part is easy. Different providers will accept different items as evidence. You could simply send them the URL you discovered in your logs and they may find that sufficient. If you can find further subdirectories using the Google search trick I showed you, that will help as well. If they are hotlinking to your site, you might consider sending an excerpt from your log files that shows the actual request going through.
It’s not likely that content farms will be going away anytime soon unless more action is taken against them. Content farms waste a lot of bandwidth on the internet and end up decreasing traffic to legitimate sites. If more people start getting content farms shut down, it will become more work than it’s worth for the owners.
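If you send these reports often, it can help to generate the message from the details listed above. The following is a hypothetical sketch and not the template mentioned earlier. The wording, the function name, and the example addresses are all made up, so adjust them to suit your own situation.

```python
from email.message import EmailMessage


def build_abuse_report(abuse_address, sender, domain, ip, evidence):
    """Build a plain-text abuse report; `evidence` is a list of URLs or log lines."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = abuse_address
    msg["Subject"] = f"Abuse report: {domain} ({ip})"
    body = [
        "Hello,",
        "",
        f"The site {domain} at IP address {ip} appears to be hosting",
        "auto-generated pages, possibly without the owner's knowledge.",
        "",
        "Evidence:",
    ]
    body.extend(f"  {item}" for item in evidence)
    body.extend(["", "Please investigate and take whatever action is appropriate.", "", "Thank you."])
    msg.set_content("\n".join(body))
    return msg


if __name__ == "__main__":
    report = build_abuse_report(
        abuse_address="abuse@example.net",  # hypothetical: taken from the WHOIS report
        sender="you@example.com",           # hypothetical: your own address
        domain="topsys.co.in",
        ip="203.0.113.10",                  # placeholder IP from the documentation range
        evidence=["http://topsys.co.in/processed-stephen-a-kearns-montreal"],
    )
    print(report)
```

Printing the message lets you review it before sending it by whatever means you prefer. Pasting the text into your normal mail client works just as well as scripting the delivery.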
One thought to “Shutting Down Auto-generated Content Farms”
I realize I’m about a year late to this post, but it’s a very recent issue for me and I wanted to thank you for the information – and the much needed help.
One image, on one page of my blog, was hotlinked by a group generating massive numbers of link farms all over the web. (Oddly enough, or maybe not, all of the affected sites were using Plone.) Thanks to the information you’ve shared here, I was able to contact the IP hosts for the 4 websites I confirmed were hotlinking my image.
I also changed the image (after renaming it and attaching the new file to my blog post) to big, bold red lettering that said “Sites displaying this image are stealing bandwidth from” followed by my domain name.
Two sites removed the folder where the link farms lived and one site is completely gone. The 4th, located in South Africa, is still filled with this debris. Not quite certain what to do about them.
Anyway, just wanted to thank you for all of the help you didn’t even know you were giving me. lol
Have a great weekend.