How to keep the bots off your website

We have been able to cut roughly 80% of bot traffic, and all abusive bot traffic, on our network. This greatly improves security and website performance.

Bots may be gathering information, harvesting email addresses, copying pages, or scanning for possible security holes. By removing them from our network, we keep our websites from being exposed to the potential harm these bots can cause.

Bots can consume bandwidth at alarming rates and crash servers by requesting more pages than a server can deliver at one time. Eliminating this unwanted traffic has become a major concern for network administrators.

We have managed to create a solution which automatically drops the bots from our network and shields our hosting customers from the abuse they can inflict.

How does this help you as a hosting customer?

Mostly, it prevents your pages from being copied and helps your rankings in the major search engines.

Since most of these bots steal your pages and republish the text on their own sites, the search engines will list only one copy, and with their power and experience the scrapers often get ranked for your content instead of you. By keeping them off your pages, we prevent this automated theft. If you have ever seen parts of your pages, articles, or products posted on other websites, you understand this firsthand; if not, consider yourself lucky.

It also helps hide contact information from spammers. Most spammers use automated bots to find email addresses on websites to add to their spam lists. By blocking those bots before they find the addresses, we can reduce the spam you receive at email addresses posted on your contact pages.

Most of all, you can be sure bots won't compromise your website's performance. By blocking out-of-control robots, we can prevent a website from becoming inaccessible. A single bot can easily request over 1,000 pages per minute, leaving real visitors locked out or waiting for an open connection. If several websites on the same server are being crawled at once, the entire server can become unreachable, a situation commonly known as a denial of service. By dropping the bots at the firewall level, they never reach the server to cause that kind of problem.
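As a rough illustration of the rate problem described above, a sliding-window counter can flag any client requesting pages far faster than a human plausibly could. This is a minimal sketch, not the actual firewall logic; the 60-second window and 300-request threshold are assumed values for illustration.

```python
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 60   # assumed: length of the sliding window
MAX_REQUESTS = 300    # assumed: requests allowed per window

# Per-IP timestamps of recent requests
_requests = defaultdict(deque)

def is_abusive(ip, now=None):
    """Return True if this IP has exceeded the per-window request limit."""
    now = time.time() if now is None else now
    window = _requests[ip]
    window.append(now)
    # Drop timestamps that have aged out of the window
    while window and now - window[0] > WINDOW_SECONDS:
        window.popleft()
    return len(window) > MAX_REQUESTS
```

A real firewall would act on this signal (for example, by adding the IP to a drop list) rather than just returning a flag, but the windowed counting is the core idea.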

What Usually Happens

Most hosting companies put the burden on the hosted websites. In many cases, if bots crawl too many pages too fast, the websites are shut down for bandwidth overages or removed for abuse of network or server resources. If you exceed a certain page delivery rate, your hosting can be terminated. Obviously, it's not your fault, but most hosting companies don't care; they want to preserve the network, and taking one website offline is easier than dealing with bots across the full network.

We know this firsthand because we have purchased hosting on many competitor networks to compare their services with our own. Some of those accounts were terminated for that very reason, with no questions asked and no chance to recover our websites. We were treated like criminals because a bot accessed our website at an excessive rate.

Our policy is different. We have the technology, resources, and expertise, so we created software that monitors connections and drops abusive bots from the network. This is a unique and powerful tool for fighting spammers and scammers, and a big relief for our website owners.

How we do it

The process is actually very simple. When a request is made for your website, the client identifies itself, and we in turn verify that identity. If the authentication matches one of the trusted robots, such as Google, Yahoo, or MSN, we let it through. Just because a client says it is Googlebot does not mean it actually is. The authentication process lets us admit the trusted bots and remove the fakers and page scrapers.
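The verification step for a claimed Googlebot can be sketched as a reverse-then-forward DNS check, which is the approach Google itself documents: resolve the requesting IP to a hostname, confirm the hostname falls under googlebot.com or google.com, then resolve that hostname back and confirm it matches the original IP. This is a simplified sketch; the trusted-domain list and error handling are assumptions.

```python
import socket

TRUSTED_SUFFIXES = (".googlebot.com", ".google.com")  # assumed trusted domains

def is_trusted_hostname(hostname):
    """True if the reverse-DNS name falls under a trusted Google domain."""
    return hostname.endswith(TRUSTED_SUFFIXES)

def verify_googlebot(ip):
    """Return True only if ip reverse-resolves to a trusted hostname AND
    that hostname forward-resolves back to the same ip (round trip)."""
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)        # reverse DNS lookup
    except OSError:
        return False
    if not is_trusted_hostname(hostname):
        return False
    try:
        forward_ips = socket.gethostbyname_ex(hostname)[2]  # forward lookup
    except OSError:
        return False
    return ip in forward_ips
```

The round trip matters: a faker can claim any User-Agent string, and can even point a reverse-DNS record at a Google-looking name, but it cannot make Google's forward DNS resolve that name back to its own IP.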

Most bots do not announce themselves as bots; they try to pass as human traffic. We can identify those bots by using bot traps and watching their activity. For example, most bots do not download the images on a page; they are only looking for the HTML or text. If we see a browser that is not requesting a page's images, we can check the network that connects it to the Internet. Most real users connect through dynamic IPs assigned by consumer ISPs, so when an IP belongs to a company that provides websites rather than website access, we know it is not a real person.
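The no-images heuristic can be sketched as a simple pass over access-log entries: count, per client, how many page requests versus image requests were made, and flag clients that fetched several pages but no images at all. The log format, extension list, and minimum-page threshold here are assumptions for illustration.

```python
from collections import defaultdict

IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".gif", ".ico")  # assumed list
MIN_PAGES = 5  # assumed: ignore clients with too few requests to judge

def flag_imageless_clients(log_entries):
    """log_entries: iterable of (ip, request_path) tuples.
    Return the set of IPs that requested several pages but no images."""
    pages = defaultdict(int)
    images = defaultdict(int)
    for ip, path in log_entries:
        if path.lower().endswith(IMAGE_EXTENSIONS):
            images[ip] += 1
        else:
            pages[ip] += 1
    return {ip for ip, n in pages.items()
            if n >= MIN_PAGES and images[ip] == 0}
```

In practice this heuristic produces candidates, not verdicts; text-only browsers and users with images disabled exist, which is why the text above pairs it with an ownership check on the IP.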

We also know that bots can access pages much faster than a human being, so speeding through pages is a sure sign of a bot. Even more telling, we see bots accessing more than one website at a time on the network, which real people almost never do.
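The cross-site signal can be checked the same way: track which distinct hosted sites each IP touches within a short interval, and flag IPs that span several of them at once. The three-site threshold and the 60-second window are assumed values for illustration.

```python
from collections import defaultdict

SITE_THRESHOLD = 3  # assumed: humans rarely browse 3+ hosted sites at once
WINDOW = 60.0       # assumed: correlation window in seconds

def crawler_candidates(events):
    """events: iterable of (timestamp, ip, site) tuples, assumed sorted
    by timestamp. Return IPs seen on SITE_THRESHOLD or more distinct
    sites within any WINDOW-second span."""
    history = defaultdict(list)   # ip -> recent (timestamp, site) pairs
    flagged = set()
    for ts, ip, site in events:
        history[ip].append((ts, site))
        # Keep only this IP's events that fall inside the window
        history[ip] = [(t, s) for t, s in history[ip] if ts - t <= WINDOW]
        if len({s for _, s in history[ip]}) >= SITE_THRESHOLD:
            flagged.add(ip)
    return flagged
```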

We use other tricks as well, but we won't reveal them all, so as not to show the crawlers exactly how we detect them. Rest assured, we are detecting them and removing them one by one before they can get what they are looking for.

Some of this technology was developed for our affiliate marketing, because bots would crawl affiliate links and we would be penalized and kicked out of programs when the links generated too much traffic. As usual, the affiliate networks didn't care that it was a bot; they just passed the blame and closed the account. To stay in business we had to filter the bots, and now, to stay in the hosting business, we use the same technology to filter bots at the network level.

On the Internet, if anyone can profit from abusive behavior, you will eventually need to defend against it. Companies that close accounts, or expect their customers to do their job for them, will quickly fall behind companies like pageBuzz.com.


©1997 - 2014 Bumblebee Works & The Cyber Web Inc
pageBuzz.com is a subdivision of BumbleBee Works
Web Hosting
pageBuzz® and pageBuzz.com® are registered trademarks of The Cyber Web Inc