How to keep the bots off your website
We have been able to cut about 80% of the bot traffic, and all of
the abusive bot traffic, from our network. This greatly improves
security and website performance.
Bots may be looking for information, harvesting email addresses,
copying pages or scanning for possible security holes. By removing
the bots from our network we keep our websites from being exposed
to the potential harm they can cause.
Bots can consume bandwidth at alarming
rates and crash servers by requesting more pages than the servers
can deliver at one time. Eliminating this unwanted traffic has
become a major concern for network administrators.
We have managed to create a solution which
automatically drops the bots from our network and shields our
hosting customers from the abuse they can inflict.
How does this help you as a hosting customer?
Mostly, it will prevent the copying of your
pages and increase your rankings in the major search engines.
Most of these bots steal your pages and republish the text on
their own sites. Search engines will usually list only one copy,
and because the copiers have more power and experience, they are
often the ones listed for your content rather than you. By
preventing them from getting to your pages, we prevent the
automated theft of your pages. If you have ever seen parts of
your pages, articles or products posted on other websites, you
understand this first hand. If not, consider yourself lucky.
It will help hide contact information from
spammers. Most spammers use automated bots to find email
addresses on websites to add to their spam lists. If we block
the bots before they find those addresses, we can reduce some of
the spam sent to the addresses you have posted on your contact
page.
Most of all, you can be sure that bots won't
compromise the performance of your website. By blocking
out-of-control robots we can prevent a website from becoming
inaccessible. A single bot can easily request over 1000 pages
per minute and leave visitors locked out or waiting for an open
connection. If several websites on the same server have bots
crawling them, the entire server could become inaccessible, which
is in effect a denial-of-service attack. By trimming the bots
at the firewall level, we make sure they never reach the server
to cause that type of problem.
What Usually Happens
Most hosting companies will put the burden
on the hosted websites. In many cases, if bots crawl too many
pages too fast, the websites are closed for bandwidth overages or
removed for abuse of network or server resources. If you exceed a
certain page delivery rate your hosting can be terminated.
Obviously, it's not your fault, but hosting companies don't care.
They want to preserve the network, and taking one website out is
easier than dealing with the bots on a full network scale.
We know this first hand because we
have purchased hosting on many competitor networks to compare
their services with our own. Some of those accounts were
terminated for that very reason, with no questions asked and no
chance to recover our websites. We were treated like criminals
because a bot accessed our website at an excessive rate.
Our policy is different. We are the ones
with the technology, resources and expertise, so we have
created software to monitor connections and drop abusive bots
from the network. This is a unique and powerful tool in fighting
spammers and scammers, and a big relief for our website owners.
How we do it
The process is actually very simple. When a
request is made for your website, the client identifies itself
and we in turn ask it for ID. If the authentication matches one of
the trusted robots, such as those from Google, Yahoo or MSN, we
let it move on. Just because a client says it is Googlebot does
not mean it actually is. The authentication process lets us admit
the trusted bots and remove the fakers and page crawlers.
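
As an illustration only, one widely used way to check a crawler's
ID is forward-confirmed reverse DNS: look up the hostname for the
connecting IP, make sure it belongs to a trusted crawler domain,
then resolve that hostname and confirm it points back to the same
IP. The sketch below shows that general technique in Python. It is
not a description of our actual software. The TRUSTED_SUFFIXES
list, the function names and the swappable resolver parameters are
all assumptions made for the example.

```python
import socket

# Illustrative allow-list of hostname suffixes (an assumption for this
# sketch; a real deployment would maintain its own list).
TRUSTED_SUFFIXES = (
    ".googlebot.com",
    ".google.com",
    ".search.msn.com",
    ".crawl.yahoo.net",
)


def default_reverse(ip):
    """Return the reverse-DNS (PTR) hostname for an IP, or None."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


def default_forward(hostname):
    """Return the set of IPv4 addresses a hostname resolves to."""
    try:
        return set(socket.gethostbyname_ex(hostname)[2])
    except OSError:
        return set()


def is_trusted_crawler(ip, reverse=default_reverse, forward=default_forward):
    """Forward-confirmed reverse DNS check for a connecting IP.

    The resolver functions are parameters so the check can be exercised
    without touching the network.
    """
    hostname = reverse(ip)
    if not hostname:
        return False  # no PTR record at all
    hostname = hostname.rstrip(".").lower()
    if not hostname.endswith(TRUSTED_SUFFIXES):
        return False  # hostname is not in a trusted crawler domain
    # The hostname must resolve back to the IP that connected.
    return ip in forward(hostname)
```

A client that merely claims to be Googlebot in its user agent
fails this check, because its IP either has no trusted hostname or
the hostname does not resolve back to that IP.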
Most bots do not say they are bots but try
to pass as human traffic. We can identify those bots by using bot
traps and watching their activity. For example, most bots do not
download the images on a page; they are only looking for the HTML
or text portions. If we see a browser that is not requesting the
images on a page, we can run a check against the ISP
connecting it to the Internet. Most users are connected through
dynamic IPs, so when we can identify an IP as belonging to a
company that provides website hosting rather than Internet access,
we know it is not a real person.
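
The image check can be pictured with a small sketch. Assuming the
request log is available as (ip, path) pairs, the hypothetical
function below flags clients that fetched several pages but never
requested an image. The extension list, the MIN_PAGES threshold
and the function name are assumptions for the example, and a real
system would treat a hit here as a reason to look closer, not as
proof.

```python
import os
from collections import defaultdict

# Assumed for this sketch: which extensions count as images, and how
# many page requests we want to see before judging a client.
IMAGE_EXTENSIONS = {".gif", ".jpg", ".jpeg", ".png"}
MIN_PAGES = 5


def suspects_without_images(requests, min_pages=MIN_PAGES):
    """Return sorted IPs that requested pages but never any image.

    `requests` is an iterable of (ip, path) pairs. Anything that is
    not an image is counted as a page in this simplified sketch.
    """
    pages = defaultdict(int)
    images = defaultdict(int)
    for ip, path in requests:
        # Drop any query string before looking at the extension.
        ext = os.path.splitext(path.split("?", 1)[0])[1].lower()
        if ext in IMAGE_EXTENSIONS:
            images[ip] += 1
        else:
            pages[ip] += 1
    return sorted(
        ip for ip, count in pages.items()
        if count >= min_pages and images.get(ip, 0) == 0
    )
```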
We also know that bots can access pages
much faster than a human being, so speeding through pages is a
sure sign of a bot. Even more telling, we see bots accessing more
than one website on the network at the same time, which is almost
never done by real people.
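
Both signals, request speed and the number of different websites
touched at once, can be tracked with a sliding time window. The
class below is a minimal sketch of that idea, not our production
code. The ActivityMonitor name, its thresholds and the 60-second
window are assumptions chosen for illustration.

```python
from collections import defaultdict, deque


class ActivityMonitor:
    """Track recent requests per client IP inside a sliding window."""

    def __init__(self, window_seconds=60, max_requests=120, max_sites=3):
        # All three limits are illustrative values, not tuned settings.
        self.window_seconds = window_seconds
        self.max_requests = max_requests
        self.max_sites = max_sites
        self._events = defaultdict(deque)  # ip -> deque of (timestamp, site)

    def record(self, ip, site, timestamp):
        """Record one request; return True if the client looks abusive."""
        events = self._events[ip]
        events.append((timestamp, site))
        # Forget requests that have fallen out of the time window.
        cutoff = timestamp - self.window_seconds
        while events and events[0][0] <= cutoff:
            events.popleft()
        distinct_sites = {s for _, s in events}
        too_fast = len(events) > self.max_requests
        too_wide = len(distinct_sites) > self.max_sites
        return too_fast or too_wide
```

A caller would pass each incoming request to record() with the
current time and drop the connection when it returns True.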
We use other tricks, but we won't reveal them
all so as not to show the crawlers exactly how we are detecting
them. But rest assured, we are detecting them and removing them
one by one before they can get what they are looking for.
Some of the technology was developed for
our affiliate marketing, because bots would crawl affiliate links
and we would be penalized and kicked out of programs when our
links generated too much traffic. As in most cases, the affiliate
networks don't care that it was a bot; they just pass the blame
and close your account. To stay in business we had to
filter the bots, and now to stay in the hosting business we have
to use the same technology to filter the bots at the network level.
On the Internet, if anyone can profit from
abusive behavior, you will need to defend against it at some point.
Companies that close accounts or expect their customers to do
their job for them will quickly fall behind companies like
pageBuzz.com.