Block Bad Search Bots and Spiders Using mod_rewrite

There are many search bots, spiders, and content scrapers on the internet. Some, like Google and Bing, are completely legitimate; however, many others are overly aggressive or are simply out to scrape content or email addresses from your web site. While analyzing my own web site's access logs, I discovered several extremely aggressive bots hitting the site frequently all day. They seemed to have no identifiable origin or search engine purpose, and they were causing me some concern.

Bad bots, or any bots you don’t want hitting your web site, can be blocked with some simple .htaccess mod_rewrite rules.

The example rules below check the HTTP_USER_AGENT header and decide whether to deny the request with a 403 Forbidden response. Take note of the flags used on each rewrite condition and on the rewrite rule itself, because they control how the conditions are combined and what happens when one matches.

  • OR : Implies an “or” condition. The condition is combined with the next one using a logical OR instead of the default AND, and once a condition in the chain matches, the remaining ones are not evaluated.
  • NC : Implies a “No Case” match condition. The case of the HTTP_USER_AGENT header is ignored when evaluating this condition.
  • F : Implies a “Forbidden” response. The bot or user requesting the URL receives a 403 Forbidden error when the conditions are matched.
  • L : Implies a “Last” rule. No rules underneath this rule are processed, so it can be considered a break command. The F flag already implies L, so listing both is harmless but redundant.
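
As a rough illustration of how these flags combine, here is a minimal sketch. The user agent names ExampleBot and SampleScraper are made-up placeholders, not real bots, so substitute whatever you actually see in your logs.

```apache
# Hypothetical bot names, used only to illustrate the flags.
RewriteEngine On
# [NC] ignores case, so "examplebot" and "EXAMPLEBOT" both match.
# [OR] makes the next condition an alternative instead of an extra requirement.
RewriteCond %{HTTP_USER_AGENT} ^ExampleBot [NC,OR]
# Last condition in the chain, so it carries no [OR].
RewriteCond %{HTTP_USER_AGENT} SampleScraper [NC]
# "-" leaves the URL untouched; [F] answers with 403 Forbidden.
RewriteRule ^ - [F]
```

Without the OR flag the two conditions would be ANDed together, meaning a user agent would have to match both patterns before the rule fired.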

To use these rules, simply add the following IfModule block to your .htaccess or httpd.conf file.

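<IfModule mod_rewrite.c>
    RewriteEngine On
    # Match user agents that start with "AhrefsBot" or "BlackWidow" (case-sensitive)
    RewriteCond %{HTTP_USER_AGENT} ^AhrefsBot [OR]
    RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [OR]
    # The final condition must not carry [OR]; a trailing [OR] makes the rule match every request
    RewriteCond %{HTTP_USER_AGENT} Indy\ Library [NC]
    # Return 403 Forbidden for any request that matched a condition above
    RewriteRule ^.* - [F,L]
</IfModule>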
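
The same user agents can also be matched with a single condition that uses regex alternation, which avoids the OR chain altogether. This is only a sketch of one alternative, not the only way to write it. Note that NC here applies to all three patterns, so it is slightly broader than the case-sensitive AhrefsBot and BlackWidow matches.

```apache
<IfModule mod_rewrite.c>
    RewriteEngine On
    # One condition with alternation replaces the [OR] chain.
    # [NC] applies to every alternative in the group.
    RewriteCond %{HTTP_USER_AGENT} (^AhrefsBot|^BlackWidow|Indy\ Library) [NC]
    # Return 403 Forbidden for any matching request
    RewriteRule ^.* - [F,L]
</IfModule>
```

To check either version, you can send a request with a spoofed user agent, for example curl -I -A "AhrefsBot" followed by your own site's URL, which should come back with a 403 Forbidden status if the rules are active.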

Author: daharveyjr

I’m a solution architect responsible for the design, development, implementation, testing, and maintenance of e-commerce operations and applications using the Hybris and WebSphere Commerce product suites and other web technologies such as Java, J2EE/JEE, Spring, PHP, WordPress and more.
