I know that we have talked about robots in past blogs, but this time I would like to raise the topic of undesirable robots, crawlers and spiders. I shall refer to them collectively as robots. Firstly, and most importantly, not all robots are bad. There are malicious robots out there, but I won’t be discussing them in this blog. Secondly, not all robots are good. Search engines use robots to crawl your website, and the data they collect is used for rankings and SERPs. Certain search engine optimisation tools use robots to analyse links, amongst other things.
Ultimately, the question is: “What makes a robot undesirable?”
Although robots can be multipurpose, you can generally categorise them as search engine bots, context analysers, archive-makers and the questionable.
Search engine bots crawl through websites gathering information for use in SERPs and rankings. They are a good thing unless you don’t want anyone to find your site. Most search bots will honour your robots.txt file, which regulates what they will and will not look at. Some robots crawl specifically for images; if you feel it necessary, you can block them.
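To make this concrete, here is a minimal sketch of how a well-behaved robot reads a robots.txt file, using Python’s standard urllib.robotparser module. The rules and the example.com URLs are made up for illustration: they turn away one image crawler entirely and keep every other robot out of a hypothetical /private/ folder.

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules only; adjust the user agents and paths to your own site.
ROBOTS_TXT = """\
User-agent: Googlebot-Image
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Load robots.txt rules from a string instead of fetching them over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# The image crawler is blocked from the whole site.
print(parser.can_fetch("Googlebot-Image", "https://example.com/images/logo.png"))  # False
# Other robots may crawl ordinary pages...
print(parser.can_fetch("Googlebot", "https://example.com/index.html"))  # True
# ...but not the hypothetical /private/ folder.
print(parser.can_fetch("Googlebot", "https://example.com/private/report.html"))  # False
```

Bear in mind that this only shows what a polite robot would decide for itself. Nothing in robots.txt forces a robot to obey.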
Context analysers are typically responsible for targeted adverts and banners. These don’t mean any harm and don’t go where they aren’t invited. There are only a handful of bots that fall into this category.
Archive-makers crawl around the web making copies of sites for posterity. There aren’t very many of these robots, and it is useful to have old versions of your website handy.
Some robots have questionable intentions and don’t necessarily bring any benefit to your site. Link checker tools fall into this category: they are very useful, but how frequently do they need to be run, and who needs to run them on your site? There are tools that download websites for offline browsing, but you may not want whole copies of your site being taken. Some automated tools can overwhelm your server and become, in effect, a denial-of-service attack.
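One way to spot a robot that is hammering your server is to count requests per user agent in your access log. The sketch below is only a rough illustration: it assumes the common "combined" log format, where the user agent is the last quoted field on each line, and the sample log lines, the ExampleSiteCopier name and the threshold are all invented.

```python
import re
from collections import Counter

# Assumption: combined log format, where the user agent is the last quoted field.
USER_AGENT_RE = re.compile(r'"([^"]*)"\s*$')


def requests_per_agent(log_lines):
    """Count how many requests each user agent made."""
    counts = Counter()
    for line in log_lines:
        match = USER_AGENT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def heavy_agents(log_lines, threshold):
    """Return the user agents with at least `threshold` requests, busiest first."""
    counts = requests_per_agent(log_lines)
    return [agent for agent, hits in counts.most_common() if hits >= threshold]


# Invented sample data; a real log would be read from a file.
SAMPLE_LOG = [
    '203.0.113.5 - - [10/Oct/2023:13:55:36 +0000] "GET / HTTP/1.1" 200 512 "-" "ExampleSiteCopier/1.0"',
    '203.0.113.5 - - [10/Oct/2023:13:55:36 +0000] "GET /about HTTP/1.1" 200 734 "-" "ExampleSiteCopier/1.0"',
    '203.0.113.5 - - [10/Oct/2023:13:55:37 +0000] "GET /contact HTTP/1.1" 200 620 "-" "ExampleSiteCopier/1.0"',
    '198.51.100.7 - - [10/Oct/2023:13:56:02 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
]

print(heavy_agents(SAMPLE_LOG, threshold=3))
```

What counts as too many requests depends on your site and your server, so treat the threshold as something to tune rather than a rule.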
The topic of undesirable robots is often overlooked since websites, by design, are publicly accessible. However, I believe that robots should benefit your site and its users. Any robots that may negatively affect services or server performance are undesirable and need to be blocked through robots.txt or, more severely, through .htaccess (or web.config).
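The difference between the two approaches is that robots.txt only asks a robot to stay away, whereas a server-side rule refuses the request outright. The sketch below is not .htaccess or web.config syntax; it is a small Python WSGI wrapper that illustrates the same idea of rejecting requests by user agent. The blocked names are hypothetical, and since a robot can lie about its user agent, this is a deterrent rather than a guarantee.

```python
# Hypothetical user agent fragments to refuse; matching is case-insensitive.
BLOCKED_AGENT_FRAGMENTS = ("examplesitecopier", "examplelinkchecker")


def is_blocked(user_agent: str) -> bool:
    """Return True if the user agent contains any blocked fragment."""
    lowered = user_agent.lower()
    return any(fragment in lowered for fragment in BLOCKED_AGENT_FRAGMENTS)


def block_unwanted_robots(app):
    """Wrap a WSGI app so that blocked user agents receive 403 Forbidden."""
    def middleware(environ, start_response):
        if is_blocked(environ.get("HTTP_USER_AGENT", "")):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)
    return middleware


def site(environ, start_response):
    """Stand-in for the real website."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Welcome\n"]


guarded_site = block_unwanted_robots(site)
```

Starting with robots.txt and keeping the server-side block for robots that ignore it is the gentler order to work in.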







