
What makes a robot undesirable?

I know that we have talked about robots in past blog posts, but this time I would like to bring up the topic of undesirable robots, crawlers and spiders. I shall refer to them collectively as robots. Firstly, and most importantly, not all robots are bad. There are malicious robots out there, but I won’t be discussing them in this post. Secondly, not all robots are good. Search engines use robots to crawl your websites, and the data they collect is used for rankings and SERPs. Certain specialist Search Engine Optimisation tools use robots to analyse links, amongst other things.

Ultimately the question is, “what makes a robot undesirable?”

Although robots can be multipurpose, you can generally categorise them as: search engine, context analyser, archive maker and the questionable.

Search engine bots crawl through websites gathering information for use in SERPs and rankings. They are a good thing unless you don’t want anyone to find your site. Most search bots will honour your robots.txt file, which regulates what they will and will not look at. Some robots crawl specifically for images; if you feel it necessary, you can block them.
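To make the robots.txt point concrete, here is a minimal sketch in Python that uses the standard library's urllib.robotparser to check what a well-behaved robot would conclude from a robots.txt file. The bot names ExampleImageBot and ExampleSearchBot, the /private/ path and the is_allowed helper are made up for illustration; substitute the user-agent strings of the robots you actually see in your logs.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one image crawler entirely,
# and keep every other robot out of /private/ only.
ROBOTS_TXT = """\
User-agent: ExampleImageBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if a robot that honours robots_txt may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


image_bot_allowed = is_allowed(
    ROBOTS_TXT, "ExampleImageBot", "http://www.domain.com/images/image.jpg"
)
search_bot_allowed = is_allowed(
    ROBOTS_TXT, "ExampleSearchBot", "http://www.domain.com/example-page.html"
)
print(image_bot_allowed)   # False: the image crawler is disallowed everywhere
print(search_bot_allowed)  # True: only /private/ is off limits to other robots
```

Bear in mind that this only models robots that choose to honour the file; it does nothing to stop a robot that ignores it.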

Context analysers are typically responsible for targeted adverts and banners. These don’t mean any harm and don’t go where they aren’t invited. There are only a handful of bots that fall into this category.

Archive-makers crawl around the web making copies of sites for posterity. There aren’t very many of these robots, and it is useful to have old versions of your website handy.

Some robots have questionable intentions and don’t necessarily bring any benefit to your site. Link checker tools fall into this category: they are very useful, but how frequently do they need to be run, and who needs to run them on your site? There are tools that download websites for offline browsing, but you may not want whole copies of your site being taken. Some automated tools can overwhelm your server and become, in effect, a denial-of-service attack.

The topic of undesirable robots is often overlooked since websites, by design, are open to the public. However, I believe that robots should benefit your site and its users. Any robots that may negatively affect services or server performance are undesirable and need to be blocked through robots.txt or, more severely, through .htaccess (or web.config).
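If you keep a list of the robots you consider undesirable, the robots.txt rules can be generated rather than typed by hand. The following is only a sketch: build_robots_txt is a hypothetical helper, and ExampleOfflineCopier and ExampleLinkChecker are placeholder names rather than real user agents. It covers the robots.txt route only; a robot that ignores robots.txt would still need to be blocked at the server level.

```python
def build_robots_txt(blocked_agents):
    """Return robots.txt text that disallows the whole site for each named agent."""
    blocks = [f"User-agent: {agent}\nDisallow: /\n" for agent in blocked_agents]
    # An empty Disallow line leaves every other robot unrestricted.
    blocks.append("User-agent: *\nDisallow:\n")
    # Blank lines separate the per-robot groups.
    return "\n".join(blocks)


# Placeholder names; use the user-agent strings you want to keep out.
robots_txt = build_robots_txt(["ExampleOfflineCopier", "ExampleLinkChecker"])
print(robots_txt)
```

The output can be saved as robots.txt in the root of your site.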

Sitemaps – Index Many Content Types in One Sitemap

Sitemaps are an important Search Engine Optimisation tool, and Google have recently announced an addition to their sitemap support that allows you to add video and images, plus mobile URLs and geo information. You can now create a single sitemap that contains all of this information so your visitors can be served the best possible content when they search for you. This is a great addition that should help SEO, because it allows fresh media content to appear in search engine results alongside your new pages.

For example, if you have a page related to a specific product, and you also happen to have a video on the page reviewing this product, adding the video URL to your sitemap will allow Google to pick up the video and include it in Google Video and also in the main search results, where related videos for your search term are often shown.

The new features follow requests that site owners have been making to Google since 2005. Since that time, several sitemap formats have been released, but to make things easier, Google have introduced the ability to consolidate all of these files into one. The structure of a sitemap hasn’t changed much from what you are using right now; the additional capability to add URLs for other content types is about all that has changed.

Here is an example of a sitemap that contains references to an image, a video and a standard web page.

<?xml version="1.0" encoding="UTF-8"?>
<!-- Each extension namespace needs its own prefix (image:, video:) -->
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1"
        xmlns:video="http://www.google.com/schemas/sitemap-video/1.1">
  <url>
    <loc>http://www.domain.com/example-page.html</loc>
    <image:image>
      <image:loc>http://www.domain.com/images/image.jpg</image:loc>
    </image:image>
    <video:video>
      <video:content_loc>http://www.domain.com/videos/videoABC.flv</video:content_loc>
      <video:title>An example video title</video:title>
    </video:video>
  </url>
</urlset>
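If your pages come out of a database or CMS, you will probably want to generate this file rather than write it by hand. Here is a rough Python sketch using the standard library's xml.etree.ElementTree. The build_sitemap function and its parameters are my own naming, and it assumes Google's image and video extension namespaces; it only emits the handful of elements used in the example, so check the current sitemap documentation for the elements each content type requires.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Assumed extension namespaces for Google's image and video sitemap elements.
IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1"
VIDEO_NS = "http://www.google.com/schemas/sitemap-video/1.1"

# Register prefixes so the output uses image: and video: instead of ns0:, ns1:.
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("image", IMAGE_NS)
ET.register_namespace("video", VIDEO_NS)


def build_sitemap(page_url, image_url, video_url, video_title):
    """Return sitemap XML for one page that has one image and one video."""
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
    ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page_url

    image = ET.SubElement(url, f"{{{IMAGE_NS}}}image")
    ET.SubElement(image, f"{{{IMAGE_NS}}}loc").text = image_url

    video = ET.SubElement(url, f"{{{VIDEO_NS}}}video")
    ET.SubElement(video, f"{{{VIDEO_NS}}}content_loc").text = video_url
    ET.SubElement(video, f"{{{VIDEO_NS}}}title").text = video_title

    # encoding="unicode" returns a str without the XML declaration.
    return ET.tostring(urlset, encoding="unicode")


sitemap_xml = build_sitemap(
    "http://www.domain.com/example-page.html",
    "http://www.domain.com/images/image.jpg",
    "http://www.domain.com/videos/videoABC.flv",
    "An example video title",
)
print(sitemap_xml)
```

Building the tree with a library instead of string concatenation means the tags are always closed properly and special characters in titles are escaped for you.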

Despite these changes, the maximum file size of a sitemap remains 10MB, with a maximum of 50,000 URLs in a single file. If you have more, you will have to split them across several files. Remember, you can have several sitemaps and then create a sitemap index, a single file that lists all of your individual sitemaps. Submitting the index saves you the effort of submitting each sitemap separately, because Google can find them all through the links in that one file.
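As a rough illustration of the splitting step, the Python sketch below chunks a long list of URLs into groups of at most 50,000 and builds a sitemap index that points at one sitemap file per group. The function names, the generated page URLs and the sitemap-1.xml style file names are all assumptions made for the example, and the sketch only enforces the URL count, not the file size limit.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000  # the per-file URL limit quoted above

ET.register_namespace("", SITEMAP_NS)


def split_urls(urls, chunk_size=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into consecutive chunks of at most chunk_size."""
    return [urls[i:i + chunk_size] for i in range(0, len(urls), chunk_size)]


def build_sitemap_index(sitemap_urls):
    """Return sitemap index XML listing each individual sitemap file."""
    index = ET.Element(f"{{{SITEMAP_NS}}}sitemapindex")
    for sitemap_url in sitemap_urls:
        sitemap = ET.SubElement(index, f"{{{SITEMAP_NS}}}sitemap")
        ET.SubElement(sitemap, f"{{{SITEMAP_NS}}}loc").text = sitemap_url
    return ET.tostring(index, encoding="unicode")


# Hypothetical site with 120,000 pages.
urls = [f"http://www.domain.com/page-{n}.html" for n in range(120_000)]
chunks = split_urls(urls)

# Hypothetical naming scheme: one sitemap file per chunk.
sitemap_files = [
    f"http://www.domain.com/sitemap-{i + 1}.xml" for i in range(len(chunks))
]
index_xml = build_sitemap_index(sitemap_files)
print(len(chunks))  # 3 sitemap files are needed for 120,000 URLs
```

Each chunk would then be written out as its own sitemap file, and only the index needs to be submitted.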