Search Engine Optimisation (SEO) Specialists

Search Engine Spiders

The Internet is big. In fact, the Internet is very big. Recently Google announced that the systems it uses to process links on the web and find new content had hit a milestone: they had recorded 1 trillion (as in one thousand billion) unique URLs on the web at once. That's quite some going considering that the first Google index in 1998 had only 26 million pages and that by 2000 the index had reached only one billion.

Clearly that's a lot of ground to cover. The way that search engines collect that information is by using search engine spiders.

All search engines, including Google, use spidering as a means of providing up-to-date data. Spiders, otherwise known as bots or web crawlers, are software programs that request pages much like regular browsers do. These pages are indexed to provide fast searches.

The usual starting points, called the seeds, are lists of heavily used servers and very popular pages. The spider begins with a popular site, collecting information, indexing the words on its pages and following every link found within the site. In this way, the spidering system quickly begins to travel, spreading out across the most widely used portions of the Web.

There are different types of spiders and different types of spider behaviour. For example, submission checkup spiders run a simple check to make sure that a submitted URL is valid, that the server is available and accessible, whether a redirect is in effect, and so on. If the URL passes this test, it will typically be stored in a task queue for later crawling.
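The checkup step can be sketched roughly as follows. This is only an illustration: `is_valid_url`, `submit_url` and the in-memory `crawl_queue` are hypothetical names, and the check is purely syntactic. A real spider would also contact the server to confirm that it responds and to detect redirects.

```python
from collections import deque
from urllib.parse import urlsplit

# Hypothetical in-memory task queue of URLs waiting to be crawled.
crawl_queue = deque()

def is_valid_url(url):
    """Syntactic check only: an http(s) scheme and a non-empty host."""
    try:
        parts = urlsplit(url.strip())
    except ValueError:  # raised for some malformed hosts
        return False
    return parts.scheme in ("http", "https") and bool(parts.netloc)

def submit_url(url, queue=crawl_queue):
    """Store the URL for later crawling if it passes the check."""
    if not is_valid_url(url):
        return False
    queue.append(url.strip())
    return True
```

Calling `submit_url("http://www.example.com/page")` queues the address, while a string such as `"not a url"` is rejected and never reaches the queue.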

When it's time for the crawl, the spider first checks the page's head section: title tag, keyword phrases, description and other meta tags, and robot instructions. It then reads the page content: headings, alt text, link titles, keywords and phrases. Finally it locates the links on the page and follows them, repeating the process on the destination pages. Put another way, as the crawler visits these URLs it identifies all the hyperlinks in the pages and adds them to the list of URLs to visit, called the crawl frontier. URLs from the frontier are recursively visited according to a set of policies.

Continually crawling the Internet, each spider can keep hundreds of web connections open at one time, crawling thousands of pages a second and generating huge amounts of data that is encoded to save space. It's an endless task, with websites being continually edited, added and deleted, and dynamic web pages altered.
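To make the crawl frontier idea concrete, here is a minimal sketch in Python. It assumes a simple first-in, first-out policy and a hypothetical two-page site held in memory (`PAGES`), so nothing is fetched over the network. The names `PageParser` and `crawl` are illustrative and do not describe how any particular search engine works.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

class PageParser(HTMLParser):
    """Collects the title, meta description and link targets of one page."""
    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = attrs.get("content") or ""
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

def crawl(seeds, fetch, max_pages=100):
    """Visit frontier URLs in the order they were discovered (FIFO policy)."""
    frontier = deque(seeds)   # the crawl frontier: URLs still to visit
    seen = set(seeds)         # every URL ever added to the frontier
    index = {}
    while frontier and len(index) < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:      # page could not be retrieved
            continue
        parser = PageParser()
        parser.feed(html)
        index[url] = (parser.title.strip(), parser.description)
        for href in parser.links:
            target = urljoin(url, href)   # resolve relative links
            if target not in seen:
                seen.add(target)
                frontier.append(target)
    return index

# Hypothetical site used instead of real HTTP requests.
PAGES = {
    "http://example.com/": '<html><head><title>Home</title>'
        '<meta name="description" content="Welcome"></head>'
        '<body><a href="/about">About</a></body></html>',
    "http://example.com/about": '<html><head><title>About</title></head>'
        '<body><a href="/">Home</a><a href="/missing">Gone</a></body></html>',
}

index = crawl(["http://example.com/"], PAGES.get)
```

The `seen` set stops the spider from queueing the same address twice, which matters because the two pages link to each other. A production crawler would replace the FIFO queue with its own prioritisation policies and would respect robot instructions.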

The search engines use this information to identify the most relevant web sites, prioritising them based on factors such as link popularity and the quality of their content. Without the crawling, indexing and encoding carried out by the spiders, the search engines would take a lot longer to retrieve their results.

As well as being employed to collect information to assist in relevant searching, crawlers are also used for automating maintenance tasks, such as checking links or validating HTML code. Crawlers are also used to gather specific types of information from Web pages, for example harvesting e-mail addresses (usually for spam).

Clearly spiders have an essential role in an organisation's web presence. Understanding how they work, and what they like and don't like, plays an important part in applying any search engine optimisation strategy. At SEO Consult we have enormous experience of spider behaviour and requirements.

It's important to make life as easy as possible for the spiders. Here are some aspects of spider-related SEO that we at SEO Consult consider when applying a campaign.

Spiders love fresh content. The more frequently a site is updated with new content, the more frequently spiders explore it.

Each addition or change of content sets off a chain reaction that results in a visit from Googlebot, Yahoo!'s Slurp or other spiders. The spiders compare recent changes with the last cached snapshot of your pages, noting revisions and integrating them.
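One simple way to picture the comparison with a cached snapshot is to store a fingerprint of each page and check whether it has changed. The sketch below assumes a SHA-256 hash of whitespace-normalised HTML. That choice, and the names `fingerprint` and `changed_pages`, are illustrative assumptions rather than a description of how Googlebot or Slurp actually detect revisions.

```python
import hashlib

def fingerprint(html):
    """Hash of the page source with runs of whitespace collapsed."""
    normalised = " ".join(html.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()

def changed_pages(cache, current):
    """Return URLs whose content differs from the cached snapshot.

    `cache` maps URL -> last fingerprint and is updated in place.
    """
    changed = []
    for url, html in current.items():
        digest = fingerprint(html)
        if cache.get(url) != digest:
            changed.append(url)
            cache[url] = digest
    return changed
```

On the first pass every page counts as changed because the cache is empty. On later passes only pages whose content differs from the stored fingerprint are reported, and purely cosmetic whitespace edits are ignored.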

A neglected page, or a page with old content that could benefit from a content and link audit, suggests to the search engines that on-page factors are not a top priority. It says that the page has become irrelevant.

Regular fresh content is part of a healthy site synergy: a strategy that ideally promotes the creation of islands of related information, in the form of subject pages that act as compelling visitor destinations in themselves. Each should be strong enough to achieve its own weight and rank as well as contribute to the overall site authority. Traffic and exposure increase as a consequence. Ideally, include some juicy outbound links (as well as internal links) to contextually relevant sites of authority.

It's important that you don't make the spiders work too hard to find the really useful information on your site. Spiders have a very limited attention span and are only really interested in, on average, 16% of each site they visit. Make the relevant information accessible by not burying it too deep in the site. Make all your mission-critical information accessible by applying a flat site architecture, allowing spiders (and humans) to navigate the site with ease.
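How deep a page is buried can be estimated as the fewest clicks needed to reach it from the home page. The sketch below assumes a hypothetical internal link map (`SITE`) and an arbitrary threshold of two clicks. Both are illustrative and should be adjusted to the site being audited.

```python
from collections import deque

def click_depths(links, home):
    """Breadth-first search from the home page; depth = fewest clicks."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths

# Hypothetical internal link map: page -> pages it links to.
SITE = {
    "/": ["/services", "/blog"],
    "/services": ["/services/seo"],
    "/blog": ["/blog/2010", "/services/seo"],
    "/blog/2010": ["/blog/2010/09"],
    "/blog/2010/09": ["/blog/2010/09/spiders"],
}

depths = click_depths(SITE, "/")
# Pages more than two clicks from the home page are candidates for
# promotion in a flatter architecture.
deep = sorted(page for page, depth in depths.items() if depth > 2)
```

Here the two archive pages under `/blog/2010/09` end up in `deep`. Linking to them from a higher-level page would bring them within easier reach of spiders and visitors alike.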

Add a site map where the spiders can find every single link. The site map should contain text links, not graphics, and should also contain some text relevant to the site. Apart from crafting poor titles with irrelevant naming conventions, the next most effective way to cripple your optimisation efforts is to have no site map clearly linked from every page of your site.
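A site map of plain text links can be generated from a list of pages. The following is a rough sketch, assuming a hypothetical `build_site_map` helper that receives (URL, link text) pairs. It produces only the list of links, so the surrounding page and its descriptive text would still need to be written by hand.

```python
from html import escape

def build_site_map(pages):
    """Render (url, link text) pairs as an HTML list of text links."""
    items = "\n".join(
        f'  <li><a href="{escape(url, quote=True)}">{escape(text)}</a></li>'
        for url, text in pages
    )
    return f"<ul>\n{items}\n</ul>"

site_map = build_site_map([
    ("/services/seo", "Search engine optimisation services"),
    ("/blog", "SEO blog"),
])
```

Escaping the URL and the link text keeps the markup valid even when a title contains characters such as `&` or `<`.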

Keep the load time down. With billions of pages to index, the faster a page loads, the greater the chances of it being picked up and indexed.

Databases and dynamic pages are extremely hard work for spiders - in fact they'll rarely bother with them. They like light HTML pages with links and keywords that can be easily sifted through.

Dense amounts of HTML code and graphic placeholders act as obstructions. With clean, well-written code, spiders can easily access your important information.

Contact SEO Consult now for more information on how we can integrate our spider-friendly strategies into your search engine optimisation campaign.
