Archive for the ‘Website Optimization’ Category

Canonicalisation and SEO

In an SEO campaign you want as much authority to go to your homepage as possible. You submit your site to directories, write articles, post on social networking websites and make regular blog posts that appear on Twitter, all to increase the number of backlinks to your website. And this is a good thing. The problem arises when there is more than one way to view your homepage (or any page, for that matter).

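A URL canonicalisation problem exists when the same content can be reached at two or more URLs. For example, http://www.example.co.uk/, http://example.co.uk/, http://www.example.co.uk/index.html, and http://example.co.uk/index.html may all load the homepage. Canonicalisation is the process of choosing one of these as the preferred (canonical) URL.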
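
To make the idea concrete, here is a minimal Python sketch that maps the duplicate homepage URLs onto one preferred form. The function name canonical_url and the constants are illustrative assumptions, and it only handles the www/non-www and index.html cases discussed in this post.

```python
from urllib.parse import urlsplit, urlunsplit

CANONICAL_HOST = "www.example.co.uk"  # assumed preferred host
BARE_HOST = "example.co.uk"           # the same site without www
INDEX_FILE = "index.html"             # assumed default document name

HOMEPAGE_URLS = [
    "http://www.example.co.uk/",
    "http://example.co.uk/",
    "http://www.example.co.uk/index.html",
    "http://example.co.uk/index.html",
]


def canonical_url(url: str) -> str:
    """Map a duplicate URL onto the single preferred URL for that page."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    # Treat the bare domain and the www host as the same site.
    if host == BARE_HOST:
        host = CANONICAL_HOST
    path = parts.path or "/"
    # Strip a trailing default document so /index.html becomes /.
    if path.endswith("/" + INDEX_FILE):
        path = path[: -len(INDEX_FILE)]
    return urlunsplit((parts.scheme, host, path, parts.query, ""))
```

Running canonical_url over HOMEPAGE_URLS gives the same address for all four, which is the outcome the server-side fixes in this post aim for.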

Clever search engines will be able to tell that these are all the same page and pass all the authority to the same place. Other search engines will follow each of these links and pass authority to each URL separately, with the overall result that you do not rank as high as you should. Worse, they could see these as separate pages and treat them as duplicate content.

By removing the additional ‘copies’ of the homepage, the remaining page will get all the authority and your site should rise in the rankings.

There are a number of ways to fix a canonicalisation problem, and the right one depends on the server you are using. What works on an Apache server may not work on a Windows (IIS) server. Below are the most commonly accepted ways of fixing a canonicalisation problem:

Apache:

On an Apache server we will use a .htaccess file. If you already have a .htaccess file, open it in your text editor of choice. If you do not, you will need to create one. Some text editors have a problem saving a file named .htaccess, because it looks like an extension with no filename. To get around this, save the file as htaccess.txt and rename it to .htaccess once you have uploaded it to your server. Now add the following code to your .htaccess file (changing the domain name, obviously):

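RewriteEngine on

# Send any host other than www.example.co.uk to the www host (dots escaped, case-insensitive)
RewriteCond %{HTTP_HOST} !^www\.example\.co\.uk$ [NC]
RewriteRule ^(.*)$ http://www.example.co.uk/$1 [R=301,L]

# Send a direct request for /index.html to the bare homepage URL
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /.*index\.html\ HTTP/
RewriteRule ^index\.html$ http://www.example.co.uk/ [R=301,L]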

What this is saying to the server is:

Turn on the URL rewriting module.

If the host is not www.example.co.uk,
Redirect the browser to the same path on www.example.co.uk.

If the file being requested is www.example.co.uk/index.html,
Redirect the browser to www.example.co.uk/
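
If it helps to see that logic outside of Apache, the two rules can be mirrored in a few lines of Python. This is only a sketch for reasoning about the behaviour, not something the server runs; the function name redirect_target is an assumption, and the host comparison here ignores case.

```python
from typing import Optional

CANONICAL_HOST = "www.example.co.uk"  # assumed preferred host


def redirect_target(host: str, path: str) -> Optional[str]:
    """Return the 301 target for a request, or None if no redirect applies.

    `path` is the requested path including its leading slash.
    """
    # Rule 1: any other host is sent to the same path on the canonical host.
    if host.lower() != CANONICAL_HOST:
        return f"http://{CANONICAL_HOST}{path}"
    # Rule 2: a request for the homepage's index.html is sent to the bare "/".
    if path == "/index.html":
        return f"http://{CANONICAL_HOST}/"
    # Anything else is served as normal.
    return None
```

A request for http://example.co.uk/index.html therefore takes two hops: the first rule sends it to http://www.example.co.uk/index.html, and the second rule then sends that to http://www.example.co.uk/.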

IIS:

In IIS Manager, create a new site for the non-www domain and set a permanent URL redirect to the www version.

Sounds easy, doesn’t it? Here are the steps:

1. In Internet Services Manager, set up both www.example.com (with-www) and example.com (no-www) as websites.
2. Select the example.com website (no-www) in Internet Services Manager and go into the properties.
3. In the Home Directory tab, change the option button “When connecting to this resource the content should come from” to be “A redirection to a URL”.
4. In the “Redirect to” box, enter http://www.example.com$S$Q
(A note about the variables used here:
$S retains the requested URL’s full filepath
$Q retains any query string present in the request.)
5. Check the checkbox that says “A permanent redirection for this resource.” This is a key step, or else you will create a 302 redirect rather than a 301.
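
To see what the $S and $Q variables do to a redirect, here is a small Python sketch that expands them the way the note in step 4 describes. It is a model of the described behaviour rather than anything IIS itself runs, and expand_redirect is a made-up name; it assumes $Q carries the query string together with its leading question mark.

```python
from urllib.parse import urlsplit


def expand_redirect(template: str, requested_url: str) -> str:
    """Expand $S and $Q in a redirect template for one requested URL."""
    parts = urlsplit(requested_url)
    suffix = parts.path                               # $S: the requested path
    query = "?" + parts.query if parts.query else ""  # $Q: query string with its "?"
    return template.replace("$S", suffix).replace("$Q", query)
```

With the template http://www.example.com$S$Q, a request for http://example.com/products/list.asp?page=2 keeps both its path and its query string when it is sent to the www host.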

Other Methods:

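The simplest method of pointing a search engine to the right page is the canonical link element (often called the canonical tag). The href should be a full URL, including the http:// part:

<link href="http://www.example.co.uk/link-to-right-page.html" rel="canonical"/>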

This is an example of what it would look like in XHTML. If you are using HTML then remove the trailing /.
By putting this code in the head section of your page you will effectively be telling search engines ‘This page does not belong here. If you are reading this content, then please list it under this other URL instead, as this is where it DOES belong.’
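
If you want to check which canonical URL a page declares, something like the following Python sketch will read it out of the HTML. It uses the standard library's html.parser; the class and function names are illustrative assumptions.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalFinder(HTMLParser):
    """Record the href of the first <link rel="canonical"> element seen."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # rel may hold several space-separated values, so split before checking.
        rel = (attributes.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel and self.canonical is None:
            self.canonical = attributes.get("href")


def find_canonical(html: str) -> Optional[str]:
    """Return the declared canonical URL, or None if the page has none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical
```

A page with no canonical link gives None, which is a quick way to spot pages where the tag has been forgotten.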

Summary

By redirecting pages to the same place you are building up the authority of that page. So instead of having “four” pages getting a trickle of authority each, you will have a single page getting a flood of authority.

Top 5 most common validation errors

W3C validation errors are errors in the coding of a web page. They range from missing alt attributes right through to unclosed or incorrectly nested tags.

A Search Engine spider is basically a piece of software that visits a website, and expects to find it in a certain format and layout, so each time it finds one of these validation errors it has to decide what should be there. This all takes time, and the spiders only have a finite amount of time to crawl and index a page. If this time elapses before the page is completely indexed then this will cause problems in rankings.

Here are the five most common validation errors we encounter, and how to fix them.

No DocTypes

A doctype tells the browser which version of HTML you are using, for example HTML 4.0, XHTML or HTML 5. This affects how the rest of the code is parsed, and therefore how it is displayed in browsers.

A list of doctypes is available from the World Wide Web Consortium (W3C) website (http://www.w3.org/QA/2002/04/valid-dtd-list.html). Place the correct doctype at the top of the webpage, above the opening html tag.
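
A quick way to spot pages with no doctype is to test the start of the file. The Python sketch below is a rough check only: has_doctype is a made-up name, it allows an optional XML declaration before the doctype (as some XHTML pages have), and it does not verify that the doctype is one of the valid ones on the W3C list.

```python
import re

# Optional whitespace, an optional XML declaration, then <!DOCTYPE html ...
DOCTYPE_PATTERN = re.compile(
    r"\s*(?:<\?xml[^>]*\?>\s*)?<!DOCTYPE\s+html\b",
    re.IGNORECASE,
)


def has_doctype(page: str) -> bool:
    """Return True if the page starts with an HTML doctype declaration."""
    return DOCTYPE_PATTERN.match(page) is not None
```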

Closing tags

Tags that are either not closed or are mismatched cause a lot of problems. HTML tags should be closed in the reverse of the order in which they were opened, so the most recently opened tag is closed first, for example:

<div>some <b>bold</b> and <i>italic</i> text</div>
not
<div> some <b>bold and <i>italic</b></i> text</div>

When using XHTML, tags that don’t have a partner closing tag (for example the image tag) should be self-closing, i.e. should have a forward slash before the end of the tag, for example
<img src="file.jpg" alt="a file" />. Another example is the line break tag (<br> for HTML and <br /> for XHTML).
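
The nesting rule is easy to check with a stack: push each tag as it opens, and expect each closing tag to match the one on top. The Python sketch below shows the idea using the standard library's html.parser. It is not a replacement for the W3C validator; the names are assumptions, and the list of tags without a closing partner is deliberately short.

```python
from html.parser import HTMLParser

# Tags that never take a closing partner (a partial list, enough for this sketch).
VOID_ELEMENTS = {"img", "br", "hr", "input", "meta", "link"}


class NestingChecker(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.open_tags = []  # stack of tags opened but not yet closed
        self.errors = []     # human-readable problems found so far

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS:
            return
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()
        else:
            expected = self.open_tags[-1] if self.open_tags else "nothing"
            self.errors.append(f"</{tag}> closed while expecting </{expected}>")


def nesting_errors(html: str) -> list:
    """Return a list of nesting problems; an empty list means none were found."""
    checker = NestingChecker()
    checker.feed(html)
    checker.close()
    # Anything still on the stack was opened but never properly closed.
    checker.errors.extend(f"<{tag}> never closed" for tag in checker.open_tags)
    return checker.errors
```

Fed the two div examples above, the first comes back with no errors, while the second is flagged as soon as the closing bold tag arrives before the closing italic tag.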

HTML in JavaScript

You would not believe how often this happens. Mostly it’s people trying to prevent spiders getting their email addresses in order to stop spam email. They will use something along the lines of:

<script type="text/javascript">
document.write('<a href="mailto:' + 'info' + '@' + 'domain.com">Email</a>');
</script>

The problem here is the closing anchor tag. Closing tags are recognised inside the script, so they need to be escaped, whereas comments and opening tags are not recognised. There are two approaches to fix this: either comment out the JavaScript (by putting <!-- after the first line, and --> before the last line), or put a back-slash before the forward-slash (<\/a>).
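
If your pages are generated by a script, the back-slash fix can be applied automatically. Here is a minimal Python sketch, with a made-up function name, that escapes every closing tag in a piece of inline JavaScript before it is written into the page. JavaScript reads \/ inside a string as a plain /, so the link that ends up on the page is unchanged.

```python
def escape_closing_tags(script_source: str) -> str:
    """Replace every "</" with "<\\/" so no closing tag appears inside a script."""
    return script_source.replace("</", "<\\/")
```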

Missing Attributes

A common example of a missing attribute is “alt” in the img tag. The easy fix is to include alt="", for example <img src="image.jpg" alt="" />, but even this should be avoided for images that carry meaning. Wherever possible include a description in the alt attribute, and where it fits naturally, include a keyword. This will help search engines determine what the image is about, and they may include it in their image search, creating another possible source of traffic.
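
Finding the offending images by hand is tedious on a large page, so here is a rough Python sketch that lists them. The names are illustrative assumptions; it reports images with no alt attribute at all separately from images whose alt is empty.

```python
from html.parser import HTMLParser


class AltChecker(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.missing = []  # src of images with no alt attribute
        self.empty = []    # src of images whose alt is present but blank

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        src = attributes.get("src") or ""
        if "alt" not in attributes:
            self.missing.append(src)
        elif not (attributes["alt"] or "").strip():
            self.empty.append(src)


def images_without_alt(html: str):
    """Return (missing, empty): two lists of image src values needing attention."""
    checker = AltChecker()
    checker.feed(html)
    return checker.missing, checker.empty
```

The first list is the set of validation errors; the second is the set of images that validate but could still do with a proper description.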

Flash

This is the most common validation problem with websites that use Flash. The <embed> tag was created by Netscape as its method of embedding plug-ins and players in web pages. As it is not part of the XHTML specification, it fails validation. There are a couple of fixes: one is to use JavaScript to write the embed code, the other is to use fully valid object code:

<object data="flashmoviename.swf" type="application/x-shockwave-flash" width="504" height="250">
<param name="MOVIE" value="flashmoviename.swf" />
</object>

Any additional parameters can be included using param tags within the object tags.
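
If you generate this markup from a script, a small helper keeps the object and param tags consistent. The Python sketch below is one possible way to do it; flash_object is a made-up name, and the extra parameter in the usage note (quality) is just an example of a param you might pass.

```python
from html import escape


def flash_object(movie: str, width: int, height: int, **params: str) -> str:
    """Build XHTML <object> markup for a Flash movie, with optional extra params."""
    lines = [
        f'<object data="{escape(movie)}" type="application/x-shockwave-flash" '
        f'width="{width}" height="{height}">',
        f'<param name="MOVIE" value="{escape(movie)}" />',
    ]
    # Each keyword argument becomes one additional <param> tag.
    for name, value in params.items():
        lines.append(f'<param name="{escape(name)}" value="{escape(value)}" />')
    lines.append("</object>")
    return "\n".join(lines)
```

Calling flash_object("flashmoviename.swf", 504, 250, quality="high") produces the markup above with one more param line for quality.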

We, at SEO Consult, understand that a fully validated website is of huge benefit to your SEO campaign: spiders can work out what your pages are about by extracting the content from the HTML tags easily, without having to guess at the correct HTML in order to make sense of the content.