Google considers duplicate content to be “substantive” blocks of text within a single domain or across multiple domains that are either completely identical or are “appreciably similar”.
Duplicate content is generally a bad thing because some sites use it to try to manipulate search engines into ranking their pages more highly. That is bad for the web as a whole, and it creates a poor experience when users see the same content repeated in their results.
The first thing you might want to do is head over to Copyscape and run a search on your site. Copyscape will tell you (reasonably reliably) if there is content on your site that is duplicated across multiple pages or can be found on other sites across the internet. You only need to put in your homepage URL, rather than the full address of individual pages – Copyscape automatically checks sub-pages for you.
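You may want a rough do-it-yourself check of whether two specific pages are "appreciably similar". One common textbook approach splits each text into overlapping three-word sequences ("shingles") and measures how much the two sets overlap (Jaccard similarity). This is a minimal sketch under that assumption. It is not how Google or Copyscape measure similarity, and the helper names `shingles` and `similarity` and the three-word shingle size are illustrative choices only.

```python
import re
from typing import Set, Tuple


def shingles(text: str, k: int = 3) -> Set[Tuple[str, ...]]:
    """Return the set of overlapping k-word sequences ("shingles") in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        # Very short texts become a single (shorter) shingle, or nothing at all.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def similarity(a: str, b: str, k: int = 3) -> float:
    """Jaccard similarity of the two texts' shingle sets, from 0.0 to 1.0."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0  # two empty texts are trivially identical
    return len(sa & sb) / len(sa | sb)


if __name__ == "__main__":
    original = "The quick brown fox jumps over the lazy dog"
    tweaked = "The quick brown fox jumps over the lazy cat"
    print(round(similarity(original, tweaked), 2))  # prints 0.75
```

Identical texts score 1.0 and texts that share no three-word sequence score 0.0. Where you draw the line for "appreciably similar" is a judgement call, not a published threshold.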
There are some legitimate uses of duplicated content, however, like offering a near-identical print version of a page or a simpler mobile version with different formatting to the main page. Both of these are perfectly acceptable to users and therefore to Google. You do, however, need to tell search engines which is the “real” content, and which is the auxiliary.
So what’s an SEO to do? Examine the possible solutions below:
Robots.txt
You can use a robots.txt file to stop search engines crawling a duplicated version, so that it doesn’t show up in results pages. If that’s what you want, use:
User-Agent: *
Disallow: /sameasanotherpage_printversion.html
The above stops search engines from crawling the print version, so its content won’t be indexed (although the bare URL can occasionally still appear in results if other pages link to it). If you’d rather not block the print version outright but still want to avoid penalties for doubling up on content, use a canonical tag instead.
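Before uploading the robots.txt rule above, you can check that it blocks only the page you intend. Python’s standard-library `urllib.robotparser` can evaluate the rule locally. This is a minimal sketch: the example.com URLs and the "Googlebot" user-agent string are placeholders, and `is_crawlable` is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser

# The rule from the robots.txt example above.
ROBOTS_TXT = """\
User-Agent: *
Disallow: /sameasanotherpage_printversion.html
"""


def is_crawlable(url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the rules above allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())  # parse from text instead of fetching
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "http://www.example.com/sameasanotherpage_printversion.html",
        "http://www.example.com/sameasanotherpage.html",
    ):
        print(url, "->", "allowed" if is_crawlable(url) else "blocked")
```

Python’s parser does plain prefix matching on paths. It doesn’t implement every extension individual search engines support (such as wildcards inside paths), so treat it as a sanity check rather than a guarantee of how a given crawler behaves.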
Canonical tag
You can use a canonical tag in a page’s <head> to tell search engines which URL is the preferred version of that page. Add the following tag to the duplicated version:
<link rel="canonical" href="http://www.example.com/the-proper-version" />
The above tells search engines where the proper version of this replicated content can be found, so there’s no confusion and no risk of ranking penalties for looking sneaky. The bonus of this method over robots.txt is that search engines can still crawl the duplicate version, and signals such as links pointing at it are consolidated onto the main version. Bear in mind that search engines treat the tag as a strong hint and will normally show only the main version in results.
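When auditing a site, it helps to confirm that each duplicate page actually carries the tag and points where you expect. This minimal sketch uses Python’s standard-library `html.parser` to pull the canonical URL out of a page’s HTML. `CanonicalFinder` and `find_canonical` are hypothetical helper names, the sample page is made up, and fetching the real page is left out.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> tag it sees."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lower-cased; attrs is a list of pairs.
        # Self-closing <link ... /> tags are routed here by the default
        # handle_startendtag implementation.
        if tag != "link" or self.canonical is not None:
            return
        attr_map = {name: (value or "") for name, value in attrs}
        # rel may hold several space-separated values.
        if "canonical" in attr_map.get("rel", "").lower().split():
            self.canonical = attr_map.get("href")


def find_canonical(html: str) -> Optional[str]:
    """Return the canonical URL declared in html, or None if there isn't one."""
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    return finder.canonical


if __name__ == "__main__":
    page = """<html><head>
<title>Print version</title>
<link rel="canonical" href="http://www.example.com/the-proper-version" />
</head><body>...</body></html>"""
    print(find_canonical(page))  # prints http://www.example.com/the-proper-version
```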
Use 301s after a site restructure
If you’ve built a new site or restructured some of your URLs, there’s a chance the same content is now available at more than one URL. If so, set up a 301 (permanent) redirect from each old URL to its new one. See our post on redirects for more detail.
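On most sites, 301s live in the web server’s configuration rather than in application code. The mechanics are the same everywhere: answer a request for the old URL with status 301 and a `Location` header pointing at the new one. This minimal sketch shows that as a Python WSGI app. The paths in `REDIRECTS` are hypothetical placeholders for your own old and new URLs.

```python
from typing import Callable, Dict, Iterable

# Hypothetical old-to-new path mapping after a site restructure.
REDIRECTS: Dict[str, str] = {
    "/old-services.html": "/services/",
    "/news/2010/duplicate-content.html": "/blog/duplicate-content/",
}


def app(environ: dict, start_response: Callable) -> Iterable[bytes]:
    """Minimal WSGI app: 301 any moved path to its new home, 404 the rest."""
    path = environ.get("PATH_INFO", "/")
    new_path = REDIRECTS.get(path)
    if new_path is not None:
        # "301 Moved Permanently" tells browsers and search engines the move is
        # for good, so the new URL should take the old one's place.
        start_response("301 Moved Permanently", [("Location", new_path)])
        return [b""]
    start_response("404 Not Found", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"Not found"]
```

To try it locally you could serve `app` with `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` and request one of the old paths. Keeping the mapping in one table also makes it easy to check that every old URL has somewhere to go.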
A small snippet is best
If you want to use teaser text on your pages to entice users to follow links to your blog or news section, that’s fine as long as you keep it relatively short. At SEO Consult we tend to use the title plus roughly 100 additional characters. That works out at around 20 words, which is usually enough to give readers an idea of what the article is about.
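The "title plus about 100 characters" rule is easy to automate. This minimal sketch assumes teasers are built from plain-text article bodies. `make_teaser` is a hypothetical helper: it trims back to the last whole word, so the snippet never ends mid-word, and marks the cut with an ellipsis.

```python
def make_teaser(title: str, body: str, limit: int = 100) -> str:
    """Return the title plus roughly `limit` characters of body text."""
    text = " ".join(body.split())  # collapse runs of whitespace and newlines
    if len(text) <= limit:
        snippet = text
    else:
        cut = text[:limit]
        # Back up to the last space so a word isn't chopped in half.
        space = cut.rfind(" ")
        if space > 0:
            cut = cut[:space]
        snippet = cut.rstrip(" ,;:.") + "..."
    return f"{title}\n{snippet}"


if __name__ == "__main__":
    body = (
        "Duplicate content is generally a bad thing, but there are legitimate "
        "uses, such as print versions of a page. Here is how to handle them "
        "without confusing search engines."
    )
    print(make_teaser("Dealing with duplicate content", body))
```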
You can discuss this at the SEO Forum.

Great post! A lot of people on the web publish duplicate content; I’m thinking of all those who copy content from blog posts and sites.
So the real question is how to defend ourselves when people copy our content.