Tech Blog

Canonical URL as a solution for duplicate content rel=canonical: the ultimate guide

The rel=canonical element, often called the “canonical link”, is an HTML element that helps webmasters prevent duplicate content issues. It does this by specifying the “canonical URL”, the “preferred” version of a web page. Using it well improves a site’s SEO.

The idea is simple: if you have several similar versions of the same content, you pick one “canonical” version and point the search engines at that. This solves the duplicate content problem where search engines don’t know which version of the content to show. This article takes you through the use cases and the anti-use cases.

History of rel=canonical

In February 2009, Google, Bing and Yahoo! introduced the canonical link element. Matt Cutts’ post is probably the easiest reading if you want to learn about its history. While the idea is simple, the specifics of how to use it turn out to be complex.

The SEO benefit of rel=canonical

Choosing a proper canonical URL for every set of similar URLs improves the SEO of your site. Because the search engine knows which version is canonical, it can count all the links towards all the different versions, as links to that single version. Basically, setting a canonical is similar to doing a 301 redirect, but without actually redirecting.

The process of canonicalization

When you have several choices for a product’s URL, canonicalization is the process of picking one. In many cases, it’ll be obvious: one URL will be better than others. In some cases, it might not be as obvious, but then it’s still rather easy: pick one! Not canonicalizing your URLs is always worse than canonicalizing your URLs.


How to set canonical URLs

Correct example of using rel=canonical

Let’s assume you have two versions of the same page, with exactly the same content. They differ only in that they’re in separate sections of your site, so the background color and the active menu item differ. That’s it. Both versions have been linked from other sites, so the content itself is clearly valuable. Which version should a search engine show? Nobody knows.

For example’s sake, these are their URLs:

  • http://example.com/wordpress/seo-plugin/
  • http://example.com/wordpress/plugins/seo/

This is what rel=canonical was invented for. Especially in a lot of e-commerce systems, this (unfortunately) happens fairly often. A product has several different URLs depending on how you got there. You would apply rel=canonical as follows:

  1. You pick one of your two pages as the canonical version. It should be the version you think is the most important one. If you don’t care, pick the one with the most links or visitors. If all of that’s equal: flip a coin. You need to choose.
  2. Add a rel=canonical link from the non-canonical page to the canonical one. So if we picked the shorter URL as our canonical URL, the other URL would link to it like so in the <head> section of the page:
    <link rel="canonical" href="http://example.com/wordpress/seo-plugin/">

    That’s it. Nothing more, nothing less.

What this does is “merge” the two pages into one from a search engine’s perspective. It’s basically a “soft redirect”, without redirecting the user. Links to both URLs now count for the single canonical version of the URL.
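To check that a page really declares the canonical you intended, you can read it back out of the HTML. The snippet below is a minimal sketch using only Python’s standard library; the names CanonicalFinder and find_canonicals are made up for this illustration and the sample page is hypothetical.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> in a document."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr_map = dict(attrs)
        # rel can hold several space-separated tokens; compare case-insensitively
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_tokens and attr_map.get("href"):
            self.canonicals.append(attr_map["href"])


def find_canonicals(html):
    """Return all canonical hrefs found in the given HTML string."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonicals


# Hypothetical non-canonical page pointing at the canonical version
page = """<html><head>
<title>SEO plugin</title>
<link rel="canonical" href="http://example.com/wordpress/seo-plugin/">
</head><body>...</body></html>"""

print(find_canonicals(page))  # ['http://example.com/wordpress/seo-plugin/']
```

The function returns a list rather than a single value, so a page that accidentally carries more than one canonical link shows up immediately.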

Setting the canonical in Yoast SEO

If you use Yoast SEO, you can change the canonical of several page types using the plugin. You only need to do this if you want to change the canonical to something different than the current page’s URL. Yoast SEO already renders the correct canonical URL for almost any page type in a WordPress install.

For posts, pages and custom post types, you can edit the canonical in the advanced tab of the Yoast SEO metabox.

For categories, tags and other taxonomy terms, you can change them in the Yoast SEO metabox too, in the same spot. If you have other advanced use cases, you can always use the wpseo_canonical filter to change the Yoast SEO output.

When should you use canonical URLs?

301 redirect or canonical?

If you have the choice of doing a 301 redirect or setting a canonical, what should you do? The answer is simple: if there are no technical reasons not to do a redirect, you should always do a redirect. If you cannot redirect because that would break the user experience or be otherwise problematic: set a canonical URL.

Should a page have a self-referencing canonical URL?

In the example above, we make the non-canonical page link to the canonical version. But should a page set a rel=canonical for itself? This is a highly debated topic amongst SEOs. At Yoast we have a strong preference for having a canonical link element on every page, and Google has confirmed that’s best. The reason is that most CMSes will allow URL parameters without changing the content. So all of these URLs would show the same content:

  • http://example.com/wordpress/seo-plugin/
  • http://example.com/wordpress/seo-plugin/?isnt=it-awesome
  • http://example.com/wordpress/seo-plugin/?cmpgn=twitter
  • http://example.com/wordpress/seo-plugin/?cmpgn=facebook

The issue: if you don’t have a self-referencing canonical on the page that points to the cleanest version of the URL, search engines may treat each of these URLs as a separate page with the same content. Even if you never link to such URLs yourself, someone else could, and cause a duplicate content issue for you. So adding a self-referencing canonical to URLs across your site is a good “defensive” SEO move. Luckily, our Yoast SEO plugin does this for you.
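If you want to audit this on your own site, you can work out the “cleanest” version of each URL and compare it with the canonical the page actually declares. Below is a rough sketch; clean_canonical is a hypothetical helper, and it assumes that no query parameter changes the content. A site where a parameter selects the content would have to keep that parameter.

```python
from urllib.parse import urlsplit, urlunsplit


def clean_canonical(url):
    """Return the URL without its query string and fragment.

    Assumption: query parameters never change the content of the page.
    """
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


variants = [
    "http://example.com/wordpress/seo-plugin/",
    "http://example.com/wordpress/seo-plugin/?isnt=it-awesome",
    "http://example.com/wordpress/seo-plugin/?cmpgn=twitter",
    "http://example.com/wordpress/seo-plugin/?cmpgn=facebook",
]

# Every variant should collapse to the same clean URL
for url in variants:
    print(clean_canonical(url))
```

This is meant as an audit helper for comparing against what a page declares, not as a way to generate the canonical from whatever URL was requested.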

Cross-domain canonical URLs

You might have the same piece of content on several domains. For instance, SearchEngineJournal regularly republishes articles from Yoast.com (with explicit permission). Look at every one of those articles and you’ll see a rel=canonical link pointing right back at our original article. This means all the links pointing at their version of the article count towards the ranking of our canonical version. They get to use our content to please their audience, and we get a clear benefit from it too. Everybody wins.

Faulty canonical URLs: common issues

There is a multitude of cases out there showing that a wrong rel=canonical implementation can lead to huge issues. I know of several sites that had the canonical on their homepage point to an article, and completely lost their homepage from the search results. There are more things you shouldn’t do with rel=canonical. Let me list the most important ones:

  • Don’t canonicalize a paginated archive to page 1. The rel=canonical on page 2 should point to page 2. If you point it to page 1, search engines will actually not index the links on those deeper archive pages…
  • Don’t use protocol-relative URLs. Make your canonicals 100% specific. For various reasons, many sites use protocol-relative links, meaning they leave the http / https bit off their URLs. Don’t do this for your canonicals. You have a preference. Show it.
  • Don’t base your canonical on the request URL. If you use variables like the domain or request URI used to access the current page while generating your canonical, you’re doing it wrong. Your content should be aware of its own URLs. Otherwise, you could still have the same piece of content on, for instance, example.com and www.example.com and have both canonicalize to themselves.
  • Don’t put multiple rel=canonical links on a page. Sometimes a developer of a plugin or extension thinks that he’s God’s greatest gift to mankind and knows best how to add a canonical to the page. Sometimes, that developer is right. But since you can’t all be me, they’re inevitably wrong sometimes too. When we encounter this in WordPress plugins we try to reach out to the developer and teach them not to, but it still happens. And when it does, the results are wholly unpredictable.
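Some of the issues above can be caught automatically once you have the list of canonical hrefs found on a page. The following is a small sketch, not a complete validator; canonical_problems is a hypothetical helper, and the check for relative URLs is my own extension of the “100% specific” rule.

```python
from urllib.parse import urlsplit


def canonical_problems(canonical_hrefs):
    """Return human-readable problems for the canonical hrefs found on one page."""
    problems = []
    if len(canonical_hrefs) > 1:
        problems.append("multiple canonical links found")
    for href in canonical_hrefs:
        parts = urlsplit(href)
        if href.startswith("//"):
            # Protocol-relative: the http / https preference is missing
            problems.append(f"protocol-relative canonical: {href}")
        elif parts.scheme not in ("http", "https") or not parts.netloc:
            # Assumption: anything without scheme and host is not specific enough
            problems.append(f"canonical is not an absolute URL: {href}")
    return problems


print(canonical_problems(["http://example.com/wordpress/seo-plugin/"]))
print(canonical_problems(["//example.com/wordpress/seo-plugin/"]))
print(canonical_problems(["http://example.com/a/", "http://example.com/b/"]))
```

It does not cover the pagination or request-URL cases, since those depend on how your own site generates its URLs.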

rel=canonical and social networks

Facebook and Twitter honor rel=canonical too. This might lead to weird situations. If you share a URL on Facebook that has a canonical pointing elsewhere, Facebook will share the details from the canonical URL. In fact, if you add a like button on a page that has a canonical pointing elsewhere, it will show the like count for the canonical URL, not for the current URL. Twitter works in the same way.

Advanced uses of rel=canonical

Canonical link HTTP header

Google also supports a canonical link HTTP header. The header looks like this:

Link: <http://www.example.com/white-paper.pdf>; rel="canonical"

Canonical link HTTP headers can be very useful when canonicalizing files like PDFs, so it’s good to know that the option exists.
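How you send this header depends on your server setup. As an illustration only, here is a minimal sketch of a Python WSGI application that adds it to a PDF response. The names canonical_link_header, pdf_app and CANONICAL_PDF_URL are made up for this example, and the response body is a placeholder rather than a real PDF.

```python
# URL taken from the header example above
CANONICAL_PDF_URL = "http://www.example.com/white-paper.pdf"


def canonical_link_header(canonical_url):
    """Build the (name, value) pair for a canonical Link HTTP header."""
    return ("Link", f'<{canonical_url}>; rel="canonical"')


def pdf_app(environ, start_response):
    """Hypothetical WSGI app that serves a PDF with a canonical Link header."""
    headers = [
        ("Content-Type", "application/pdf"),
        canonical_link_header(CANONICAL_PDF_URL),
    ]
    start_response("200 OK", headers)
    return [b"placeholder for the PDF bytes"]


print(canonical_link_header(CANONICAL_PDF_URL))
```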

Using rel=canonical on not so similar pages

While I wouldn’t recommend this, you can definitely use rel=canonical very aggressively. Google honors it to an almost ridiculous extent, where you can canonicalize a very different piece of content to another piece of content. If Google catches you doing this, though, it will stop trusting your site’s canonicals and thus cause you more harm…

Using rel=canonical in combination with hreflang

In our ultimate guide on hreflang, we talk about canonical. It’s very important that when you use hreflang, the canonical of each language version points to that same language version. Make sure that you understand how to use canonical well when you’re implementing hreflang, as otherwise you might kill your entire hreflang implementation.
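A quick way to sanity-check this is to list each language version next to the canonical it declares and flag the ones that point elsewhere. The sketch below assumes you have already collected that mapping; non_self_canonicals and the /en/ and /de/ URLs are hypothetical.

```python
def non_self_canonicals(canonical_by_url):
    """Return the URLs of language versions whose canonical points somewhere else."""
    return [url for url, canonical in canonical_by_url.items() if canonical != url]


# Hypothetical mapping: language version URL -> canonical declared on that page
language_versions = {
    "http://example.com/en/seo-plugin/": "http://example.com/en/seo-plugin/",
    "http://example.com/de/seo-plugin/": "http://example.com/en/seo-plugin/",
}

# The German page canonicalizes to the English one, so it gets flagged
print(non_self_canonicals(language_versions))  # ['http://example.com/de/seo-plugin/']
```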

Conclusion: rel=canonical is a power tool

Rel=canonical is a powerful tool in an SEO’s toolbox, but like any power tool, you should use it wisely as it’s easy to cut yourself. For larger sites, the process of canonicalization can be very important and lead to major SEO improvements.
