Domain Canonicalization – Part 1

October 10, 2008

Canonicalization is a fancy word to describe the process of choosing the best URL to display for a given page when there are several choices. The most common scenario is www vs. non-www URLs. For example, most people would consider the following URLs to be the same:

The problem is that even though all of these example URLs could point to the exact same page, they are still different URLs, and search engines treat them as such. This is because, technically, a web server could return completely different content for each of the URLs above.

So why does it matter?
Serving both the www and non-www versions of your site has several negative effects. It can result in duplicate content (having the exact same content at more than one URL) and can also lead to “dirty” data in your Google Analytics. First, we’ll discuss the problems from an SEO perspective: serving duplicate content and splitting your links across different URLs. Our next post will dissect the problem from a Google Analytics perspective.

Canonicalization from an SEO perspective
Having the same content at “different” URLs creates duplicate content. Now, let me dispel a common myth: there is no such thing as a duplicate content penalty from Google or the other major search engines. In other words, if the same content appears on more than one page, the search engines will not actively lower your rankings.

The problem with duplicate content is that it splits up the links you have pointing to your pages. We all know (hopefully) that links to your website from other websites (inbound links) play a vital role in increasing your rankings in the search engines. When you have one page with two or more URLs, there is a chance that not everyone will link to the same URL.

For example, if you have 1,000 inbound links to www.yoursite.com/ and 1,000 inbound links to yoursite.com/ (without the www), the search engines will credit each URL with only 1,000 links, half of the total pointing at your homepage. If the search engines recognized that all 2,000 links in fact pointed to the same page, your homepage would likely rank higher than with only 1,000 counted. That raises the question: “How do I consolidate those links to one canonical URL?”
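To make the arithmetic concrete, here is a minimal Python sketch under stated assumptions: the hypothetical `canonicalize` helper treats `www.yoursite.com` as the preferred host and folds the bare domain into it, and `count_links` tallies inbound links with and without that consolidation. It is a toy model of link splitting, not a description of how any search engine actually counts links.

```python
from collections import Counter
from urllib.parse import urlsplit, urlunsplit

# Assumption (hypothetical): www.yoursite.com is the preferred host, and the
# bare domain is an alias that should be folded into it.
CANONICAL_HOST = "www.yoursite.com"
ALIAS_HOSTS = {"yoursite.com", "www.yoursite.com"}


def canonicalize(url: str) -> str:
    """Rewrite a URL on any alias host onto the canonical host."""
    parts = urlsplit(url)
    host = parts.hostname or ""  # .hostname is already lowercased
    if host in ALIAS_HOSTS:
        host = CANONICAL_HOST
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    path = parts.path or "/"  # "http://host" and "http://host/" name the same page
    # Fragments are never sent to the server, so drop them.
    return urlunsplit((parts.scheme.lower(), netloc, path, parts.query, ""))


def count_links(urls):
    """Return link tallies keyed by raw URL and by canonical URL."""
    raw = Counter(urls)
    consolidated = Counter(canonicalize(u) for u in urls)
    return raw, consolidated


# The example from the text: 1,000 links to each version of the homepage.
inbound = ["http://www.yoursite.com/"] * 1000 + ["http://yoursite.com/"] * 1000
raw, consolidated = count_links(inbound)
print(raw["http://www.yoursite.com/"], raw["http://yoursite.com/"])  # 1000 1000
print(consolidated["http://www.yoursite.com/"])  # 2000
```

Keying the tally on the canonical URL rather than the raw string is what turns two half-credited URLs into one fully credited page.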

URL Rewriting Tools
The easiest way to fix this very common problem is to use mod_rewrite and add URL rewriting rules to your .htaccess file. This requires an Apache server, however, and not every site runs on one. If your website is on Microsoft Internet Information Services (IIS), you can accomplish the same task by installing ISAPI_Rewrite. The basic idea is that when someone requests http://yoursite.com/, your server responds with a 301 (permanent) redirect to http://www.yoursite.com/.
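Because mod_rewrite and ISAPI_Rewrite each use their own rule syntax, the sketch below expresses the same 301 redirect logic in Python as WSGI middleware instead. This is a swapped-in technique for illustration, not how either of those tools works. The names `canonical_host_redirect` and `hello_app`, and the hosts `yoursite.com` and `www.yoursite.com`, are hypothetical; the sketch assumes www is the preferred host and that only the bare domain should be redirected.

```python
from urllib.parse import quote

# Hypothetical hosts: www is assumed to be the preferred (canonical) host.
CANONICAL_HOST = "www.yoursite.com"
ALIAS_HOSTS = ("yoursite.com",)


def canonical_host_redirect(app, canonical_host=CANONICAL_HOST, aliases=ALIAS_HOSTS):
    """Wrap a WSGI app so requests on an alias host get a 301 to the canonical host."""
    def middleware(environ, start_response):
        # Strip any ":port" suffix (assumes a hostname, not an IPv6 literal).
        host = environ.get("HTTP_HOST", "").split(":")[0].lower()
        if host in aliases:
            # Rebuild the requested path, percent-encoding it; WSGI passes
            # PATH_INFO as latin-1-decoded text.
            path = quote(
                environ.get("SCRIPT_NAME", "") + environ.get("PATH_INFO", ""),
                safe="/;=,",
                encoding="latin1",
            ) or "/"
            location = f"{environ.get('wsgi.url_scheme', 'http')}://{canonical_host}{path}"
            query = environ.get("QUERY_STRING", "")
            if query:
                location += "?" + query
            start_response(
                "301 Moved Permanently",
                [("Location", location), ("Content-Type", "text/plain")],
            )
            return [b"Moved Permanently\n"]
        # Already on the canonical host (or an unknown one): serve normally.
        return app(environ, start_response)

    return middleware


def hello_app(environ, start_response):
    """Stand-in application used only for this demo."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello\n"]


app = canonical_host_redirect(hello_app)

if __name__ == "__main__":
    # Call the middleware directly instead of starting a real server.
    def show(status, headers):
        print(status, dict(headers).get("Location", ""))

    app({"HTTP_HOST": "yoursite.com", "PATH_INFO": "/about",
         "QUERY_STRING": "a=1", "wsgi.url_scheme": "http"}, show)
    app({"HTTP_HOST": "www.yoursite.com", "PATH_INFO": "/",
         "wsgi.url_scheme": "http"}, show)
```

Redirecting only known alias hosts, rather than every non-matching host, avoids accidentally redirecting traffic such as local development requests. The path and query string are carried over so that deep links, not just the homepage, land on their canonical URL.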

Stay tuned for Part 2, which will discuss domain canonicalization and how it affects your Google Analytics.