How to Fix the Dreaded Duplicate URL in Google Analytics

Solve for duplicate URLs with/without trailing slashes in your Google Analytics data.

Brad Redding

Brad Redding is the Founder & CEO of Elevar, specializing in analytics, tracking, GTM, and conversion optimization.

How familiar are you with the trailing slash? Simply put, it’s the slash at the end of a URL. With it or without it, you still go to the same site. So what does it matter?

Well, for analytics, this one tiny slash matters greatly — and can skew your reports. When viewing your Site Content URL report in Google Analytics (GA), you can spot the places that have two versions of the same URL, one with and one without a trailing slash.

Take this example: this URL, http://www.citychiconline.com/new-in, and this URL, http://www.citychiconline.com/new-in/, both display the same page.

But in GA, they are reported as two separate URLs, each with its own row and its own pageview count.

Look familiar?

If your URLs don’t end in .html (or a similar extension), many content and eCommerce platforms (like WordPress and Magento) serve the same page whether or not the URL has a trailing slash.

But this becomes a problem when Google Analytics records the URLs differently and pushes this page view to your GA dashboard.

The GA tag doesn’t care if the content is the same — it’s simply recording the pageview URL, which is different (i.e., www.mysite.com/product-name versus www.mysite.com/product-name/).
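To make the effect concrete, here is a minimal sketch of how one page's traffic gets split. The list of hits is made up for illustration, and the counting only mimics a pageview report grouped by exact path; it is not how GA stores data.

```python
from collections import Counter

# Hypothetical pageview hits for one product page (illustrative data only).
hits = [
    "/product-name",
    "/product-name/",
    "/product-name",
    "/product-name/",
    "/product-name/",
]

# Grouping by the exact recorded path yields two rows for one page.
by_exact_path = Counter(hits)

# Grouping after dropping the trailing slash shows the page's real total.
by_page = Counter(path.rstrip("/") or "/" for path in hits)

print(by_exact_path)  # two rows: 2 hits without the slash, 3 with it
print(by_page)        # one row with all 5 hits
```

Neither row in the first count reflects how the page actually performed, which is why the split matters when you sort or filter a content report.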

Let’s dig deeper into how this happens — and how you can fix it.

How do people get to both URL versions?

People end up on both versions because:

  1. Your affiliate partners may link to your site in a version you don’t want.
  2. Team members managing your on-site content/links may not know whether to use a trailing slash.
  3. The development team likely doesn’t know what version you prefer when building new features.
  4. Blogs or other inbound links don’t ask you what you prefer.

Short of a server-side solution that always enforces one version (which can be a headache), controlling which URL version users land on is nearly impossible.

Why fixing duplicate URLs is important

  1. Clean data: Dirty data like this can warp strategic decisions if you’re trying to slice, dice, and finalize insights on performance campaigns.
  2. Save time: Analysts need more time to deliver accurate insights when dirty data requires manual cleanup.

So how can you fix duplicate links?

If you can’t keep them from happening altogether, you can remove them from GA so that they don’t confuse or adversely affect your analytics data.

How to fix duplicate URLs in reporting

Step 1: Create a new GA view so you can test and validate that this fix works for you before you apply it to your day-to-day view(s).

Step 2: Create a new advanced filter with the regex below, which performs this manipulation of your data:

If a URL does not have query parameters AND is a standalone URL AND does not end with a trailing slash, then add a trailing slash to the end of the URL.

Here is the regex for the advanced rule to test:

^(/[a-z0-9/_\-]*[^/])$

Once this has been completed, save your filter.
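If you want to sanity-check the pattern before touching a GA view, the rule can be sketched outside GA. The Python below is only a simulation: `normalize_request_uri` is a hypothetical helper name, and it assumes the filter writes the captured path back with a slash appended. Python's `re` is case-sensitive here, so uppercase paths are left untouched in this sketch; whether GA matches them depends on the filter's case-sensitivity setting.

```python
import re

# The filter pattern, written with a plain ASCII hyphen in "0-9".
TRAILING_SLASH_RULE = re.compile(r"^(/[a-z0-9/_\-]*[^/])$")


def normalize_request_uri(uri: str) -> str:
    """Append a trailing slash when the URI matches the rule; otherwise return it unchanged."""
    match = TRAILING_SLASH_RULE.match(uri)
    if match:
        # Assumption: the filter outputs the captured group followed by "/".
        return match.group(1) + "/"
    return uri


for uri in ["/new-in", "/new-in/", "/new-in?size=4", "/product-name.html", "/"]:
    print(f"{uri} -> {normalize_request_uri(uri)}")
```

Only the first URI is rewritten. URIs that already end in a slash, carry a query string, or contain a dot (such as a file extension) fall outside the character class and pass through unchanged.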

Step 3: Verify this new filter works as expected.

To test and verify this is working, use the Real Time report in Google Analytics:

  1. In one browser tab go to a version of your site you’d like to test (e.g. yourdomain.com/new-in)
  2. In a second browser tab go to the version of the same URL with the trailing slash (e.g. yourdomain.com/new-in/)
  3. Go to Google Analytics > Real Time Report > Top Active Pages. If the filter is working, both visits from the previous steps should appear under a single page path ending in a trailing slash!

One important note is that this filter will not update historical data, only new data going forward. 

Bonus: Fix Duplicate Content Issues

This post (to my surprise) ranks fairly well for people (maybe like you!) looking for answers on how to address duplicate content issues on their sites. A few live chat conversations later and I thought I should address this topic head on here 🙂 .

Since this blog is for eCommerce I’m going to ignore non-eComm examples…

The most common duplicate content issues originate from:

1. Products having multiple URL paths to access.

This is when you have .com/category/product-name AND .com/product-name accessible by users. If this is your issue then ensure you have a canonical link set that is the same on both of these pages.

To check for yourself, go to each product page => right click => view page source => control/command + F for the term “canonical”.
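If you have more than a handful of products, the same check can be scripted. This is a rough sketch using only Python's standard library: `CanonicalFinder` and `find_canonicals` are hypothetical names, and the two HTML snippets stand in for page source you would copy or fetch yourself.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr_map = dict(attrs)
        rel_values = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_values and attr_map.get("href"):
            self.canonicals.append(attr_map["href"])


def find_canonicals(html: str) -> list:
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonicals


# Stand-in page source for the two URL paths of one product (illustrative only).
category_path_page = '<html><head><link rel="canonical" href="https://www.example.com/product-name"></head></html>'
direct_path_page = '<html><head><link rel="canonical" href="https://www.example.com/product-name" /></head></html>'

# Both paths should declare the same single canonical URL.
print(find_canonicals(category_path_page) == find_canonicals(direct_path_page))
```

An empty list means the page has no canonical link at all, and more than one entry means the page declares conflicting canonicals; both are worth fixing.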

2. Products having variants or siblings that contain the same description and spec data.

This one can get a bit tricky depending on what eCommerce platform you’re on (ex. Shopify, Magento, BigCommerce, WooCommerce, etc).

  • If your product pages let you select a variant (e.g. “size” or “color”) and the URL changes by adding a query parameter at the end (e.g. .com/sweater-vest?size=4) AND none of your on-page content changes, then you might want to exclude “size” from being crawled in your Google Search Console settings. Before doing this, I strongly suggest having a proper canonical link in place, as described in the previous step.
  • If you have the type of product pages where you have similar products that are broken out into their own individual pages for each variant AND they all use the same content then this is really tricky.
    • One option is to set the canonical link on all but one of the variant products to one single preferred parent product. I’ve done this on Magento (via uRapidflow & custom XML product setting override) and Shopify (GTM)  – so it is possible.
    • A second option is good old grunt work – improve the content (name, description) so it’s not duplicated.
    • A third, not recommended, option is to override your meta robots tag to follow,noindex.

3. Category pages sorting and filtering options not managed in Google Search Console.

I alluded to this in the previous step, but this will likely require some customization to your URL parameter settings in Google Search Console.

Basically this setting allows you to coach Googlebot how to crawl your site more efficiently.

For example: sorting by a-z or price low-high doesn’t really change the content on the page so we want to ensure these pages:

  1. Also have a canonical link set to the original category page URL
  2. Are not crawled/indexed, so the original merchandised page is the one most likely to be served first in search results
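As a rough illustration of the first point, the canonical URL for a sorted or filtered page can be derived by dropping the parameters that don't change the content. The parameter names in `NON_CONTENT_PARAMS` ("sort", "order", "size") and the function name `canonical_url` are assumptions for this sketch; your platform's parameter names will differ.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed names of query parameters that only re-sort or re-select
# content without changing what is on the page.
NON_CONTENT_PARAMS = {"sort", "order", "size"}


def canonical_url(url: str) -> str:
    """Return the URL with non-content query parameters and any fragment removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in NON_CONTENT_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


print(canonical_url("https://www.example.com/sweaters?sort=price_asc"))
print(canonical_url("https://www.example.com/sweater-vest?size=4"))
```

Parameters that are not in the set are kept as they are, so anything that does change the page content still gets its own URL.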

Here’s where the setting is in Google Search Console and how to view example URLs.

[Screenshot: URL parameters setting in Google Search Console]

Modifying this requires a full article on its own! If you have any questions please leave a comment below.

If you found this helpful – please subscribe to our email! You can expect lots of spam and unhelpful content 🙂 (kidding of course).
