Previously, we explained why duplicate content is bad for SEO and how most of it on your own site is created unintentionally. Duplicate content may create problems for your SEO, but you can mitigate the impact through certain measures. Today’s article provides you with 4 ways to solve the problem, but first, here’s how to find duplicate content.
How to find duplicate content
Use Google Search
Search for keywords that your site is ranking for and look at the pages that rank. If you notice Google ranking your content under varied URL structures, you have duplicate content (and/or you lack consistency when structuring URLs). If needed, you can also find duplicates of your content on your site by running a simple site: query on Google with 2-3 sentences of your content.
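For example (the domain is a placeholder), a query like the following surfaces pages on your own site that contain the same passage:

```
site:example.com "an exact sentence or two copied from your page"
```

If more than one URL comes back for the same passage, those pages are candidates for the fixes below.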
Another way to discover duplicate content is to use a crawler such as Screaming Frog. It gathers information such as URLs, page titles, meta descriptions and headings of your pages. From there, the tool lets you filter out duplicates based on the category selected (e.g. URLs, page titles). The good thing is, Screaming Frog crawls up to 500 pages for free, which is sufficient for most sites. (Yay!)
After finding the duplicate content on your site, it’s time to put that information to good use!
4 Ways to solve the problem of duplicate content
To make it easier to understand, in the upcoming examples, Page A is the original while Page B is the duplicate.
Method 1: Use 301 Redirects
Redirect Page B to Page A using 301 Redirects. By doing so, when visitors click on Page B, they are immediately redirected to Page A. In addition, it tells Google that Page B has been moved permanently to (or in this case, replaced by) Page A.
Through 301 redirects, most of the link equity of Page B is transferred to Page A. This is crucial, as link equity is one of the signals Google uses to rank pages. With link equity directed at and consolidated on Page A, it will be able to rank better.
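How you set up a 301 redirect depends on your server. As a sketch, on an Apache server (the paths and domain below are placeholders) you could add a rule like this to your .htaccess file:

```apache
# Permanently redirect the duplicate (Page B) to the original (Page A)
Redirect 301 /page-b https://www.example.com/page-a
```

On nginx, the equivalent would be a return 301 directive in the relevant server or location block.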
Method 2: Use canonicalization (rel="canonical")
The rel="canonical" attribute is added to a <link> element in the <head> of Page B’s HTML (Figure 1). It sounds complicated, but what it simply does is tell Google that Page B is a copy of Page A. Unlike 301 redirects, visitors who click on Page B will still be directed to Page B, so both Page A and Page B remain accessible. However, when Google crawls Page B, it will not include it in its index. Additionally, all link equity and metrics (such as web traffic) will be attributed to Page A.
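As an illustration (the URL is a placeholder), the canonical link element in Page B’s <head> would look like this:

```html
<head>
  <!-- Point Google at the original version, Page A -->
  <link rel="canonical" href="https://www.example.com/page-a" />
</head>
```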
Note: Methods 1 and 2 achieve similar results, but each is suited to a different type of duplicate content. We suggest using Method 1 when the duplicates can be entirely replaced by the original (e.g. homepage). On the other hand, use Method 2 if the duplicates still need to stay accessible (e.g. product description pages).
Method 3: Use Meta Robots tag
Like the rel="canonical" attribute, the Meta Robots tag can be added to the <head> element. Think of it as a set of instructions for the Google bots that crawl (scan) pages. When the tag is used with the value “noindex, nofollow” (Figure 2), the Google bots know to crawl Page B but exclude it from Google’s index (database). This prevents the duplicate content from showing up on SERPs.
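As a sketch, the tag would sit in Page B’s <head> like this:

```html
<head>
  <!-- Tell crawlers not to index this page or follow its links -->
  <meta name="robots" content="noindex, nofollow">
</head>
```

Note that “noindex” on its own is enough to keep the page out of the index; adding “nofollow” also tells crawlers not to follow the links on the page.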
Method 4: Set preferred URL version
Tell Google your preferred URL version through Google Search Console. This is especially useful if you have www and non-www versions of your pages (which Google sees as two separate pages). By indicating your preference, Google will only crawl and index the preferred version, eliminating duplicate content arising from the two versions.
Most importantly: Be consistent
When it comes to URL structure, be consistent when you start building your site. By being consistent, you can minimize the need to employ the above methods.
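For instance, these URLs (placeholders) may all serve the same page, yet Google can treat each variant as a separate page if your internal links mix them:

```
https://www.example.com/blog/seo-tips/
https://www.example.com/blog/seo-tips
http://example.com/blog/seo-tips
```

Pick one form (protocol, www or non-www, trailing slash or not) and link to it the same way everywhere.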
If all else fails: Leave it to Google
You can always leave it to Google to choose the best version of the content to show in its SERPs. Doing so is better than blocking Google’s access to the duplicates with robots.txt or other methods. In Google’s own words:
“We now recommend not blocking access to duplicate content on your website, whether with a robots.txt file or other methods.”
If Google cannot access the duplicate versions of your pages, it cannot determine that they are duplicate content. Instead, Google has no choice but to treat them as unique pages. This prevents Google from consolidating the ranking signals and attributing them to the best version of the content, which can hurt your search rankings.
Looking at the solutions and wondering which is most appropriate to use for your situation? At Appiloque, we strive to help you improve your digital presence. Why give yourself unnecessary trouble when you can simply reach out to us?
Zachary is a Business Development Executive in Appiloque. In the after-hours, he serves as a Division Agent, taking back the city of New York when all else fails.