To have a usable, properly optimized website and achieve greater visibility in search engines, it is important to work on web usability as well as technical SEO. Within technical SEO, one of the things we must do is avoid duplicate content. Having repeated or similar content on a website not only harms the user experience, it can also hurt the website’s SEO positioning in Google results. It is therefore important to know how to detect duplicate content, how it can affect the website and, above all, how to avoid creating it.
What is duplicate content and when can we find it?
As Google points out, duplicate content refers to blocks of content of considerable size that are very similar to each other or match completely, whether they are on the same website or on another. Who is going to be interested in reading the same content on different pages? Or, more precisely, are the robots going to be interested in reading the same content on a website more than once and wasting time?
Google’s goal is for a website to be unique and original, so copying a block of content verbatim from someone else, or trying to trick the engines with the same content, can have negative consequences for SEO rankings, as Google itself indicates.
Duplicate content also has an effect when it is not created intentionally. We may have two very similar pages and Google ends up indexing the one we are less interested in, in some cases a page we had not even considered. Having duplicate content on a website can affect a page’s SEO results in the long run.
When is there duplicate content on a page?
We find duplicate content when two pages have exactly the same text and images, but other situations are also identified as duplicate content. This is the case when we have websites on different subdomains for different countries (with the same content on some pages) and the content has not been customized for each country, and also when URLs with different parameters serve similar or identical content.
Home page URL canonicalization
A page that is often affected by duplicate content problems is the home page. You have surely come across URLs like these:
- www.web.com
- www.web.com/
- www.web.com/index.htm
- …
In this case, the solution is to choose one of them as the main URL and either redirect the others to it or include a canonical tag (covered below).
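As a minimal sketch of what "choosing one main URL" means in practice, the following Python function collapses the home page variants listed above into a single form. The preferred form (HTTPS with a trailing slash) and the list of index file names are assumptions made for illustration, not a rule from Google.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: file names that a server may serve for the home page.
INDEX_FILES = {"index.htm", "index.html", "index.php"}

def normalize_home_url(url: str) -> str:
    """Collapse common home page variants into one preferred form."""
    # Add a scheme when it is missing so urlsplit can find the host.
    if "://" not in url:
        url = "https://" + url
    parts = urlsplit(url)
    path = parts.path
    # Treat "/index.htm" and similar names as the root path.
    if path.lstrip("/").lower() in INDEX_FILES:
        path = "/"
    # Treat an empty path as the root path.
    if path == "":
        path = "/"
    # Assumption: the preferred version is HTTPS with a lowercase host.
    return urlunsplit(("https", parts.netloc.lower(), path, parts.query, parts.fragment))

variants = ["www.web.com", "www.web.com/", "www.web.com/index.htm"]
print({normalize_home_url(v) for v in variants})  # {'https://www.web.com/'}
```

All three variants map to the same address, which is the one the redirects or the canonical tag should point to.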
How to detect duplicate content?
It is important to know whether there is duplicate content on our website, either internally or shared with other websites. There are several ways to detect it, as well as several online tools that make the task easier and save time.
SEMrush
With SEMrush we can easily detect whether there is duplicate content on a website. The SEO audit of the website reports duplicate content in its errors or warnings section.
Screaming Frog
With Screaming Frog’s newer features, it is easier than ever to identify duplicate website content. Once your website has been crawled, go to settings > content > duplicate and define the percentage of similarity you want the tool to identify.
Siteliner
Enter the URL you want to analyze and Siteliner will automatically identify duplicate content within your website, as well as externally, in a very concrete and precise way (the exact matching words, percentage of similarity…).
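To give an idea of what a "percentage of similarity" can mean, here is a small Python sketch that compares page texts using overlapping three-word sequences (shingles) and Jaccard similarity. This is one common technique, chosen here for illustration. It is not a description of how SEMrush, Screaming Frog or Siteliner calculate their scores, and the sample pages and the 0.9 threshold are made up.

```python
def shingles(text: str, size: int = 3) -> set:
    """Return the set of overlapping word sequences ("shingles") in text."""
    words = text.lower().split()
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}

def similarity(a: str, b: str, size: int = 3) -> float:
    """Jaccard similarity between two texts, from 0.0 to 1.0."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)

def find_near_duplicates(pages: dict, threshold: float = 0.9) -> list:
    """Return (url_a, url_b, score) for every pair at or above threshold."""
    urls = sorted(pages)
    pairs = []
    for i, url_a in enumerate(urls):
        for url_b in urls[i + 1:]:
            score = similarity(pages[url_a], pages[url_b])
            if score >= threshold:
                pairs.append((url_a, url_b, score))
    return pairs

# Hypothetical page texts, keyed by URL path.
pages = {
    "/coat-s": "Warm wool coat in navy blue with two front pockets",
    "/coat-m": "Warm wool coat in navy blue with two front pockets",
    "/about": "We are a small family business based in Barcelona",
}
print(find_near_duplicates(pages))  # [('/coat-m', '/coat-s', 1.0)]
```

Comparing every pair of pages is fine for a small site. For thousands of pages a dedicated crawler is the more practical choice.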
There are many other options for finding duplicate content on a website. Which one is your favorite? Let me know in the comments :)
How to avoid creating pages with duplicate content?
If you have pages with similar or identical content on your website, there are several things you can do to avoid duplicate content:
Create unique and original content
The formula for success to avoid duplicate content on a page is the simplest of all: create unique and original content. The more different we are and the more personalized what we offer, the more difficult it will be for our content to be similar to that of other pages.
However, in some cases it is very difficult to create very different content, especially in online stores, where we offer products with very similar characteristics. Later on, we will detail other strategies that we can use in these cases.
URLs, titles, meta-descriptions and headings
Another aspect we must take into account to avoid creating duplicate content is the metadata, especially in e-commerce (although it is also very important on other types of websites). We have to customize each SEO title and meta-description, as well as the headings of each page. Otherwise, especially if we have very similar products, these may be generated automatically and repeat each other (the SEO title and main heading in particular). It takes work, but in the long run it will benefit the visibility of the products in the SERPs.
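A quick way to spot repeated titles and meta-descriptions is to extract them from each page and group the URLs that share the same value. The sketch below uses only Python’s standard library. The page HTML is invented for the example, and in a real audit it would come from a crawl of your own site.

```python
from collections import defaultdict
from html.parser import HTMLParser

class MetaExtractor(HTMLParser):
    """Collect the <title> text and the meta description of one page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = (attrs.get("content") or "").strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

def duplicated_metadata(pages: dict) -> dict:
    """Map each repeated (field, value) pair to the URLs that share it."""
    seen = defaultdict(list)
    for url, html in pages.items():
        parser = MetaExtractor()
        parser.feed(html)
        seen[("title", parser.title.strip())].append(url)
        seen[("description", parser.description)].append(url)
    # Keep only non-empty values that appear on more than one URL.
    return {key: urls for key, urls in seen.items() if key[1] and len(urls) > 1}

# Hypothetical product pages with an auto-generated, repeated title.
pages = {
    "/coat-s": "<head><title>Wool coat</title><meta name='description' content='Warm wool coat'></head>",
    "/coat-m": "<head><title>Wool coat</title><meta name='description' content='Warm wool coat, size M'></head>",
}
print(duplicated_metadata(pages))  # {('title', 'Wool coat'): ['/coat-s', '/coat-m']}
```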
301 redirects to avoid duplicate content
A 301 redirect indicates that the content has moved permanently to another URL. The old URL eventually drops out of Google’s results, since we are transferring its authority to the destination page.
This technique, which also serves other SEO strategies, is used to avoid duplicate content. It is common where a domain is available over both HTTP and HTTPS (we redirect each HTTP page to its HTTPS equivalent) or where we have different URLs for the same page (such as the home page), as explained above.
It is very important that our 301 redirects are logical. That is to say, we should redirect to pages with similar content so as not to damage the user’s experience on the website. There are several ways to set up redirects: they can be done from the .htaccess file or with WordPress plugins.
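To show what a 301 redirect actually sends to the browser or the robot, here is a minimal Python sketch written as a WSGI application. It is an illustration of the HTTP response only: it is not the .htaccess or WordPress plugin approach mentioned above, and the function name and the fallback host are assumptions.

```python
def force_https_app(environ, start_response):
    """Minimal WSGI app: answer every HTTP request with a 301 to HTTPS."""
    # Assumption: fall back to the example host when none is provided.
    host = environ.get("HTTP_HOST", "www.web.com")
    path = environ.get("PATH_INFO", "/") or "/"
    query = environ.get("QUERY_STRING", "")

    if environ.get("wsgi.url_scheme") == "https":
        # Already on the preferred version, so no redirect is needed.
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b"Already on HTTPS\n"]

    # Redirect to the same page on HTTPS, keeping path and query string.
    target = f"https://{host}{path}" + (f"?{query}" if query else "")
    start_response("301 Moved Permanently", [("Location", target)])
    return [b""]
```

The status line "301 Moved Permanently" and the Location header are what tell Google that the move is permanent and where the content now lives. Note that each page is sent to its own equivalent, not to a single catch-all page, which keeps the redirect logical for the user.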
Canonical tag, the key to differentiate pages with similar content
If you manage the SEO strategy of an e-commerce site, you probably already know what the canonical tag (“rel=canonical”) is all about. On websites with many products it is very common to use the canonical tag to avoid duplicate content, as well as on websites that have more than one main URL.
If we do not identify which is the main page of a set of similar URLs, Google’s robots will do it for us, and they will waste time doing so. Sometimes their choice may not be the page we are most interested in. It is therefore very important to tell the robots which is the canonical URL, that is, the most significant page of a set of very similar or duplicated URLs.
As highlighted by Google Webmasters, a canonical tag is not a “directive” to the search engines, nor something that directly improves the page’s rankings in the SERPs, but a signal that helps the robots identify the most important content. In the long run, though, it will probably benefit the visibility of the website.
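If you want to check which canonical URL a page declares, the tag can be read from the HTML with a few lines of Python. This is a minimal sketch using the standard library. The sample HTML is invented, and treating several conflicting tags as "no canonical" is a choice made for this example.

```python
from html.parser import HTMLParser

class CanonicalFinder(HTMLParser):
    """Record the href of every <link rel="canonical"> in a page."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        rel = (attrs.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel and attrs.get("href"):
            self.canonicals.append(attrs["href"])

def canonical_of(html: str):
    """Return the canonical URL, or None when it is missing or ambiguous."""
    finder = CanonicalFinder()
    finder.feed(html)
    unique = set(finder.canonicals)
    return unique.pop() if len(unique) == 1 else None

# Hypothetical product page that declares its main URL.
page = '<head><link rel="canonical" href="https://www.web.com/coat"></head>'
print(canonical_of(page))  # https://www.web.com/coat
```

Running this over a set of similar URLs shows at a glance whether they all point to the page you actually want in the SERPs.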
URL parameters
In different types of websites, especially in e-commerce, there are many parameters that can appear in URLs due to small variations, especially when we apply filters (prices, sizes, colors…).
For example, if we sell a coat of a single color, but with several sizes, and each of them generates a different URL, this will create duplicate content, since everything included in each URL will be very similar.
In this case, the solution is to add a canonical tag to each variant URL that points to the main product page, so you indicate to Google which page you want to be shown in the SERPs. Older guides also suggest indicating it in Google Search Console > URL Parameters, but Google has since retired that tool.
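The idea of mapping every filtered URL back to one main product URL can be sketched in Python as follows. The list of filter parameters is hypothetical and should be replaced with the ones your shop really generates. The function names are also made up for this example.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: parameters that only filter or sort and do not change the product.
FILTER_PARAMS = {"size", "color", "price", "sort"}

def canonical_url(url: str) -> str:
    """Drop filter parameters so every variant maps to one URL."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in FILTER_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))

def canonical_tag(url: str) -> str:
    """Build the <link> element to place in the <head> of a variant page."""
    return f'<link rel="canonical" href="{escape(canonical_url(url), quote=True)}">'

print(canonical_url("https://www.web.com/coat?size=m"))  # https://www.web.com/coat
print(canonical_tag("https://www.web.com/coat?size=l"))
```

Each size variant of the coat then carries the same tag, pointing to the URL without parameters.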
What do I do if I already have pages with duplicate content on my website?
We now know several actions that prevent duplicate content in the future. But if you already have the problem today, it needs to be solved!
The techniques for removing duplicate content from your website are more or less the same ones explained above (although they should be adapted to the needs and objectives of each website).
If, for example, you have duplicate products due to URL parameters, the recommendation is to select the main URL and have the others carry a canonical tag that references it, so Google gives more weight to this page.
Or if, on the other hand, the home page has more than one URL, the recommendation is to choose one as the main one, include a self-referencing canonical tag on it, and redirect the others to it.
Do you want more information?
There are still many more actions and parameters that we must take into account to avoid duplicate content on a website. Contact me if you have any questions.