
What is Problem Duplication?

Key Points

  • Problem duplication is when copies of content either 1) confuse people trying to use your information or 2) obscure the ideal content.
  • Don't be too narrow in thinking about duplication — *exact* copies are not as serious as other kinds of copies.
  • Close copies are among the worst kind, since it may be difficult to determine which is the best version.
Related resource
Problem Content Duplication | Problem duplication, its causes, and how to fix it

What is problem duplication?

Problem duplication is when copies of content either 1) confuse people trying to use your information or 2) obscure the ideal content.

In a sense, I think we should only care about duplication that is a problem, and expand our usage of the term “duplication” to include any copy that causes users problems (even if it isn’t an absolute clone). As with many problems, we can actually trip ourselves up by sticking too closely to narrow meanings. For instance, consider the following dialog (I have been a party to conversations like this):

  • Mary: Visitors are being confused because of the duplicate content: the same content is on our Country and Product pages.

  • Gertrude: Oh, those pieces of content actually aren’t duplicates. They are slightly different content.

  • [ end of story, as if the fact that they aren’t carbon copy duplicates means the two copies are not a problem. ]

So let’s think broadly about duplication for our end users’ sake (and, as discussed below, close copies are actually a major problem). 

What FORMS does duplication take?

Duplication takes many forms. The goal is to minimize the number of copies and contextualize any that remain. Although implementation details can help mitigate the impact, the forms below are listed on a rough scale of how problematic they are, from least severe to most severe:

  • Single copy. Having a single copy (including a single version of the content in a single format with a single landing page) is the most straightforward for visitors. There is no duplication.

  • Bound versions & formats. In this case, there are different versions and/or formats, but there is a landing page that clearly gives context around this.

  • Exact copy. There are exact copies (from format to versions) of content, but they are not explicitly bound together. On its own, this isn’t necessarily a huge issue, but it is more confusing for users should they see the content in multiple contexts (for instance, from different repositories in federated search results).

  • Far copies. These are copies that are very different, for instance, pieces of content on a particular issue that were published years apart. This could also occur if different teams have widely different views on the same issue and each publishes content on the same topic (but from a different perspective). The advantage of far copies is that the visitor can more easily gauge which copy is of interest (assuming, of course, the visitor can see the copies).

  • Close copies. These are copies that differ only slightly. If they are not bound together (see "Bound versions & formats" above), close copies can be particularly difficult for site visitors to navigate, since it may be hard to tell which is the best version.

  • No correct copy. Obviously, the worst case is that there are copies, but none of them is correct (for instance, none is up to date with current law).
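
As a rough illustration of the scale above, here is a minimal Python sketch that sorts a pair of same-topic texts into the exact, close, or far category. It is only a sketch under stated assumptions: the `Duplication` enum, the `classify_pair` function, the `bound` flag, and the 0.8 cutoff are all hypothetical names and values chosen for this example, and character-level similarity from the standard library's `difflib` is just one possible stand-in for "how different are these copies". It cannot tell whether any copy is actually correct, so "no correct copy" still needs a human reviewer.

```python
from difflib import SequenceMatcher
from enum import IntEnum


class Duplication(IntEnum):
    """Forms of duplication, ordered from least to most severe."""
    SINGLE_COPY = 0
    BOUND_VERSIONS = 1
    EXACT_COPY = 2
    FAR_COPY = 3
    CLOSE_COPY = 4
    NO_CORRECT_COPY = 5  # needs human review; not detectable by text comparison


# Hypothetical cutoff: at or above this similarity, two texts count as "close".
CLOSE_THRESHOLD = 0.8


def classify_pair(text_a: str, text_b: str, bound: bool = False) -> Duplication:
    """Classify two pieces of content that are assumed to cover the same topic.

    `bound` is an assumed flag meaning a landing page already explains how
    the two versions or formats relate to each other.
    """
    if bound:
        return Duplication.BOUND_VERSIONS

    # Collapse runs of whitespace so layout differences do not count as edits.
    a = " ".join(text_a.split())
    b = " ".join(text_b.split())
    if a == b:
        return Duplication.EXACT_COPY

    # ratio() is 1.0 for identical strings and approaches 0.0 for unrelated ones.
    ratio = SequenceMatcher(None, a, b).ratio()
    if ratio >= CLOSE_THRESHOLD:
        return Duplication.CLOSE_COPY
    return Duplication.FAR_COPY
```

Because the enum is ordered by severity, results can be sorted or compared directly (for example, `Duplication.CLOSE_COPY > Duplication.EXACT_COPY`), which makes it easy to surface the most confusing pairs first. The threshold would need tuning against real content before anyone relied on it.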

WHERE do users experience duplication?

The above describes the types of duplication that exist in the system. But we do not care solely about duplication in the abstract. We care about how intranet and website users experience duplication, partly to understand the problem better and partly so that we can use this information to reduce the impact of duplication (some of which is necessary) on users.

Intranet and website users experience duplication in the following ways:

  • Searching

  • Browsing

  • News/activity feeds

  • Blocks aggregating content (such as lists)

  • Topical and other aggregation pages
