Many descriptions of content inventories seem to focus either on small sites (which can effectively be clicked through to generate a clickmap) or on using a tool to quickly scrape together whatever information a crawler like Xenu Link Sleuth can generate. But both approaches capture only one aspect of content: how site visitors see it! Obviously this is crucial, but it misses many other aspects that can inform whatever content decisions you are trying to make (which is the reason to do an inventory, right?). Useful sources of data for a content inventory include:
- View, or the content as viewed by the visitor (usually, what they see in their web browser)
- Origin, or the information held by the source system, such as a Content Management System (CMS), for instance the number of versions a piece of content has gone through
- Usage, or how visitors actually use the content, for instance whether anyone is reading it at all
- Intent, or how well the content's quality (and qualities) match your site's intent
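To make the four sources concrete, here is a minimal sketch of a single inventory row in Python, with its fields grouped by source. The schema and every field name are hypothetical; pick whichever fields the decisions you are making actually need.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class InventoryRow:
    """One content item, with fields grouped by data source (hypothetical schema)."""
    url: str
    # View: what the visitor sees, or what the server returns
    status_code: Optional[int] = None
    title: Optional[str] = None
    uses_old_template: Optional[bool] = None
    # Origin: what the CMS or file system knows about the item
    author: Optional[str] = None
    last_updated: Optional[str] = None  # ISO 8601 date string
    version_count: Optional[int] = None
    tags: list[str] = field(default_factory=list)
    # Usage: how visitors actually use the item
    pageviews: Optional[int] = None
    social_mentions: Optional[int] = None
    # Intent: editorial judgment about quality and objectives
    meets_standards: Optional[bool] = None
    objectives: list[str] = field(default_factory=list)
    needs_editing: bool = False


# Example row; unknown values simply stay None until a source fills them in.
row = InventoryRow(url="/about/rabbits", author="Joe Blow", pageviews=0)
```

Leaving unknown values as `None` (rather than 0 or `False`) keeps "not yet collected" distinct from "collected and empty", which matters once several sources are merged.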
View
This is probably the most obvious source, since it is mostly about the view we can all understand: what the content looks like in a web browser! Looking under the hood can help, however, since the HTML itself offers useful clues (for example, its structure can reveal that a particular piece of content still uses an old template). There are also items such as the HTTP status codes the server returns, which many tools report.
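As a sketch of looking under the hood, the Python below records a page's HTTP status code, its `<title>`, and whether its HTML carries a marker of the old template. It assumes, hypothetically, that old-template pages can be recognized by a `legacy-layout` CSS class; substitute whatever actually distinguishes your templates.

```python
from html.parser import HTMLParser
from urllib.error import HTTPError
from urllib.request import urlopen

# Hypothetical marker: assume pages built on the old template carry this CSS class.
OLD_TEMPLATE_CLASS = "legacy-layout"


class ViewParser(HTMLParser):
    """Collects the <title> text and notes whether the old-template class appears."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.uses_old_template = False
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        for name, value in attrs:
            # class="a b c" may hold several classes; match whole class names only.
            if name == "class" and value and OLD_TEMPLATE_CLASS in value.split():
                self.uses_old_template = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def inspect_html(html: str) -> dict:
    """Extract View data from an HTML string."""
    parser = ViewParser()
    parser.feed(html)
    parser.close()
    return {"title": parser.title.strip(), "uses_old_template": parser.uses_old_template}


def fetch_view(url: str) -> dict:
    """Fetch a URL and record its status code plus what inspect_html finds.

    Network failures (URLError) propagate to the caller.
    """
    try:
        with urlopen(url, timeout=10) as resp:
            status = resp.status
            body = resp.read().decode("utf-8", errors="replace")
    except HTTPError as err:
        # 4xx/5xx responses raise HTTPError; keep the code, skip parsing the error page.
        return {"url": url, "status_code": err.code, "title": "", "uses_old_template": None}
    return {"url": url, "status_code": status, **inspect_html(body)}
```

`urlopen` follows redirects automatically, so a redirected URL is reported with the status of its final destination; if redirects themselves matter to your inventory, a dedicated crawler is the better tool.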
Origin
So far in the discussion of View, we have glossed over an important point: content can appear in multiple places on one site, and also across multiple channels. This means that even when treating View as a source of content inventory data, you may want to compare how the same content appears in each of those views. More importantly, it points to another key source: the origin system is an important source of data for your content inventory, especially in multi-site and multi-channel situations.
Even in simpler cases, the origin system holds a lot of useful information. Where there is no CMS, file system data such as creation dates can help your analysis. In a CMS, other information such as the last-updated date, the original creation date, the original author, and topic tags can be useful.
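Where there is no CMS, a few lines of Python can pull file system dates for a directory of static pages. This is a sketch assuming the site's pages live under one local directory as `.html`/`.htm` files. Note that on Unix `st_ctime` is the inode change time, not a creation time, so the code only reports a creation date where the platform exposes `st_birthtime`.

```python
from datetime import datetime, timezone
from pathlib import Path


def file_origin_data(root: str, suffixes=(".html", ".htm")) -> list[dict]:
    """Walk a directory of static files and record basic origin metadata per page."""
    rows = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix.lower() not in suffixes:
            continue
        st = path.stat()
        rows.append({
            "path": path.relative_to(root).as_posix(),
            "size_bytes": st.st_size,
            # st_mtime is the last modification time on every platform.
            "last_modified": datetime.fromtimestamp(st.st_mtime, tz=timezone.utc).isoformat(),
            # A true creation time is only exposed on some platforms as st_birthtime;
            # report None elsewhere rather than misreading st_ctime.
            "created": (
                datetime.fromtimestamp(st.st_birthtime, tz=timezone.utc).isoformat()
                if hasattr(st, "st_birthtime")
                else None
            ),
        })
    return rows
```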
Usage
How your content is actually used is very useful information for your content inventory. The obvious starting point is pageviews and other straightforward metrics you can get from analytics packages such as Google Analytics.
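Analytics data usually has to be joined onto the inventory by URL or path. A minimal sketch, assuming you have exported pageviews as CSV with hypothetical `page_path` and `pageviews` columns (real export column names vary by tool and report):

```python
import csv
import io


def merge_pageviews(inventory: list[dict], analytics_csv: str,
                    path_col: str = "page_path", views_col: str = "pageviews") -> list[dict]:
    """Attach pageview counts from a CSV export to inventory rows, keyed by path.

    Column names are hypothetical; adjust them to match your export.
    Pages missing from the export get 0; check that they are actually tracked
    before treating that as "never viewed".
    """
    views: dict[str, int] = {}
    for rec in csv.DictReader(io.StringIO(analytics_csv)):
        path = rec[path_col].strip()
        # Exports may list a path more than once (e.g. one row per date), so sum them.
        # Thousands separators such as "1,200" are stripped before conversion.
        views[path] = views.get(path, 0) + int(rec[views_col].replace(",", ""))
    # Build new dicts so the caller's inventory is left unchanged.
    return [{**row, "pageviews": views.get(row["path"], 0)} for row in inventory]
```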
In addition, you may want to consider social media usage on top of usage directly on your site, such as data from services like PostRank.
Intent
You could have content that is used a lot but doesn't meet the objectives of your site. So make sure you consider whether content is meeting those objectives, as well as the quality level you expect from the site. As a source, this involves more judgment than the other sources of content inventory data, although you can derive some quality measures from the other sources (such as checking which content uses old templates). Aspects of intent to consider include: a) whether standards are being met, b) which of your core objectives the content serves, and c) which content needs careful editing.
Putting it together
Each of these is interesting on its own, but combining them in your content inventory allows richer analysis. For instance, suppose you are about to conduct a content migration. With a content inventory that covers View, Origin, Usage, and Intent, you can ask questions like: What content uses the old template and is never viewed? What content is used a lot but doesn't meet our site vision? What content is about rabbits, authored by Joe Blow, and mentioned a lot in social media?
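Once the sources sit side by side, those questions become simple filters. The sketch below uses a hypothetical combined row type and made-up sample data; the thresholds for "a lot" (`min_views`, `min_mentions`) are arbitrary assumptions you would tune to your own site.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    # Hypothetical combined inventory row: View, Origin, Usage and Intent side by side.
    url: str
    uses_old_template: bool                        # View
    author: str                                    # Origin
    tags: set[str] = field(default_factory=set)    # Origin
    pageviews: int = 0                             # Usage
    social_mentions: int = 0                       # Usage
    meets_vision: bool = True                      # Intent


def old_and_unviewed(items):
    """Content using the old template that nobody views."""
    return [i for i in items if i.uses_old_template and i.pageviews == 0]


def popular_but_off_vision(items, min_views=1000):
    """Content used a lot that does not meet the site vision."""
    return [i for i in items if i.pageviews >= min_views and not i.meets_vision]


def rabbits_by_joe_and_social(items, min_mentions=50):
    """Content about rabbits, by Joe Blow, mentioned a lot in social media."""
    return [i for i in items
            if "rabbits" in i.tags and i.author == "Joe Blow"
            and i.social_mentions >= min_mentions]


# Made-up sample data for illustration only.
inventory = [
    Item("/old/rabbit-care", True, "Joe Blow", {"rabbits"}, 0, 0),
    Item("/rabbits/faq", False, "Joe Blow", {"rabbits"}, 4200, 120, meets_vision=False),
    Item("/about", False, "Ann Lee", set(), 900, 3),
]
```

Keeping each question as a small named function makes it easy to rerun the same checks as the inventory is refreshed during a migration.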