Especially for large sites, one of the reasons to do a content inventory is to understand your content. Another way of looking at it: you want to explore your content. Often a tool like Xenu's Link Sleuth or OmniOutliner is used to create an inventory, and then teams are left to manually prod at the content. This prodding might happen in a spreadsheet (for example, filtering columns or generating graphs by MIME type) or by clicking around on the site.
But what if your inventory allowed true, meandering exploration through the partially-unknown world of your content?
What needs to be in place to allow this kind of content inventory exploration?
- Search for patterns within raw pages. Sure, your site might pull information from multiple systems, but you need to be able to analyze what your users are really seeing. For example, how many pages of my site use the current template?
- Easily add information to the inventory. If you have a new source of data, you need to be able to add it to your inventory. For example, if you get physical store sales figures that are relevant to your online product pages, you should be able to integrate them quickly. Or if you want to combine a link checker report with Google Analytics data, that should be easy too.
- Iterate quickly. Instead of being a one-off activity, an explorable content inventory needs to support all of the above very quickly. For instance, if you dream up a new question about a pattern on your site (how many pages have an embed tag?), you should be able to answer it in minutes (which also means that re-spidering the whole site isn't acceptable for these quick tests).
- Continue all of the above over a long time period. Exploration isn't a quick-hit activity, and neither is understanding your content. Unless you have a small site, understanding your content is complicated. You need to try things out, think about them, and try again.
- Easily re-integrate work from multiple groups. A classic issue with inventories is all those dreaded spreadsheets flowing around different teams, each needing to be kept up to date. For example, you should be able to pass information out to multiple groups, and then they could flag the content that needs heavy manual reworking. This is related to the second point above, but a bit different: the point here is updating existing information in addition to adding new information.
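To make the ingredients above concrete, here is a minimal sketch in Python. Everything in it is hypothetical — the in-memory `inventory`, the URLs, and the pageview numbers stand in for pages your spider would cache to disk and figures you would pull from an analytics export — but it shows the shape of two of the ingredients: pattern search over raw pages, and merging in a new data source keyed on URL.

```python
import re

# Hypothetical mini-inventory. In practice these would be pages cached
# to disk by your spider, so a new question never requires a re-crawl.
inventory = [
    {"url": "/products/a", "html": "<html><embed src='demo.swf'></html>"},
    {"url": "/products/b", "html": "<html><p>plain page</p></html>"},
    {"url": "/about",      "html": "<html><embed src='tour.swf'></html>"},
]

def pages_matching(pattern):
    """Return the URLs of cached pages whose raw HTML matches a regex."""
    return [p["url"] for p in inventory if re.search(pattern, p["html"])]

# Question: how many pages have an embed tag?
print(len(pages_matching(r"<embed\b")))  # 2 of the 3 sample pages

# Adding a new data source is just another merge keyed on URL --
# here, invented analytics pageview counts.
pageviews = {"/products/a": 1200, "/about": 40}
for page in inventory:
    page["pageviews"] = pageviews.get(page["url"], 0)

# Combined question: which embed-tag pages get real traffic?
print([p["url"] for p in inventory
       if "<embed" in p["html"] and p["pageviews"] > 100])
```

The point is less the code than the workflow: because the raw pages and the merged data live in one queryable place, each new question is a one-liner rather than a new crawl or a new spreadsheet.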
Why bother with this exploration rather than just looking at a simple tool's output? The biggest reason: you don't even know which questions to ask about your content when you start, so structuring and thinking about the undertaking as an exploration allows you to start earlier and take your time. In addition, this deeper understanding of the complexity and the exceptions lets you better plan the undertaking itself, such as a content migration. Nominally you may think all the pages of a subsite use a particular template, but it is better to check than to be surprised mid-migration.
As with the question of how much automation to undertake in the migration, there is of course a tradeoff here: for huge sites, this sort of analysis can be quite useful; for small sites, it probably isn't worth it; in between is less clear. That said, at a minimum for large sites, you can use this type of information to inform decisions such as information architecture design and migration planning. For migrations in particular, you can explore which rules could drive the automated parts of the move.
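As a sketch of what exploring migration rules might look like, the snippet below groups cached pages by a template marker. The pages and the `tpl-` class convention are invented for illustration — substitute whatever actually identifies templates on your site — but the grouping surfaces stragglers (a stray legacy template, pages with no recognizable template) before they can surprise you mid-migration.

```python
import re
from collections import defaultdict

# Hypothetical cached pages; the class on <body> stands in for
# whatever marker actually identifies the template on your site.
pages = {
    "/news/1":   "<body class='tpl-article'>...</body>",
    "/news/2":   "<body class='tpl-article'>...</body>",
    "/news/old": "<body class='tpl-legacy'>...</body>",
    "/home":     "<body class='tpl-landing'>...</body>",
}

def template_of(html):
    """Guess the template from a class marker; 'unknown' if none found."""
    m = re.search(r"tpl-(\w+)", html)
    return m.group(1) if m else "unknown"

# Group URLs by template: each group is a candidate migration rule,
# and small or unexpected groups are the exceptions worth investigating.
by_template = defaultdict(list)
for url, html in pages.items():
    by_template[template_of(html)].append(url)

for tpl, urls in sorted(by_template.items()):
    print(tpl, len(urls))
```

Run against a real cache, a report like this is a quick sanity check on the "all these pages use template X" assumption from the previous paragraph.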