Key Points
- LLMs can potentially help with content creation and transformation across a wide range of tasks, and the quality varies just as widely
- Consider how easy it is to confirm quality across these tasks
- Keep in mind the need to iterate, and how easy it will be to do so
In the web content industry, the first thing we focused on with LLMs was (and still is) content generation. This makes sense, since LLMs are generative models. But there are many other places where LLMs have the potential to help us in content transformation. Furthermore, there are different ways that LLMs can help us generate content (note: in this article I am steering clear of content analysis and other parts of content operations).
We need to look at three dimensions of the usefulness of LLMs in content creation and transformation:
Raw AI Quality. If we just have an LLM do a content task, how good is it without confirmation?
Ease of Confirmation. How easy is it to confirm the LLM output, and by whom?
Iteration Scaling. How costly is it to scale, especially to iterate across all the content (for instance, if you decide to have a different tone in your documentation, how difficult is it to make that change across all your documentation)?
Raw AI Quality
First, let's consider the raw AI quality, which is the quality that we would expect to get from an LLM in a single shot:
Quality Reference Scale
Starting left to right on the scale above (from lowest to highest quality), what are different tasks and the expected raw quality?
Activity | Definition | Notes |
---|---|---|
Thought Leadership | A bold, expert way of framing a topic. | By definition, LLMs average across existing writing, so they cannot currently produce thought leadership. |
Original Content | Net-new content. | LLMs *can* write somewhat original content, but we cannot expect the quality to be high. |
Repurposing | Take some original content and repurpose it (usually into a smaller format). | Quality depends on the output format. |
Rewriting | Significantly rewrite content, for instance in a different voice. | Quality depends on the depth and type of rewriting. |
Tagging | Tag content, for instance by topic. | Although not completely automatic magic, with some controls semantic tagging shows promise in many cases. |
Documentation | Write factual documentation about how something works. | LLMs are quite strong at this, especially if they can look at, for example, source code. |
Specification | Write a technical specification to modify content, such as an XPath or regex. | LLMs are very capable at this. |
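To make the "Specification" row concrete, here is a minimal sketch (with made-up content and a hypothetical task, not from any real project): an LLM is asked for a regex that demotes Markdown H1 headings to H2 across a documentation set, and we spot-check the returned pattern on a sample before applying it at scale.

```python
import re

# Hypothetical example: an LLM was asked for a regex specification that
# demotes Markdown H1 headings ("# Title") to H2 ("## Title") site-wide.
# The pattern below is the kind of spec it might return.
llm_pattern = re.compile(r"^# (?=\S)", flags=re.MULTILINE)

sample = "# Getting Started\nSome intro text.\n## Details\nMore text.\n"
rewritten = llm_pattern.sub("## ", sample)

# A technical expert can confirm the spec cheaply with a few checks
# before running it across thousands of pages.
assert rewritten.startswith("## Getting Started")
assert "### Details" not in rewritten  # existing H2s are untouched
```

This is also why specifications sit at the high end of the quality scale: the output is small, precise, and mechanically verifiable by the right expert.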
Ease of Confirmation
LLM outputs can be extremely challenging to confirm. For instance, I once asked an LLM to generate a project calendar. I was thrilled with the results until I realized it didn't even understand calendars: the days of the week were wrong, and the dates didn't all line up correctly, so I had to go back to a more manual approach. As with everything, I'm sure if I needed to generate lots of calendars I could have come up with a strong LLM-assisted approach, but the point remains: the outputs of LLMs can look really, really good on an initial scan but get really important details wrong.
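A calendar mishap like this happens to be cheap to catch programmatically. As a sketch (with a hypothetical, made-up calendar), suppose the LLM returned its calendar as (date, claimed weekday) pairs; a few lines of code can flag every mismatch:

```python
from datetime import date

# Hypothetical LLM output: (ISO date, claimed weekday) pairs.
# The weekday claims are exactly the kind of detail that looks
# fine on a skim but is easy for a model to get wrong.
llm_calendar = [
    ("2025-03-03", "Monday"),
    ("2025-03-04", "Tuesday"),
    ("2025-03-05", "Thursday"),  # wrong: 2025-03-05 is a Wednesday
]

def bad_rows(calendar):
    """Return entries whose claimed weekday doesn't match the date."""
    return [
        (iso, claimed)
        for iso, claimed in calendar
        if date.fromisoformat(iso).strftime("%A") != claimed
    ]

print(bad_rows(llm_calendar))  # → [('2025-03-05', 'Thursday')]
```

Of course, not all outputs have a ground truth this easy to compute, which is exactly why ease of confirmation deserves its own dimension.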
So one of the things we need to consider is the ease of confirmation (for now, we'll steer clear of the question of whether it's worth the time of confirmation vs. just using other approaches). This varies pretty dramatically by role.
AI Content Quality: Expert vs Non-Expert Performance
Some ways that roles affect ease of confirmation:
Subject Matter Expert (SME) in your organization's topics of focus. The subject matter expert is really the only one who can write thought leadership, although (at a cost) they can help in a variety of other tasks.
Technical Expert (in web technologies). The technical expert can't help confirm many of these types of changes, but there's one type of change that only they can confirm: specifications, such as XPath and regex, or things like template changes.
Content Expert. Content strategists, content designers, and other content experts are best at the actual writing (other than the thought leadership).
Ease of Scaling
One thing I'm always concerned about is scale. There are two dimensions to scale here (also see Dialing in LLM Costs for Website and Content Analysis):
Content count. How many pieces of content are we working with? (See also What's Your Number? Go Wider for Higher Impact.)
Iterations. How many times are we going to iterate to get this right? We should not assume that we're going to get everything right on the first pass.
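These two dimensions multiply, which is easy to underestimate. A back-of-the-envelope sketch (all numbers below are made up for illustration):

```python
# Hypothetical cost model: total LLM calls scale with content count
# times iteration count, so iterating across everything is what
# drives the bill. All figures are assumptions, not real prices.
pieces = 5_000          # pieces of content in the site
iterations = 4          # passes needed to get tone/format right
cost_per_call = 0.02    # assumed dollars per LLM call

total_calls = pieces * iterations
total_cost = total_calls * cost_per_call
print(f"{total_calls:,} calls, ${total_cost:,.2f}")  # → 20,000 calls, $400.00
```

Dollar cost is only part of it: every pass also consumes confirmation time from the experts above, which usually dwarfs the API spend.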
The things that give the highest raw AI quality are also those that are easiest to iterate at scale:
Iteration Scaling