Using AI: Exploring vs. Scaling / David Hobbs Consulting LLC

Key Points

Move beyond just exploratory usage of AIs: look for opportunities to scale
Look for opportunities to turn an exploratory, probabilistic problem into a deterministic one
To really streamline, move away from chatbots as the UI
Streamlining spans multiple levels: generic chatbot → chatbot with MCP → custom chatbot → custom interface with LLMs in the backend → custom interface with no LLMs—shifting from chat to button-press at level 4.

Let's break AI use into two modes:

Exploring. When we don't fully understand the problem and are exploring something we haven't done before.
Scaling. When we understand the problem well enough to scale in some way, perhaps dramatically restricting how much of a role LLMs play.

Chatbots are excellent at helping us explore. Sometimes we make the move from exploring to scaling, and sometimes simple exploration is enough.

Exploring vs. Scaling
	Exploring	Scaling
Primary Interaction Mode	Chat	Probably not chat
LLM Models	Frontier	Small
Expertise Required	High	Low
How Deterministic	Low	High
Speed	Slow	Fast
How much are LLMs used	Pervasively	None or very focused
Scale	One-offs	Repeatable
Style	Verbose	Focused
Range of the interaction	Open-ended	Constrained

Why we might want to scale

Sometimes exploration is enough, but sometimes we need to scale:

More data. Let's say we burn through a ton of LLM credits on an ad hoc analysis of a sample from one section of our website. What happens when we want to extend the analysis to the entire website (or our entire digital presence)? We need to scale.
More repeatability. Sometimes we want to do some sort of work in a manner that can be compared. For instance, we might want to compare over time or compare across different domains. To do this, you need to have a way to reliably get similar and comparable results repeatedly.
More speed. Using LLMs is slow. Sometimes we give them tasks that can't reasonably be done any other way (short of doing them manually), so the wait is worth it, but LLMs are slow regardless. You can get more speed by using smaller models, and more still by avoiding LLMs entirely.
More observability. As we all know, no one deeply understands exactly how LLMs work. Sometimes you might want more insight into exactly how calculations are made.
Less expensive. Especially for frontier models, tokens are expensive and it looks like they will only get more expensive. Sometimes you can move a task completely outside of LLMs, but, even if not, you might be able to move to less expensive (or free) models.
Wider usage. Chatbots make things up and don't tell you when they have. This can be helpful (and perhaps not problematic) when exploring, but if you want more people to use the analysis, including people with a wider range of AI sophistication, then you may need a completely different type of UI.

When it comes to the UI for interacting with LLMs, we tend to think about chatbot interfaces. But as we scale we are probably going to be interacting with LLMs by anything but chat.

Differences in the UI

It's tricky to capture a screenshot of a useful interaction with a chatbot because they are so verbose! That said, here is a brief snippet from a session with Claude using an MCP connection to Content Chimera. We can see that the style is very conversational and that, at this point, we have only sampled some pages (although the conversation did go on to use Chimera to evaluate all pages).

Exploring with a chatbot: a conversational, verbose session with Claude connected to Content Chimera via MCP, sampling a handful of pages.

Want to try this? Contact Us.

You can connect Content Chimera to your chatbot for exploratory discussions about your digital presence.

An analysis workflow is more set-and-forget as well as more reliable. In the following example, we see the end result of the workflow (the green bar at the top shows all the steps it went through), which crawled, scraped, ran focused LLMs, added calculated fields, and generated the summary view without any user intervention. Then, at the bottom, the user can switch back to an exploratory mode (here the user asked why one value was not present). This process was started by clicking a button, and the user can then forget about it until they get an email saying the analysis is complete.

Scaling with a workflow: a single button kicks off crawling, scraping, focused LLM passes, and summary generation—then the user drops back into chat to ask why one value is missing.
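
The workflow described above can be sketched as an ordered list of steps that run without user input and record their progress. This is a minimal sketch, not Content Chimera's implementation: `WorkflowRun`, `run_workflow`, and every step function are hypothetical names, and each step body is a placeholder (no crawling and no model call actually happens).

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class WorkflowRun:
    """State carried through the workflow, plus a log of finished steps."""
    url: str
    data: Dict[str, Any] = field(default_factory=dict)
    completed: List[str] = field(default_factory=list)


# Hypothetical placeholder steps; real ones would crawl, scrape, call a model, etc.
def crawl(run: WorkflowRun) -> None:
    run.data["pages"] = [f"{run.url}/page-{i}" for i in range(3)]


def scrape(run: WorkflowRun) -> None:
    run.data["text"] = {page: "placeholder body text" for page in run.data["pages"]}


def focused_llm_pass(run: WorkflowRun) -> None:
    # Stand-in for a narrow model call; nothing is sent to any model here.
    run.data["llm_scores"] = {page: None for page in run.data["pages"]}


def add_calculated_fields(run: WorkflowRun) -> None:
    run.data["word_counts"] = {
        page: len(text.split()) for page, text in run.data["text"].items()
    }


def summarize(run: WorkflowRun) -> None:
    run.data["summary"] = {"pages": len(run.data["pages"])}


STEPS: List[Tuple[str, Callable[[WorkflowRun], None]]] = [
    ("crawl", crawl),
    ("scrape", scrape),
    ("focused_llm", focused_llm_pass),
    ("calculated_fields", add_calculated_fields),
    ("summary", summarize),
]


def run_workflow(url: str) -> WorkflowRun:
    """Run every step in order; the caller only supplies the URL."""
    run = WorkflowRun(url=url)
    for name, step in STEPS:
        step(run)
        run.completed.append(name)  # analogous to the progress bar of steps
    return run
```

Here the `completed` list plays the role of the bar showing which steps have finished, and the only user decision is the URL passed to `run_workflow`.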

Try this streamlined flow

Create a free Content Chimera account, enter the URL, and wait for a first pass at Nielsen usability heuristics evaluation for the first thousand pages.

A major difference in these cases is user involvement and technical knowledge requirements:


	Exploring (Mailchimp styleguide example)	Scaling (Nielsen heuristics example)
User engagement (share of total time)	99%	1%
User knowledge requirements	High	Low
How started	Chat	Clicking a button

The transition from Exploring to Scaling

Aside from trivial cases, you do need to explore first to understand the problem. For example, in the Scaling example above there was an entire exploratory chat session to understand the problem and then figure out how to streamline.

The most solid way to move from Exploring to Scaling is converting a probabilistic problem to a deterministic problem. For a review:

LLMs are, by definition, probabilistic. Except for trivial cases, this means that you don't get the same result every time: probabilities specifically drive how the algorithm works.
Traditional algorithms are deterministic: with the same inputs you will always get the same outputs.

Fortunately, LLMs are actually quite good at writing traditional code (and at finding ways to convert a problem into a deterministic one in general), so LLMs can help you convert a problem to a traditional one.
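
To make the probabilistic versus deterministic distinction concrete, here is a toy sketch. It does not call a real LLM: `probabilistic_label` only imitates sampling from a distribution, and the labels, weights, and word-count thresholds are all made up for illustration.

```python
import random


def probabilistic_label(text: str, rng: random.Random) -> str:
    """Toy stand-in for an LLM: samples a label, so repeated calls can differ.

    The text is ignored here; the labels and weights are invented.
    """
    labels = ["keep", "revise", "delete"]
    weights = [0.6, 0.3, 0.1]
    return rng.choices(labels, weights=weights, k=1)[0]


def deterministic_label(text: str) -> str:
    """Explicit rule on word count (hypothetical thresholds).

    The same text always produces the same label.
    """
    words = len(text.split())
    if words < 5:
        return "delete"
    if words < 50:
        return "revise"
    return "keep"


if __name__ == "__main__":
    rng = random.Random()
    sample = "one two three four five six"
    print({probabilistic_label(sample, rng) for _ in range(20)})  # may vary per run
    print({deterministic_label(sample) for _ in range(20)})       # always one label
```

Once exploration has shown which signal actually matters (word count in this invented case), the rule can replace the sampled answer, and results become comparable across runs.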

Levels of streamlining

One of the core elements of scaling is streamlining the experience of achieving whatever task we are undertaking. There are different levels. I'll use Content Chimera examples below.


	UI	Truth Grounding	Hands-on User Effort
1. Normal, generic chatbot usage	Chatbot	Low (except for very small data sets)	High
2. Generic chatbot with MCP connection	Chatbot	Some	High
3. Custom chatbot	Chatbot	Some more	High
4. Custom interface, with LLMs in the backend	NOT chatbot	High	Low
5. Custom interface, with no LLMs in backend	NOT chatbot	Very High	Low

1. Normal, generic chatbot usage

This is, in many ways, the most powerful: ask a generic chatbot a question and it will do its best to answer. It might make things up, it might try to do something insecure, but it will probably push through to give you an answer.

Level 1: A normal, generic chatbot—powerful and willing to answer anything, but weak on grounding.

2. Generic chatbot with MCP connection

This is still exploring, but it's leveraging an MCP connection for better grounding and perhaps some broader data.

Level 2: A generic chatbot with an MCP connection for better grounding and broader data.

3. Custom chatbot

Custom chatbots are still exploratory, but they can steer the conversation in the relevant way for the task at hand. In addition, they can show specific feedback — for instance, in this example we see:

Inline, interactive chart
Specific tools for the task at hand (in this case, showing how many rules and dispositions there are, the total current manual effort, etc.)

Level 3: A custom chatbot that steers the conversation and surfaces task-specific feedback, like an inline interactive chart and effort metrics.

Want to try Chimera Chat? Contact Us.

4. Custom interface, with LLMs in the backend

This is when the big switch happens. Here, we divorce LLM-as-UI from LLM-as-backend. The earlier Nielsen heuristics example was a custom interface that uses LLMs narrowly: for some tasks, and only where a more traditional approach will not work. From the UI perspective, getting the evaluation for all the content means pushing a button, moving on to other tasks, and then coming back when all the tasks are complete.

Level 4: A custom interface with LLMs narrowly in the backend—run the full evaluation by pushing a button and walking away.
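
One way to picture "LLMs only where a more traditional approach will not work" is a rule-first function with a model fallback. This is a sketch under assumptions: the URL patterns, the labels, and the `llm_fallback` callable are hypothetical, and the fallback is injected so that no particular model or API is implied.

```python
import re
from typing import Callable, Tuple

# Hypothetical deterministic rules: URL pattern -> page type.
URL_RULES = [
    (re.compile(r"/blog/"), "article"),
    (re.compile(r"/products?/"), "product"),
    (re.compile(r"/(contact|about)/?$"), "utility"),
]


def classify_page(url: str, llm_fallback: Callable[[str], str]) -> Tuple[str, str]:
    """Return (label, source), where source is 'rule' or 'llm'.

    Deterministic rules are tried first; the injected fallback (which could
    wrap a small model) is only called when no rule matches.
    """
    for pattern, label in URL_RULES:
        if pattern.search(url):
            return label, "rule"
    return llm_fallback(url), "llm"
```

Recording the source alongside the label keeps it visible how much of the result came from rules and how much from a model, which helps when deciding whether more rules are worth writing.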

Here is another example: during a crawl, an LLM tries to detect whether the crawl is spiraling out of control (on top of the deterministic mechanisms already in place for this), and an email alert is sent if it appears to be.

Level 4 in action: an LLM watches for a crawl spiraling out of control and sends an alert, backing up the deterministic safeguards already in place.
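
The layering of deterministic safeguards with an LLM check might look roughly like the sketch below. These are not Content Chimera's actual mechanisms: the thresholds, the repeated-segment heuristic, and the `llm_check` callable are all assumptions chosen for illustration.

```python
from collections import Counter
from typing import Callable, List, Optional
from urllib.parse import urlsplit

MAX_DEPTH = 12             # hypothetical limit on path depth
MAX_SEGMENT_REPEATS = 3    # hypothetical limit on a repeated path segment
FLAGGED_SHARE_LIMIT = 0.5  # hypothetical share of suspicious URLs that trips an alert


def looks_like_spiral(url: str) -> bool:
    """Deterministic check: very deep paths or heavily repeated segments."""
    segments = [s for s in urlsplit(url).path.split("/") if s]
    if len(segments) > MAX_DEPTH:
        return True
    return any(n >= MAX_SEGMENT_REPEATS for n in Counter(segments).values())


def crawl_needs_alert(
    recent_urls: List[str],
    llm_check: Optional[Callable[[List[str]], bool]] = None,
) -> bool:
    """Deterministic safeguard first; optional model second opinion after."""
    if not recent_urls:
        return False
    flagged = sum(looks_like_spiral(u) for u in recent_urls)
    if flagged / len(recent_urls) >= FLAGGED_SHARE_LIMIT:
        return True  # the rules alone are enough to alert
    # The injected check only sees cases the rules did not catch.
    return bool(llm_check(recent_urls)) if llm_check else False
```

With this shape, the deterministic rules handle the patterns already understood, and the model is reserved for the patterns nobody has written a rule for yet.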

Want to try this? Contact Us.

Got a challenging or huge digital presence you need to analyze? Want to see spiral detection in action?

5. Custom interface, with no LLMs in the backend

We come full circle here, to workflows that do not need LLMs at all. Sometimes we do not even need LLMs to explore the problem to determine how to define a data processing pipeline. A huge advantage of this approach is that it is deterministic and very streamlined.

Level 5: A fully deterministic, custom interface with no LLMs in the backend at all.
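
As a small illustration of a pipeline step with no LLM involved, the sketch below derives a calculated field (the top-level site section) from each URL and counts pages per section. The function names and the choice of field are assumptions, not a description of any product feature.

```python
from collections import Counter
from typing import Dict, Iterable
from urllib.parse import urlsplit


def section_of(url: str) -> str:
    """Calculated field: the first path segment, or '(root)' for the home page."""
    segments = [s for s in urlsplit(url).path.split("/") if s]
    return segments[0] if segments else "(root)"


def summarize_sections(urls: Iterable[str]) -> Dict[str, int]:
    """Count pages per section; the same URLs always give the same counts."""
    return dict(Counter(section_of(u) for u in urls))
```

Because every step is plain code, the output can be traced back to its inputs, and running it on a thousand pages or a million costs no tokens.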