Last updated on March 26, 2026
[Edit: added a seventh problem based on Elaine Zanutto’s comment on LinkedIn.]
[Edit: added a sixth problem based on Chris Chapman’s comment on LinkedIn.]
Seven problems in synthetic users created using LLMs:
- compression problem: the synthetic user simplifies human experience too much. Whatever data or information you use to generate the synthetic user, it is only a fraction of actual human complexity, nuance, and dimensionality. So, the synthetic user is always over-compressed relative to a real, multi-dimensional human reacting in the same situation.
- extrapolation problem: related to the first problem; when you present the synthetic user with a scenario for which it lacks underlying data, it must go beyond its source data, and the risk of hallucinated responses increases. This is similar to the out-of-vocabulary problem in NLP (and the out-of-sample problem in ML more broadly).
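The out-of-vocabulary analogy can be made concrete with a toy unigram model (the survey answers below are invented for illustration, not from any real study or model):

```python
from collections import Counter

# Toy unigram model trained on a handful of "user" answers.
# Anything outside this data gets zero support, so a model asked
# about it must extrapolate (the out-of-vocabulary / out-of-sample problem).
training_answers = ["love the app", "love the price", "hate the ads"]
counts = Counter(word for answer in training_answers for word in answer.split())
total = sum(counts.values())

def prob(word: str) -> float:
    """Probability of a word under the toy model; 0.0 if never observed."""
    return counts[word] / total

print(prob("love"))     # supported by the underlying data
print(prob("privacy"))  # 0.0: no data at all, yet an LLM would still answer
```

An LLM will not literally return zero probability; instead it interpolates a fluent answer, which is exactly where hallucinated responses come from.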
- circularity problem: say that you don't create the synthetic users inductively from data but instead take a deductive approach with known stances, e.g., "the persona is against abortion". Then you test it: "What do you think about abortion?" The answer merely mirrors or echoes what you programmed the persona to say instead of providing any new information.
- unpredictability problem: human experience is unpredictable. We often face reactions and responses in user studies that we did not expect; that is why studies are done in the first place, because we don't know what we don't know. An LLM, on the other hand, is either predictable in reproducing its source data or unpredictable in a way that is not based on the cognitive processes behind human unpredictability but on computational randomness unrelated to human experience.
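What "unpredictability" means for an LLM can be sketched as temperature sampling over a fixed softmax distribution (a minimal sketch; the logits and temperature values here are made up for illustration):

```python
import math
import random

def sample_with_temperature(logits, temperature, rng):
    """Sample an index from softmax(logits / temperature).

    The variation an LLM shows between runs is of this kind: reshaped
    noise over one static distribution, not a human cognitive process.
    """
    scaled = [l / temperature for l in logits]
    m = max(scaled)                              # subtract max for stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1

# Near-zero temperature: the model is fully predictable (always the argmax).
print(sample_with_temperature([1.0, 5.0, 2.0], 0.01, random.Random(0)))
# High temperature: "surprising" outputs, but only as parameterized noise.
print(sample_with_temperature([1.0, 5.0, 2.0], 100.0, random.Random(1)))
```

Either way, the surprise comes from a random number generator acting on a frozen distribution, which is a different phenomenon from a human participant surprising a researcher.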
- double work problem: many say that synthetic users can be validated with real human data. But if you have such data, why not use it for your analysis in the first place? Why do you need synthetic users? And if you need them for cases where you lack human data, then how can you validate them, given the lack of human data?
- selective attention problem: the LLM's summarization chooses which aspects to emphasize, and those may not be the most relevant ones given the query. (It might also draw on different data than a human would; all kinds of omissions can arise this way.)
- statistical problem: traditional statistics, which assume random sampling from a population, do not apply: the sample size can be made arbitrarily large, and there is no real population being sampled from.
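A minimal illustration of why classical significance testing breaks down: with synthetic users you can inflate n at will, so even a trivial difference between two simulated response rates becomes "statistically significant" (the rates and sample sizes below are invented for illustration):

```python
import math

def two_prop_z_pvalue(x1, n1, x2, n2):
    """Two-sided p-value for a two-proportion z-test via the normal CDF."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = abs(p1 - p2) / se
    return 2 * (1 - 0.5 * (1 + math.erf(z / math.sqrt(2))))

# Two "synthetic user" agreement rates differing by a trivial 0.5 points.
rate_a, rate_b = 0.500, 0.505
for n in (100, 10_000, 1_000_000):
    p = two_prop_z_pvalue(rate_a * n, n, rate_b * n, n)
    print(f"n={n:>9}: p = {p:.2e}")
```

At n = 100 the difference is nowhere near significant; at n = 1,000,000 it is overwhelmingly so, although nothing about the underlying "users" changed. The significance is an artifact of a sample size you can grow without bound, not evidence about any real population.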
Takeaway: Synthetic users are inauthentic users. They could be useful for evaluating systems in an exploratory and limited way when the target user persona is extremely well-defined (e.g., based on a specific impairment that the system can mimic reliably). But they likely fail at evaluating new technologies, as their outputs are detached from real human experience, emphasize averages, and lack the serendipity of real human user studies.