How far can automation and AI go in supporting psychotherapy?
Artificial intelligence’s role in mental health is no longer a futuristic discussion. It’s happening right now, in laboratories, clinics, and even across crisis lines throughout the United States. Psychotherapy has always been a deeply human process: someone talks, someone listens, and something shifts in between. But large language models, known as LLMs, are entering this space faster than most people expected, and the big question hanging in the air is no longer whether AI will change therapy.
The real question is different: how much of this experience can be automated without losing the essence of care along the way? That’s exactly what a group of researchers at the University of Utah set out to investigate. The result is a study called “A Framework for Automation in Psychotherapy,” published in the journal Current Directions in Psychological Science, which presents a very practical framework for understanding the different levels of automation possible within psychotherapy. 🧠 Spoiler: there’s a lot more nuance to this story than a simple “replace or don’t replace.”
Who’s behind the research
The study was born from an interdisciplinary collaboration spanning three major areas at the University of Utah. The lead author is Zac Imel, a professor of educational psychology in the College of Education and co-founder of Lyssn, a tech company based in Seattle that develops AI-powered quality improvement programs for behavioral health services. Alongside him are Vivek Srikumar, an associate professor at the Kahlert School of Computing within the College of Engineering, and Brent Kious, an associate professor of psychiatry in the School of Medicine.
This combination of expertise in computing, psychiatry, and educational psychology is what gives the study such a practical edge. It’s not a purely theoretical paper about AI’s potential, nor an isolated technical demo. It’s an effort to create a common language that allows healthcare professionals, technology developers, and policymakers to talk about automation in therapy using the same terms and the same risk benchmarks.
As Imel himself put it bluntly: the history of new technologies like this almost always involves collaboration and revolves around how technology supports the human expert in the work they’re already doing. The goal of the study isn’t to speculate about robot therapists. It’s to map out, in a structured way, the different types of work that can be performed through automation within therapy.
The framework that breaks therapy into layers of automation
The University of Utah study proposed something that looks simple on the surface but carries enormous implications in practice: instead of treating therapy as a monolithic block, the researchers broke the therapeutic process into distinct components, each with a different level of feasibility for automation. This shifts the conversation significantly, because it moves the debate away from all-or-nothing territory and into much more technical and functional ground.
To make it easier to understand, Srikumar uses a pretty accessible analogy: self-driving cars. The automotive industry has been rolling out driver-assistance systems for years, and the far end of that spectrum is the fully autonomous vehicle. In psychotherapy, the logic is the same. The most extreme version would be a fully artificial therapist, but between the current landscape and that extreme, there are several intermediate steps, each with different capabilities, benefits, and risks.
The team outlined four categories representing these different levels along a continuous spectrum:
- Category A — Scripted systems: the content is written by humans but delivered to patients by chatbots following decision trees. Think coping tips and structured exercises the system serves up based on predetermined responses.
- Category B — AI evaluates therapists: artificial intelligence analyzes therapy sessions and provides feedback or ratings on the clinician’s performance. The focus here is on improving the quality of care.
- Category C — AI assists therapists: AI suggests interventions, prompts, or phrase formulations, but the human therapist is still the one delivering care. It’s a clinical copilot, not the pilot.
- Category D — AI delivers therapy directly: an autonomous agent generates responses and interacts with patients, possibly with some level of human oversight. This is the highest-risk level and the one that raises the most ethical questions.
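For readers who think in code, the four categories above can be sketched as a small lookup table. This is only an illustration, not something taken from the study: the class and field names (`Category`, `Profile`, `ai_generates_content`, `ai_output_audience`) are assumptions made up for this sketch, and the values simply restate the descriptions in the list.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    """The four categories, labeled as in the list above."""
    A = "Scripted systems"
    B = "AI evaluates therapists"
    C = "AI assists therapists"
    D = "AI delivers therapy directly"


@dataclass(frozen=True)
class Profile:
    # Hypothetical fields, chosen for illustration only.
    ai_generates_content: bool  # does the AI produce the wording itself?
    ai_output_audience: str     # who sees the system's output: "patient" or "therapist"


PROFILES = {
    # A: humans write the content, a chatbot delivers it to the patient.
    Category.A: Profile(ai_generates_content=False, ai_output_audience="patient"),
    # B: the AI produces feedback or ratings for the clinician.
    Category.B: Profile(ai_generates_content=True, ai_output_audience="therapist"),
    # C: the AI suggests interventions, the therapist delivers care.
    Category.C: Profile(ai_generates_content=True, ai_output_audience="therapist"),
    # D: the AI generates responses and interacts with the patient.
    Category.D: Profile(ai_generates_content=True, ai_output_audience="patient"),
}


def is_patient_facing(category: Category) -> bool:
    """True when the system's output goes straight to the patient."""
    return PROFILES[category].ai_output_audience == "patient"


def generates_for_patient(category: Category) -> bool:
    """True when AI-generated wording reaches the patient directly."""
    profile = PROFILES[category]
    return profile.ai_generates_content and is_patient_facing(category)


if __name__ == "__main__":
    for category in Category:
        print(category.name, category.value, generates_for_patient(category))
```

Laid out this way, Category D stands out as the only one in which wording generated by the AI reaches the patient without a human author or a human therapist in between.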
What makes this framework particularly interesting is that it doesn’t try to shove AI into therapy at any cost. On the contrary, it starts from a careful assessment of where technology genuinely adds value and where it still faces serious limitations. The team evaluated each category for its potential usefulness and associated risk levels, which vary enormously. A scripted chatbot, an AI coaching tool for therapists, and a fully autonomous artificial therapist are fundamentally different technologies with fundamentally different risks. But, as the researchers point out, it often isn’t clear to users, or even to healthcare systems, which of these technologies is actually being deployed.
Risks, consent, and accountability at every level
One of the study’s most important points is how the same ethical questions take on completely different flavors depending on the level of automation in question. Srikumar explains that cataloging the various levels makes this visible: questions about risk, consent, the impact of potential errors, and the accountability of the parties involved remain the same in essence, but their consequences change dramatically as you move along the spectrum.
In Category A, for example, the risk is relatively low. The content was created by human experts, and the chatbot is just a delivery vehicle. If something goes wrong, the chain of responsibility is traceable. In Category D, however, where AI generates responses autonomously and interacts directly with patients who may be in situations of extreme vulnerability, the implications of an error are on a completely different scale. Who is responsible when a language model fabricates information during a conversation with someone in crisis? How does informed consent work when the patient doesn’t know exactly what type of system is on the other side of the screen?
These aren’t rhetorical questions. They’re concrete challenges that any serious implementation of AI in mental health needs to confront. And the framework’s merit is precisely in forcing this differentiation, preventing all automation in therapy from being treated as if it were the same thing.
Artificial intelligence as support, not a substitute
One of the most important conclusions to emerge from the research is that the most promising model isn’t artificial intelligence operating autonomously within therapy, but rather a structured collaboration between technology and the human professional. In practice, this means AI steps in to expand the therapist’s reach and efficiency, not to take their place.
Imel is particularly emphatic about automation’s potential in the area of therapist evaluation and training. Evaluating a psychotherapy session is an incredibly labor-intensive, slow, and unreliable process, and it rarely happens in everyday clinical practice. Nobody is recording their sessions and sending them to an outside expert who will listen, evaluate, provide feedback, and send it back so the therapist can learn from it. This is where properly trained LLMs can quickly capture the core components of treatment and feed that information back to therapists, often in real time.
Another use case showing strong potential is between-session support. Most people in therapy see their therapist once a week at most. The rest of the week goes by without structured support, and that’s exactly when many emotional triggers show up. AI-based apps that can offer emotional regulation exercises, mood tracking, cognitive restructuring techniques, and even basic empathic listening during those gaps can make a real difference in the continuity of the therapeutic process. It’s not therapy, but it is support, and well-calibrated support has measurable clinical value, especially when integrated into a treatment plan supervised by a professional.
The real-world application: the SafeUT case
The study doesn’t stay purely in the conceptual realm. The team is already putting parts of the framework into practice through a partnership with SafeUT, Utah’s text-based crisis line. Kious explained that the goal of this collaboration is to develop tools that help evaluate crisis counselors’ sessions so they can receive feedback that allows them to maintain key skills and even develop new ones as more is learned about crisis counseling.
This is an application that fits squarely into Categories B and C of the framework. The AI isn’t talking to the patient in crisis. It’s analyzing how the human counselor handled the conversation and providing insights to improve future care. It’s a relatively low-risk use with high impact potential, exactly the type of automation the researchers advocate as a starting point.
Srikumar also sees a broader future role for AI in crisis lines. He describes that environment as extremely challenging: you know nothing about the person on the other end when they reach out, and the counselor may have only five or six conversational turns to connect, help, and reduce risk. What he envisions is that future crisis counseling systems will be heavily augmented by AI, because the scale of demand is simply too large to be met without automation. 🎯
The risks of ChatGPT as an improvised therapist
The researchers raise an important warning that deserves attention: anyone can, right now, turn to ChatGPT or another language model looking for advice that resembles psychotherapy. LLMs are designed to be engaging and sound empathetic, and they’re trained on massive datasets. But that doesn’t mean they use evidence-based psychotherapy techniques.
In fact, these models carry significant risks. They’re known for fabricating information, encoding biases present in their training data, and responding unpredictably. When the context is mental health, where words can carry enormous weight in someone’s life, these problems stop being minor technical glitches and become patient safety issues.
Srikumar frames the issue pragmatically: why would anyone choose to deploy the riskiest version of a tool when there are so many lighter versions that can already be implemented and that will make clinicians’ lives easier? A note-taking app, for example, something that keeps organized records throughout a session, already improves quality of life for clinicians and the quality of the service they provide. The temptation to jump straight to the most advanced level of automation needs to be resisted in favor of incremental and responsible approaches.
What the data says about effectiveness and limitations
Beyond the conceptual framework, the broader research landscape on interventions using language models and artificial intelligence tools in mental health settings shows mixed but revealing results. In populations with limited access to mental health services, such as people in rural areas, lower-income communities, or individuals facing mobility barriers, AI-based apps have shown positive results on metrics like reduction in mild anxiety symptoms, adherence to mindfulness techniques, and engagement with cognitive-behavioral therapy exercises. In these contexts, the alternative often isn’t a human therapist but rather no support at all, which places automation in a legitimate and necessary role.
On the other hand, in populations dealing with more complex conditions, such as personality disorders, severe trauma, or active suicidal ideation, human presence remains irreplaceable. Trying to scale care through AI in these cases can create a false sense of support without the clinical substance required. Real-time clinical risk assessment is still a domain where humans need to be in full control, and any responsible AI system needs clear escalation protocols to professionals when signs of risk appear in the conversation.
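To make the idea of an escalation protocol a bit more concrete, here is a minimal sketch in code. It assumes that some upstream assessment, which is not shown and not specified in the study, has already flagged whether a conversational turn contains signs of risk. The names `Handler`, `Session`, and `observe_turn` are hypothetical, and the one-way handoff is a design assumption for this example rather than a rule taken from the research.

```python
from dataclasses import dataclass
from enum import Enum


class Handler(Enum):
    AUTOMATED = "automated"
    HUMAN = "human"


@dataclass
class Session:
    """Tracks who is currently handling a conversation."""
    handler: Handler = Handler.AUTOMATED

    def observe_turn(self, risk_signal: bool) -> Handler:
        """Update the handler after one conversational turn.

        `risk_signal` is assumed to come from a separate assessment
        step. Escalation is one-way here: once a human takes over,
        this code never hands control back to the automated system.
        """
        if risk_signal:
            self.handler = Handler.HUMAN
        return self.handler


if __name__ == "__main__":
    session = Session()
    for flagged in (False, True, False):
        print(flagged, session.observe_turn(flagged).value)
```

The point of the sketch is the latch: a later turn without a risk signal does not return the conversation to the automated system, which is one simple way to keep humans in control once risk has appeared.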
The future is hybrid, not binary
What becomes clear after looking at the full scope of the study and the available evidence is that artificial intelligence has a real and growing role in expanding access to mental health care, but that role needs to be built on a solid foundation of research, clinical ethics, and respect for human complexity.
The point the researchers emphasize strongly is that this integration needs to happen thoughtfully and with ongoing evaluation of outcomes. Dropping a chatbot into an app and calling it mental health support without any clinical protocol behind it isn’t just ineffective; it can be harmful. Automation in therapy only works when it’s designed with clear clinical intent, validated by data, and closely monitored. The enthusiasm for the technology is understandable, but responsibility to the patient always has to come first.
Today’s language models are powerful tools, but tools nonetheless. And like any tool, what matters isn’t just what it can do, but how, when, and by whom it’s used. The future of therapy probably isn’t human or artificial. It’s a smart combination of both, each contributing what it does best. 💡
The full study is available in the April issue of Current Directions in Psychological Science and includes co-authors from institutions such as the University of Washington, the University of Pennsylvania, and the Alan Turing Institute.
