OpenAI Has to Tell ChatGPT to Stop Talking About Goblins
OpenAI found itself in a pretty unusual situation recently: ChatGPT started mentioning goblins and gremlins in everyday responses, with absolutely no context to justify it.
It sounded like a joke, but it wasn’t.
The behavior kept getting more and more frequent until it caught the attention of users and even employees at the company itself, who had to investigate what was going on behind the scenes with the model. In a post published on the official blog on Thursday, the company laid out how the problem started, evolved, and was finally fixed.
What the investigation uncovered turned out to be way more interesting than mythical creatures randomly popping up in conversations about work or code. OpenAI discovered that, while training a nerdy personality developed internally for ChatGPT, the process had inadvertently rewarded mentions of goblins. In other words, the model was literally being rewarded for talking about these creatures, even when it made absolutely zero sense.
This episode became a real, concrete example of the challenges any company faces when developing artificial intelligence systems at scale, where even a small detail in the training process can propagate in completely unexpected ways.
And that’s exactly what happened here. 👀
What Was Behind ChatGPT’s Strange Responses
When the first reports started coming in, a lot of people assumed it was just a temporary bug or maybe even a prank planted by OpenAI itself. After all, seeing an artificial intelligence model drop goblins into a response about Excel spreadsheets or describe technical issues as little goblins causing trouble is, to say the least, baffling. But the situation was way more serious than it appeared at first glance, and the technical team had to act quickly to understand where the problem was coming from before it spread even further.
According to OpenAI’s blog post, the company first noticed an uptick in mentions of goblins, gremlins, and other creatures after the launch of GPT-5.1 in November. Users started complaining that the model was being oddly informal in conversations, which led the company to open an investigation into specific verbal tics the model had developed.
The turning point came when a researcher at OpenAI, who had already noticed some goblin mentions here and there, asked the team to dig deeper. What the developers found was surprising: the appearances of the term in ChatGPT responses had increased by 175% since the launch of GPT-5.1. This wasn’t an isolated issue or a coincidence — it was a clear, measurable trend that was growing over time.
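To make the 175% figure concrete, here is a minimal sketch of how a relative increase in mention rate could be measured. The word pattern, the function names, and the sample responses are all made up for illustration; OpenAI's post does not describe its measurement code.

```python
import re

# Hypothetical pattern for the tracked words; not taken from OpenAI's post.
CREATURE_PATTERN = re.compile(r"\b(goblin|gremlin)s?\b", re.IGNORECASE)

def mention_rate(responses):
    """Fraction of responses that mention a tracked creature at least once."""
    if not responses:
        return 0.0
    hits = sum(1 for text in responses if CREATURE_PATTERN.search(text))
    return hits / len(responses)

def percent_increase(before, after):
    """Relative change from `before` to `after`, in percent."""
    if before == 0:
        raise ValueError("baseline rate is zero; relative change is undefined")
    return (after - before) / before * 100.0

# Made-up sample data: 4 of 100 responses before, 11 of 100 after.
baseline = ["Here is your formula."] * 96 + ["A gremlin ate your semicolon."] * 4
current = ["Here is your formula."] * 89 + ["Looks like a goblin in the loop."] * 11

increase = percent_increase(mention_rate(baseline), mention_rate(current))
print(f"{increase:.0f}% increase")
```

Note that a 175% increase means the rate is 2.75 times the baseline, not 1.75 times. The rate can still be small in absolute terms, which is part of why a trend like this is easy to miss without measuring it.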
The internal investigation pointed to the process by which models like ChatGPT learn to give better responses based on evaluations and adjustments made during training. The problem is that this process, when poorly calibrated or when it receives inconsistent signals, can create behavioral patterns that seem completely random to anyone on the outside but follow a very specific internal logic within the model. In this case, the nerdy personality developed for ChatGPT was using goblins and gremlins as metaphors to describe problems and bugs, and the reward system ended up reinforcing this behavior instead of correcting it.
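A toy model helps show how a small bias in a reward signal can snowball. The sketch below is not OpenAI's training setup: it assumes a policy with a single parameter that chooses between a plain phrasing and a quirky one, and a hypothetical grader that gives the quirky phrasing a small bonus. It uses the exact expected policy-gradient update so the result is deterministic.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train(bonus, steps=2000, lr=1.0, start_prob=0.01):
    """Return the probability of the quirky phrasing after training.

    Assumptions (all hypothetical): the policy picks the quirky phrasing
    with probability p = sigmoid(theta); the plain phrasing earns reward 1
    and the quirky phrasing earns 1 + bonus. Expected reward is then
    1 + p * bonus, whose gradient with respect to theta is
    p * (1 - p) * bonus.
    """
    # Start from a policy that rarely uses the quirky phrasing.
    theta = math.log(start_prob / (1.0 - start_prob))
    for _ in range(steps):
        p = sigmoid(theta)
        # Gradient ascent on expected reward.
        theta += lr * p * (1.0 - p) * bonus
    return sigmoid(theta)

# With no bonus the behavior stays rare; with a small bonus it takes over.
print(f"no bonus:    {train(bonus=0.0):.3f}")
print(f"small bonus: {train(bonus=0.2):.3f}")
```

The point of the sketch is the shape of the dynamics: the update never asks whether the phrasing fits the context, so any consistent bonus, however small, keeps pushing the probability upward.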
OpenAI took practical steps to fix the issue, including directly instructing Codex, its coding agent, not to reference goblins unless it was genuinely relevant to the context of the conversation. It’s the kind of fix that sounds absurd when you say it out loud (one of the biggest tech companies in the world literally had to tell its AI model to stop talking about mythical creatures), but it reflects the day-to-day reality of developing complex artificial intelligence systems.
GPT-5 and the Pressure for Increasingly Stable Models
This episode arrived at a pretty delicate moment for OpenAI, which is in the middle of developing and gradually rolling out GPT-5, its most advanced model to date. The company had already identified the problem specifically in tools powered by GPT-5, meaning the newest and most powerful model was precisely the one spreading goblin references most frequently. Expectations around GPT-5 are enormous, and any erratic behavior ends up being amplified by the community as a sign that artificial intelligence systems still have a long way to go.
What makes this scenario even more interesting is that GPT-5 was built with a significantly more complex architecture than its predecessors, which should theoretically make it more robust against this type of behavioral drift. OpenAI stated in technical communications that the new model went through more rigorous evaluation processes, with additional layers of verification before any update was released to the public. Yet the goblin episode happened with GPT-5.1 specifically, showing that even state-of-the-art models remain vulnerable to this kind of problem when the refinement process isn’t closely monitored.
The most important lesson this case brings to the development of any large-scale artificial intelligence system is that behavioral stability isn’t a property you set once and forget about. It needs to be monitored continuously, with evaluation processes that can identify deviations before they reach the end user. And this is especially difficult when you’re dealing with a model that handles billions of interactions per day, in completely different contexts, for people with very distinct expectations and needs.
The Problem of Linguistic Tics in AI Models
The goblin case isn’t an isolated event in the world of large language models. Anyone who follows the development of artificial intelligence tools knows that verbal tics are a recurring problem. Previous models from OpenAI and other companies have exhibited similar behaviors, like the overuse of certain words, stock phrases, or expressions that seemed to come out of nowhere and started appearing in responses at a disproportionate rate.
The difference is that in the goblin case, the problem was so visible and unusual that it grabbed attention in a way a subtler tic never would. When ChatGPT describes a code bug as a little goblin causing problems, the user immediately realizes something is off. But when the model simply uses a word that’s a bit too formal or repeats a sentence structure too often, most people don’t even notice. This raises a relevant question: how many other, less obvious linguistic tics might be present in the models we use every day that nobody has spotted yet?
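One simple way to hunt for subtler tics is to compare word frequencies in recent responses against an older baseline and flag words whose relative rate has jumped. The sketch below is only an illustration of that idea: the function names, the thresholds, and the sample sentences are assumptions, and nothing here reflects how OpenAI actually monitors its models.

```python
from collections import Counter
import re

def word_counts(texts):
    """Count lowercase word occurrences across a list of responses."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts

def overused_words(baseline_texts, current_texts, min_ratio=3.0, min_count=3):
    """Return (word, ratio) pairs whose relative rate grew by at least min_ratio."""
    base = word_counts(baseline_texts)
    cur = word_counts(current_texts)
    base_total = sum(base.values())
    cur_total = sum(cur.values())
    vocab = len(set(base) | set(cur))
    flagged = []
    for word, count in cur.items():
        if count < min_count:
            continue  # too rare to be a meaningful signal
        # Add-one smoothing so words unseen in the baseline do not divide by zero.
        base_rate = (base[word] + 1) / (base_total + vocab)
        cur_rate = (count + 1) / (cur_total + vocab)
        ratio = cur_rate / base_rate
        if ratio >= min_ratio:
            flagged.append((word, ratio))
    return sorted(flagged, key=lambda item: item[1], reverse=True)

# Made-up sample data: the newer responses keep opening with "indeed".
baseline = ["the fix is to update the config file"] * 10
current = ["indeed the fix is to update the config file"] * 10

flagged = overused_words(baseline, current)
print([word for word, _ in flagged])
```

A word-level check like this would not catch a repeated sentence structure, which is the harder case the paragraph above raises; that would need comparison of longer patterns, not single words.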
OpenAI acknowledged in the post that this episode highlights a challenge artificial intelligence companies face: systems and their training processes can reward and reinforce errors such as language quirks. It’s an honest admission that, even with all the technological sophistication available, there are still significant blind spots in the development process of these models.
What This Case Reveals About the Limits of AI Today
More than just a viral curiosity, the goblin episode in ChatGPT exposes something that artificial intelligence researchers have been discussing for years: the difficulty of ensuring a language model behaves predictably in every possible scenario. Models like ChatGPT are trained on massive volumes of data and go through extremely sophisticated fine-tuning processes, but they still carry a certain unpredictability that’s inherent to the way these systems learn. To this day, there’s no way to fully inspect the internals of a large model and understand exactly why it made a particular decision at a particular moment.
This phenomenon is known in the field as the black box problem, and it’s one of the main focuses of model interpretability research. The idea is to develop tools and techniques that allow engineers to better understand what’s happening inside these massive neural networks, so that problems like the goblin behavior can be identified and fixed well before they reach the user. OpenAI has an entire team dedicated to this kind of research, but progress is still gradual and the problem remains genuinely hard to solve in practice.
It’s also worth mentioning the impact this kind of situation has on public trust. When someone uses ChatGPT for work and gets a response full of references to mythical creatures for no apparent reason, the natural reaction is to question the reliability of the tool as a whole. If the model gets something so basic so bizarrely wrong, how can you trust it for more complex tasks? That perception, even if it’s not entirely fair from a technical standpoint, is real and has practical consequences for the adoption of these tools in professional and corporate environments.
The Importance of Transparency in AI Development
What’s positive about this whole story is that OpenAI identified the problem, investigated it seriously, published a detailed post explaining what happened, and corrected the behavior. This shows that the monitoring processes exist and work, even if they’re not perfect. And for anyone closely following the development of artificial intelligence, this kind of transparency, even if partial, is welcome.
Knowing that a company the size of OpenAI takes seriously even a seemingly harmless behavior like mentioning mythical creatures in a response about code says a lot about the level of attention they dedicate to the quality and reliability of their models. The blog post didn’t just acknowledge the mistake — it also explained the chain of events that led to the problem, from the model’s nerdy personality to the reward system that inadvertently reinforced the goblin mentions.
This kind of open communication also helps the technical community learn from mistakes and develop better practices to prevent similar problems from happening in other models and platforms. When a company shares what went wrong and how they fixed it, everyone benefits.
At the end of the day, the goblin episode is a lighthearted but revealing reminder that artificial intelligence systems are still deeply dependent on the quality of the human processes that shape them. All the algorithmic sophistication in the world doesn’t replace the need for constant monitoring, careful investigation, and quick fixes when something goes off the rails. And if the price of this lesson was a few weird responses about mythical creatures, we can probably say we got off pretty easy on this one. 🤖
