InsightFinder raises $15 million to understand where AI agents go wrong

The observability market is going through another major transformation. For a long time, the logic was to track everything at any cost. Today, the focus has shifted to controlling complexity and cutting spend without losing visibility. In the middle of this shift, a new type of workload has arrived: AI agents running in production, at scale, inside large enterprises. That is where InsightFinder wants to stand out.

The startup, built on more than 15 years of academic research, has been using machine learning since 2016 to monitor, identify, and prevent problems in IT infrastructure. Now it is applying that experience directly to the most sensitive point of this new era: the reliability of AI models and agents in real-world environments, where money and reputation are on the line.

Led by founder and CEO Helen Gu, a computer science professor at North Carolina State University and former IBM and Google engineer, the company has just raised $15 million in a Series B round led by Yu Galaxy. That brings total funding to roughly $35 million, according to the company. The focus now is straightforward: scale sales, marketing, and global expansion without losing the technical edge that built its brand.

From classic monitoring to observability for AI

In recent years, the way people talk about observability has changed. The mandate used to be: collect everything. Logs, metrics, traces, events, anything that might someday help investigate an incident. This led to a data explosion, high costs, and a sea of dashboards that were hard to interpret.

With the massive arrival of AI models and intelligent agents in production, the challenge moved beyond pure infrastructure. It is no longer enough to know whether the server is up or whether the API is responding. You need to understand how:

  • data flows through the systems;
  • models behave over time;
  • the infrastructure supports all of that;
  • and those three layers interact.

Helen Gu sums it up well: to diagnose problems in AI models, you cannot monitor each piece in isolation. You need to observe data, model, and infrastructure together. In many cases, it is not just a model bug or a dirty dataset – it is a mix of both, combined with details around cache, networking, orchestration, or storage.

A practical example: fraud, model drift, and stale cache

A real case cited by Gu shows how this plays out day to day. A large credit card issuer in the United States saw one of its fraud detection models start to show drift, meaning performance began to move away from the expected baseline.

In a typical scenario, suspicion would first fall on the model or the training data. But because InsightFinder was monitoring the entire infrastructure and the surrounding environment, it was able to find the true source: stale cache on some server nodes. The cause was not an error in the model logic or in the main dataset, but an infrastructure problem that surfaced as an AI failure.

This kind of end-to-end correlation is one of the company’s core value propositions: do not treat AI observability as simple model metric checks, but as a full-stack analysis, from the incoming data, through the model, all the way to the final service that delivers value to the user.
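To make the idea of correlating model drift with infrastructure state more concrete, here is a minimal sketch in Python. It is not InsightFinder's method. It assumes fraud scores in the range [0, 1], measures drift with the Population Stability Index (PSI), and breaks the result down by serving node alongside a cache age value. The function names, the record layout, and the 0.2 threshold (a common rule of thumb for PSI) are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def psi(baseline, current, bins=10, eps=1e-6):
    """Population Stability Index between two samples of scores in [0, 1]."""
    def histogram(scores):
        counts = [0] * bins
        for s in scores:
            # Clamp so that a score of exactly 1.0 lands in the last bin.
            counts[min(int(s * bins), bins - 1)] += 1
        total = len(scores)
        # Floor each proportion at eps to avoid log(0) and division by zero.
        return [max(c / total, eps) for c in counts]

    b, c = histogram(baseline), histogram(current)
    return sum((ci - bi) * math.log(ci / bi) for bi, ci in zip(b, c))


def drift_by_node(baseline, records, psi_threshold=0.2):
    """Compute PSI per serving node.

    records: iterable of (node, cache_age_seconds, score) tuples
    (a hypothetical layout chosen for this example).
    """
    by_node = defaultdict(lambda: {"scores": [], "cache_age": 0.0})
    for node, cache_age, score in records:
        entry = by_node[node]
        entry["scores"].append(score)
        # Keep the oldest cache age observed on this node.
        entry["cache_age"] = max(entry["cache_age"], cache_age)

    report = {}
    for node, entry in by_node.items():
        value = psi(baseline, entry["scores"])
        report[node] = {
            "psi": value,
            "cache_age": entry["cache_age"],
            "drifted": value > psi_threshold,
        }
    return report


# Synthetic data: node-a matches the baseline, node-b serves skewed scores
# and also reports a much older cache.
baseline = [i / 100 for i in range(100)]
records = [("node-a", 30.0, s) for s in baseline]
records += [("node-b", 7200.0, i / 1000) for i in range(100)]

report = drift_by_node(baseline, records)
for node, row in sorted(report.items()):
    print(node, row)
```

Splitting the drift metric by node is the point of the sketch: an aggregate score would show only that the model drifted, while the per-node view shows that the drift lines up with the nodes holding old cache entries.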

Beyond testing and evaluation: AI in production is a different game

Many people still treat AI observability as a synonym for LLM evaluation in development and test environments: measuring response quality, tuning prompts, running benchmarks. That is important, of course, but it is only part of the story.

For Helen Gu, that is the market’s biggest misunderstanding. A truly useful AI observability platform needs to offer an end-to-end feedback loop that covers:

  • the development phase;
  • the evaluation stage;
  • and, most importantly, production.

It is in day-to-day operations, with real users, fluctuating volumes, external dependencies, and data that is constantly changing, that the most critical problems emerge. Performance issues, model drift, behavior changes due to silent API updates, GPU bottlenecks, latency spikes in specific microservices – all of this only becomes visible with continuous observability.

Autonomous Reliability Insights: InsightFinder’s new bet

To tackle this problem, the company launched a product called Autonomous Reliability Insights. The idea is to deliver an intelligent analysis layer that goes beyond alerting that something went wrong and comes close to pointing out:

  • what is failing;
  • why it is failing;
  • and how to keep it from happening again.

According to Gu, the system combines:

  • unsupervised machine learning to detect anomalies without relying solely on manual rules;
  • proprietary language models, both large and small, integrated into the analysis;
  • predictive AI to anticipate issues before they blow up;
  • causal inference to distinguish correlation from root cause.

One important detail is that this layer is data agnostic. The platform can ingest and analyze full data streams coming from multiple sources, correlating technical and business signals. That way, it is not locked into any specific log or metric format, which is useful in hybrid, multicloud environments full of legacy integrations.
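As a rough illustration of the first item in the list above, unsupervised anomaly detection without manual rules, the sketch below flags outliers in a metric stream using the modified z-score based on the median and the median absolute deviation (MAD). This is a generic textbook technique, not a description of InsightFinder's models. The function name, the sample latencies, and the 3.5 cutoff (a commonly cited default for this score) are assumptions.

```python
from statistics import median


def mad_anomalies(values, threshold=3.5):
    """Return indices of values whose modified z-score exceeds the threshold.

    No labels or hand-written rules are needed: the median and the MAD
    are estimated from the data itself.
    """
    med = median(values)
    deviations = [abs(v - med) for v in values]
    mad = median(deviations)
    if mad == 0:
        # At least half of the values equal the median; the score is undefined.
        return []
    # 0.6745 rescales the MAD so the score is comparable to a standard z-score
    # when the data is approximately normal.
    return [
        i for i, dev in enumerate(deviations)
        if 0.6745 * dev / mad > threshold
    ]


# Hypothetical request latencies in milliseconds, with one spike.
latencies_ms = [100, 102, 98, 101, 99, 103, 97, 100, 450, 101]
anomalies = mad_anomalies(latencies_ms)
print(anomalies)  # [8]
```

The median and MAD are used instead of the mean and standard deviation because a single large spike barely moves them, so the spike cannot mask itself by inflating the baseline.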

A crowded market – and why InsightFinder believes there is room

The observability space is anything but empty. On the contrary, InsightFinder goes up against giants and well-known players such as:

  • Grafana Labs;
  • Fiddler;
  • Datadog;
  • Dynatrace;
  • New Relic;
  • BigPanda.

All of them, in one way or another, are adding AI-focused capabilities, whether measuring model performance or expanding traditional monitoring platforms to cover machine learning and LLM workloads.

Even so, Gu says InsightFinder has a moat built on three pillars: specialization, experience, and customization. In her words, the company rarely loses customers to direct competitors. The reason, according to the CEO, lies at the intersection of two worlds:

  • many data scientists understand AI but do not master complex system architecture and operations;
  • many SRE engineers and developers understand systems but lack depth in AI.

InsightFinder tries to cover exactly that gap, focusing on the intrinsic relationships between these two sides. The kind of observability the company advocates is not just about models or just about infra, but about the complete stack, from the data pipeline to the behavior of intelligent agents in production.

Big-name customers and lessons learned with Fortune 50

Today, InsightFinder’s customer base includes heavyweight names such as:

  • UBS;
  • NBCUniversal;
  • Lenovo;
  • Dell;
  • Google Cloud;
  • Comcast.

Gu credits part of this success to almost a decade of work side by side with large enterprises, especially Fortune 50 members. One example she mentions is the partnership with Dell, which has been helping distribute and deploy InsightFinder’s systems at some of Dell’s largest global customers.

These environments are anything but trivial. They span multiple regions, different clouds, on-prem deployments, strict security requirements, and teams spread across the globe. According to Gu, it was in this context that the company refined its models and figured out what is truly needed to run AI reliably at enterprise scale. It is not just about taking a base model, throwing machine data at it, and hoping it works.

Growth numbers and the post–Series B phase

On the financial side, InsightFinder says revenue is strong, with growth of more than 3x in the last year. One interesting detail: the company claims it was not actively seeking a Series B round. According to Gu, investors approached them after the startup closed a seven-figure deal with a Fortune 50 company in about three months.

With the new $15 million, the priority is to move the operation out of a mostly technical phase and lean harder into sales and marketing. Until now, InsightFinder’s team has had fewer than 30 people, heavily focused on product, research, and deployment.

The plan is to keep that technical base but invest in:

  • building a sales team specialized in enterprise accounts;
  • marketing aimed at educating the market about end-to-end AI observability;
  • more structured go-to-market strategies, including global partners.

In short, the company wants to turn technical traction into consistent commercial growth without watering down its focus on reliability and deep analysis.

Why all of this matters if you work with AI today

For anyone building, operating, or integrating AI solutions in real-world environments, what is happening around InsightFinder is a good barometer. It shows that the AI conversation moved past the demo hype a while ago and into a phase where the central question is: how do we keep all of this stable, reliable, and under control?

When AI agents become part of critical routines – such as fraud detection, product recommendations, automated support, risk analysis, security, or process orchestration – any error stops being just an interesting bug. It turns into:

  • direct financial risk;
  • brand damage;
  • hits to user experience;
  • and, in some sectors, even regulatory trouble.

In this context, solutions that combine deep observability, machine learning applied to telemetry, and the ability to explain the root cause of incidents become strategically important. It is not just about monitoring servers anymore, but understanding where and why an AI agent behaves incorrectly – and fixing that before it creates damage.

The message behind the $15 million raised by InsightFinder is pretty clear: in the next phase of AI adoption, the winners will not be just the ones with powerful models, but the ones who can keep those models running well, at scale, with predictability and transparency.

And that is exactly where InsightFinder is placing its bets, at the intersection of model, data, and infrastructure, aiming to establish itself as one of the global references in observability for artificial intelligence in production.

Rafael

Operations

I transform internal processes into delivery machines — ensuring that every Viral Method client receives premium service and real results.
