AI Search: Cloudflare’s new search primitive for your AI agents
AI Search is Cloudflare’s latest bet on solving one of the biggest bottlenecks for anyone building AI agents today: finding the right information, at the right time, without having to build an entire search infrastructure from scratch.
If you have ever tried adding search to an agent, you know how quickly things can get messy. You need a vector index, an indexing pipeline, logic to keep everything up to date, and if you also want keyword search, that is a whole separate index with result fusion on top. Multiply that by the number of agents you have running, and complexity spirals fast.
That is exactly the problem AI Search, previously known as AutoRAG, is here to solve. In April 2026, Cloudflare announced a significant update to the tool that brings hybrid search, built-in storage per instance, dynamic creation via Worker binding, and support for multiple instances in a single query. In practice, this means you can hand your agent a full search engine with far less configuration, and combining vector and keyword retrieval tends to give more accurate results than either approach on its own 🚀
In this article, we dive into what is new with AI Search, break down how hybrid search works under the hood, and see it all in action through a real-world customer support use case.
What changed in AI Search compared to AutoRAG
When Cloudflare first launched AutoRAG, the premise was already compelling: a managed RAG (Retrieval-Augmented Generation) layer that abstracted away most of the complexity behind semantic search for AI agents. But over time, it became clear that developers needed something more flexible, more powerful, and above all, easier to integrate into dynamic workflows. The rebrand to AI Search is not just cosmetic — it comes with a real overhaul of how the tool works internally and how it fits into Cloudflare’s agent ecosystem.
The first major change is built-in storage per instance. Previously, you had to create an R2 bucket, link it to an AI Search instance, and manage the Vectorize index provisioned in your account. Now, every new AI Search instance comes with its own integrated storage and vector index, powered by R2 and Vectorize behind the scenes. You upload files directly through the API, they get indexed automatically, and there is no need to configure external buckets or connect data sources beforehand. This dramatically simplifies the data lifecycle inside your agent and cuts down the number of moving parts you need to manage.
Dynamic creation via Worker binding is another major leap. The new ai_search_namespaces binding lets you create and delete instances at runtime directly from your Worker. This replaces the previous env.AI.autorag() API, which accessed AI Search through the AI binding. Instead of statically pre-configuring instances before your agent runs, you can now spin up AI Search directly in code, opening the door to much more adaptable architectures. You can create one instance per agent, per customer, or per language — no redeploy needed.
Support for multiple instances in a single query rounds this out: you can pull results from different knowledge bases in one call, which previously required manual orchestration logic. You can also attach metadata to documents and use it to boost rankings at query time.
How hybrid search works under the hood
Hybrid search is the technical heart of this update. Until now, AI Search only offered vector search. Vector search is excellent at capturing semantic similarity: you ask about post-sale support, and the system finds documents that talk about customer service even if they do not use those exact words. But it can miss specifics. In a query like “ERR_CONNECTION_REFUSED timeout”, the embedding captures the broad concept of connection failures, but the user is not looking for general networking docs. They want the specific document that mentions that exact error code. Vector search might return troubleshooting results without ever surfacing the page that contains that string.
Keyword search solves exactly that. AI Search now supports BM25, one of the most widely used retrieval scoring functions in the world. BM25 scores documents based on the frequency of query terms, how rare those terms are across the entire corpus, and document length. It rewards matches on specific terms, penalizes common filler words, and normalizes by document size. However, BM25 can miss a page about troubleshooting network connections even if it describes the exact same problem, just with different words. That is where vector search shines — and why you need both.
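To make the three ingredients of BM25 concrete (term frequency, term rarity, and document length), here is a minimal sketch of the textbook scoring formula. It is not AI Search's implementation: the regex tokenizer, the sample documents, and the k1 and b values (common defaults) are all assumptions made for illustration.

```typescript
// Minimal BM25 sketch. k1 and b are the conventional tuning constants;
// the defaults below are common choices, not AI Search's configuration.
type Doc = { id: string; text: string };

// Simplistic tokenizer (assumption): lowercase, split on non-alphanumerics,
// keep underscores so error codes stay in one piece.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9_]+/).filter((t) => t.length > 0);
}

function bm25Scores(query: string, docs: Doc[], k1 = 1.2, b = 0.75): Map<string, number> {
  const tokenized = docs.map((d) => ({ id: d.id, terms: tokenize(d.text) }));
  const n = tokenized.length;
  const avgLen = tokenized.reduce((sum, d) => sum + d.terms.length, 0) / Math.max(n, 1);

  // Document frequency: in how many documents each term appears.
  const df = new Map<string, number>();
  for (const d of tokenized) {
    new Set(d.terms).forEach((t) => df.set(t, (df.get(t) ?? 0) + 1));
  }

  const scores = new Map<string, number>();
  const queryTerms = tokenize(query);
  for (const d of tokenized) {
    let score = 0;
    for (const q of queryTerms) {
      const tf = d.terms.filter((t) => t === q).length;
      if (tf === 0) continue;
      const nq = df.get(q) ?? 0;
      // Rare terms get a larger inverse document frequency weight.
      const idf = Math.log(1 + (n - nq + 0.5) / (nq + 0.5));
      // Term frequency saturates via k1; b normalizes by document length.
      const norm = tf + k1 * (1 - b + (b * d.terms.length) / avgLen);
      score += idf * ((tf * (k1 + 1)) / norm);
    }
    scores.set(d.id, score);
  }
  return scores;
}

// Hypothetical sample corpus.
const docs: Doc[] = [
  { id: "net-guide", text: "General guide to troubleshooting network connection problems" },
  { id: "err-page", text: "Fixing ERR_CONNECTION_REFUSED when the connection times out" },
  { id: "billing", text: "How billing works for paid plans" },
];
const scores = bm25Scores("ERR_CONNECTION_REFUSED timeout", docs);
```

In this sample, only err-page contains the exact error code, so it is the only document with a non-zero score. The general networking guide scores zero even though it covers the same topic, which is the gap vector search fills.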
When hybrid search is enabled, AI Search runs both strategies in parallel, merges the results, and optionally reranks them. The retrieval pipeline is multi-stage, and each stage is configurable 🔍
Tokenizer
The tokenizer controls how your documents are broken into matchable terms at indexing time. The Porter stemmer option reduces words to their root, so “running” matches “run”. The Trigram option does character-level substring matching, so “conf” matches “configuration”. Use Porter for natural language content like documentation, and Trigram for code where partial matches matter.
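The difference between the two tokenizers is easier to see in code. The sketch below is an illustration under assumptions: trigramMatch shows the general idea of trigram matching, and toyStem is a deliberately tiny suffix stripper that stands in for a stemmer. It is not the Porter algorithm, and neither function reflects how AI Search implements its tokenizers.

```typescript
// Character trigrams: every 3-character window of the lowercased term.
function trigrams(term: string): Set<string> {
  const s = term.toLowerCase();
  const grams = new Set<string>();
  for (let i = 0; i + 3 <= s.length; i++) grams.add(s.slice(i, i + 3));
  return grams;
}

// A fragment matches when all of its trigrams occur in the indexed term.
// Real trigram indexes usually verify the hit against the original text;
// that confirmation step is omitted in this sketch.
function trigramMatch(fragment: string, indexedTerm: string): boolean {
  const wanted = Array.from(trigrams(fragment));
  if (wanted.length === 0) return false; // fragments shorter than 3 characters
  const indexed = trigrams(indexedTerm);
  return wanted.every((g) => indexed.has(g));
}

// Toy suffix stripper standing in for a stemmer. NOT the Porter algorithm.
function toyStem(word: string): string {
  let w = word.toLowerCase();
  for (const suffix of ["ing", "ed", "s"]) {
    if (w.length > suffix.length + 2 && w.endsWith(suffix)) {
      w = w.slice(0, -suffix.length);
      break;
    }
  }
  // Undo consonant doubling, e.g. "runn" becomes "run".
  const last = w.charAt(w.length - 1);
  if (w.length > 2 && last === w.charAt(w.length - 2) && !"aeiou".includes(last)) {
    w = w.slice(0, -1);
  }
  return w;
}

const stemsAgree = toyStem("running") === toyStem("run");
const partialHit = trigramMatch("conf", "configuration");
const partialMiss = trigramMatch("conf", "connection");
```

Stemming is applied to both the indexed text and the query, so two word forms match when they reduce to the same root. Trigram matching never needs a shared root, which is why it suits identifiers and code fragments.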
Keyword matching mode
This mode controls which documents are candidates for BM25 scoring at query time. AND mode requires all query terms to appear in the document. OR mode includes any document with at least one match.
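As a rough illustration of the two modes, candidate selection boils down to “every term” versus “any term”. The function name, the lowercase mode labels, and the sample documents below are hypothetical and are not AI Search configuration values.

```typescript
type MatchMode = "and" | "or";

// Decide whether a document is a candidate for BM25 scoring.
function isCandidate(queryTerms: string[], docTerms: Set<string>, mode: MatchMode): boolean {
  if (queryTerms.length === 0) return false;
  return mode === "and"
    ? queryTerms.every((t) => docTerms.has(t)) // all query terms must appear
    : queryTerms.some((t) => docTerms.has(t)); // one matching term is enough
}

// Hypothetical query and documents, already tokenized.
const query = ["connection", "timeout"];
const docA = new Set(["connection", "timeout", "retry"]);
const docB = new Set(["connection", "refused"]);

const andCount = [docA, docB].filter((d) => isCandidate(query, d, "and")).length;
const orCount = [docA, docB].filter((d) => isCandidate(query, d, "or")).length;
```

Here only docA qualifies in AND mode, while both documents qualify in OR mode. AND narrows the candidate set for precision; OR widens it and leaves the ordering to the scoring stage.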
Result fusion
Fusion controls how vector and keyword results are combined into the final list. Reciprocal Rank Fusion (RRF) merges by ranking position rather than score, avoiding the problem of comparing two incompatible scoring scales. Max Fusion simply keeps the highest score each result received from either search.
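The two fusion strategies can be sketched in a few lines. This is a simplified illustration, not AI Search's code: k = 60 is the constant commonly used for RRF in the literature and is assumed here, the scores are made up, and the max variant assumes raw scores are compared directly (whether AI Search normalizes them first is not stated in the announcement).

```typescript
type Scored = { id: string; score: number };

// Reciprocal Rank Fusion: each list contributes 1 / (k + rank) per document,
// so only positions matter, never the raw scores.
function rrf(lists: Scored[][], k = 60): Scored[] {
  const fused = new Map<string, number>();
  for (const list of lists) {
    const ranked = [...list].sort((a, b) => b.score - a.score);
    ranked.forEach((item, index) => {
      fused.set(item.id, (fused.get(item.id) ?? 0) + 1 / (k + index + 1));
    });
  }
  return toSortedList(fused);
}

// Max fusion: each document keeps the highest score it got in any list.
function maxFusion(lists: Scored[][]): Scored[] {
  const fused = new Map<string, number>();
  for (const list of lists) {
    for (const item of list) {
      fused.set(item.id, Math.max(fused.get(item.id) ?? -Infinity, item.score));
    }
  }
  return toSortedList(fused);
}

function toSortedList(m: Map<string, number>): Scored[] {
  const out: Scored[] = [];
  m.forEach((score, id) => out.push({ id, score }));
  return out.sort((a, b) => b.score - a.score);
}

// Made-up scores: cosine-like values for vector hits, BM25-like for keyword hits.
const vectorHits: Scored[] = [
  { id: "net-guide", score: 0.82 },
  { id: "err-page", score: 0.79 },
];
const keywordHits: Scored[] = [
  { id: "err-page", score: 7.4 },
  { id: "faq", score: 2.1 },
];
const fusedByRank = rrf([vectorHits, keywordHits]);
const fusedByMax = maxFusion([vectorHits, keywordHits]);
```

With these sample numbers, rank fusion orders the results err-page, net-guide, faq, while max fusion orders them err-page, faq, net-guide. The unbounded keyword scores dwarf the 0-to-1 vector scores in the second case, which is the scale mismatch that rank-based fusion sidesteps.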
Optional reranking
Additionally, you can enable a reranking step with a cross-encoder that rescores results by evaluating the query and document together as a pair. This helps catch cases where a result has the right terms but is not actually answering the question.
Each option has a sensible default when omitted, and you have the flexibility to configure what matters whenever you create a new instance.
Relevance boosting: surfacing what actually matters
Retrieval brings back relevant results, but relevance alone is not always enough. In a search for election results, for example, an article from last week and one from three years ago might be equally relevant semantically, but most users probably want the more recent one. Relevance boosting lets you layer business logic on top of retrieval ranking, adjusting positions based on document metadata.
You can boost by timestamp, which is built into every item, or by any custom metadata field you define. This lets more recent, higher-priority, or category-specific documents rise in the ranking without altering the search logic itself. It is an extra layer of intelligence that makes the system much more useful in real-world scenarios.
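One plausible way to picture a timestamp boost is a multiplier that decays with document age. The announcement does not describe the actual formula, so everything in this sketch is an assumption: the exponential half-life decay, the weight and halfLifeDays parameters, and the sample scores.

```typescript
// timestamp is in milliseconds since the Unix epoch.
type DatedHit = { id: string; score: number; timestamp: number };

// Hypothetical recency boost: multiply each score by (1 + weight * decay).
function boostByRecency(
  hits: DatedHit[],
  nowMs: number,
  weight = 0.5,
  halfLifeDays = 30,
): DatedHit[] {
  const msPerDay = 24 * 60 * 60 * 1000;
  return hits
    .map((h) => {
      const ageDays = Math.max(0, (nowMs - h.timestamp) / msPerDay);
      // decay is 1 for a brand-new document and halves every halfLifeDays.
      const decay = Math.pow(0.5, ageDays / halfLifeDays);
      return { ...h, score: h.score * (1 + weight * decay) };
    })
    .sort((a, b) => b.score - a.score);
}

// Two near-equal semantic matches, three years apart (made-up data).
const asOf = Date.UTC(2026, 3, 20);
const datedHits: DatedHit[] = [
  { id: "results-2023", score: 0.9, timestamp: Date.UTC(2023, 3, 20) },
  { id: "results-last-week", score: 0.88, timestamp: Date.UTC(2026, 3, 13) },
];
const boosted = boostByRecency(datedHits, asOf);
```

The older article starts slightly ahead on raw relevance, but the week-old one ends up first after the boost. The retrieval scores themselves are untouched; the boost is a separate layer applied on top of them.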
AI Search in practice: customer support agent
To see where all of this comes together in a concrete way, Cloudflare walked through a detailed example of a support agent that searches two types of knowledge: shared product documentation and per-customer history, like previous issue resolutions. The product documentation is too large to fit in the context window, and each customer’s history grows with every resolved issue. So the agent needs retrieval to find what is relevant.
The architecture works like this: the shared product documentation lives in an R2 bucket connected to a fixed AI Search instance called product-knowledge, created once in the Cloudflare dashboard. This is the knowledge base that all agents can reference.
For each customer’s history, the agent creates an AI Search instance dynamically using the namespace binding. When a customer shows up for the first time, a dedicated instance is created. After each resolved issue, the agent saves a summary of what went wrong and how it was fixed. Over time, this builds a searchable log of past resolutions.
The namespace structure looks roughly like this: inside the support namespace, there is the product-knowledge instance using R2 as a source and shared across all agents, plus individual per-customer instances using managed storage. The agent then uses Cloudflare’s Agents SDK and defines two tools that the language model can call as the conversation evolves.
The first tool is the knowledge base search. When triggered, it queries both the product documentation and the customer history in a single call, using the cross-instance search capability. Results are merged and ranked together. The second tool saves the resolution: after solving an issue, the agent writes a summary that gets immediately indexed and becomes searchable for future conversations. The uploadAndPoll method waits until indexing is complete before returning, ensuring consistency.
The model used in the example is Kimi K2.5 via Workers AI, and it automatically decides when to call each tool based on the conversation context. This means that, over time, the agent gets smarter for each specific customer, because it accumulates context from past resolutions and avoids repeating fixes that already failed 💡
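The shape of that flow (one shared knowledge base, one store per customer created on first contact, and two tools on top) can be mimicked with in-memory stand-ins. To be clear, nothing below is the AI Search binding or the Agents SDK: FakeInstance, instanceFor, searchKnowledgeBase, and saveResolution are hypothetical names, and the word-overlap scoring is a placeholder for real retrieval.

```typescript
// In-memory stand-ins only; these are NOT Cloudflare APIs.
type ToolHit = { source: string; text: string; score: number };

class FakeInstance {
  private items: string[] = [];
  readonly label: string;

  constructor(label: string) {
    this.label = label;
  }

  add(text: string): void {
    this.items.push(text);
  }

  // Placeholder relevance: fraction of query words found in the item.
  search(query: string): ToolHit[] {
    const words = query.toLowerCase().split(/\s+/).filter((w) => w.length > 0);
    const hits: ToolHit[] = [];
    for (const text of this.items) {
      const lower = text.toLowerCase();
      const matched = words.filter((w) => lower.includes(w)).length;
      if (matched > 0) hits.push({ source: this.label, text, score: matched / words.length });
    }
    return hits;
  }
}

const productKnowledge = new FakeInstance("product-knowledge");
const customerInstances = new Map<string, FakeInstance>();

// Create the per-customer store the first time a customer shows up.
function instanceFor(customerId: string): FakeInstance {
  let instance = customerInstances.get(customerId);
  if (!instance) {
    instance = new FakeInstance(`customer:${customerId}`);
    customerInstances.set(customerId, instance);
  }
  return instance;
}

// Tool 1: search shared docs and this customer's history, merged into one ranking.
function searchKnowledgeBase(customerId: string, query: string): ToolHit[] {
  return [...productKnowledge.search(query), ...instanceFor(customerId).search(query)]
    .sort((a, b) => b.score - a.score);
}

// Tool 2: store a resolution summary so later conversations can find it.
function saveResolution(customerId: string, summary: string): void {
  instanceFor(customerId).add(summary);
}

productKnowledge.add("Restart the gateway to clear a stuck connection pool");
saveResolution("acme", "Connection timeout fixed by raising the gateway keepalive; restart did not help");
const acmeResults = searchKnowledgeBase("acme", "gateway connection timeout");
```

In this toy run, the customer's own past resolution ranks above the generic product doc, and it records that a restart did not help. A different customer searching the same query would only see the shared documentation, since each history lives in its own store.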
How AI Search instances work now
If you used AI Search before this version, you know the previous setup: create an R2 bucket, link it to an instance, AI Search generates a service API token, and you manage the Vectorize index provisioned in your account. Uploading an object meant writing to R2 and then waiting for a sync job to run before the object was indexed.
New instances work differently. When you call create(), the instance comes with built-in storage and a vector index right out of the box. You upload a file, it gets sent for indexing immediately, and you can track indexing status with a single uploadAndPoll() call. Once complete, the instance is ready to search. There are no external dependencies to connect.
Each instance can also connect to an external data source — either an R2 bucket or a website — and run on a sync schedule. The external source can coexist with the built-in storage. In the support agent example, product-knowledge is fed by an R2 bucket for shared documentation, while per-customer instances use built-in storage for dynamically loaded context.
Namespaces: creating instances at runtime
The ai_search_namespaces binding is the new way to create search instances dynamically. It exposes APIs like create(), delete(), list(), and search() at the namespace level. If you are creating instances dynamically — whether per agent, per customer, or per tenant — this is the binding to use. Legacy bindings continue to work through Workers compatibility dates.
Pricing and limits during the open beta
New instances created from now on get built-in storage and a vector index automatically. These instances are free while AI Search is in open beta, within the established limits. When using a website as a data source, crawling via Browser Run, formerly Browser Rendering, is now a built-in service and will not be charged separately.
On the Workers Free plan, limits include 100 instances per account, 100,000 files per instance, a maximum file size of 4MB, 20,000 queries per month, and 500 pages crawled per day. On the Workers Paid plan, those numbers jump to 5,000 instances per account, 1 million files per instance or 500,000 for hybrid search, a maximum file size of 4MB, unlimited queries, and unlimited crawling.
After the beta, the goal is to offer unified pricing for AI Search as a single service, rather than charging separately for each underlying component. Workers AI and AI Gateway usage will continue to be billed separately. Cloudflare has committed to providing at least 30 days’ notice and communicating pricing details before any charges begin.
Instances created before this version continue to work exactly as they are. R2 buckets, Vectorize indexes, and Browser Run usage remain in your account and are billed as before. Migration details will be shared soon.
What this means for anyone building agents today
Cloudflare’s move with AI Search says a lot about how AI agent development is evolving. The clear trend is that infrastructure pieces that used to require extensive manual configuration — vector indexes, indexing pipelines, result fusion logic, multi-source orchestration — are being absorbed by managed platforms that expose simple, straightforward APIs. This does not mean the complexity has disappeared. It has just been encapsulated in more robust, battle-tested layers, freeing product teams to focus on what truly matters: agent behavior and the quality of the experience it delivers.
For anyone building agents today, the most immediate impact is the reduced time between idea and working agent. With AI Search, you no longer need to provision a separate vector database, configure embeddings, set up an ingestion pipeline, add a keyword search engine, and then write the logic that ties it all together. You declare an instance, point it at your documents, and the tool handles the rest. This is especially valuable for small teams or projects where iteration speed is critical — like products in the validation phase or internal automation projects.
The evolution from AutoRAG to AI Search also signals growing maturity in the generative AI tooling market. More and more, platforms are moving out of experimental mode and delivering production-grade primitives: reliable, scalable, and integrated into the existing ecosystem. Native hybrid search, built-in storage, and dynamic instance creation are not just interesting features on paper — they represent architectural choices that make agents more robust and easier to maintain over time.
Worth noting: search on Cloudflare’s own blog is now powered by AI Search, which serves as a real-world showcase of the tool’s capabilities in a production environment.
AI Search is already available for developers using the Cloudflare Workers AI platform. To get started, run npx wrangler ai-search create my-search to create your first instance. Full documentation is available on the Cloudflare developer site, and new indexing capabilities and data source connectors are expected in upcoming updates.
