Even the most advanced AI systems often give incorrect answers to complex questions because they cannot find the right information: ask for “electric vehicle articles that do not mention Elon Musk” and you get articles that are all about him; ask about “the penalty if the acquiring company fails to hold a general meeting of shareholders” and the AI cannot connect “failure to hold a general meeting leads to termination of the agreement” with “the penalty clause for termination of the agreement”. These dilemmas expose the fatal flaws of current AI retrieval systems in handling negative queries, multi-step reasoning, and more. ZeroEntropy, founded by post-00s founder Ghita Houir Alami, has raised $4.2 million in seed funding for an innovative training method drawn from chess’s ELO rating system, aiming to change this status quo. This article analyzes how the company uses mathematical insight to solve AI retrieval problems, along with the technological breakthroughs and industry impact behind it.
Have you ever noticed that even the most advanced AI systems can still give frustratingly wrong answers when faced with complex problems? The problem is often not with the large language models themselves, but with the fact that they simply cannot find the right information.
Imagine that you ask an AI assistant “which electric vehicle articles don’t mention Elon Musk”, and it recommends articles that discuss Musk at length. Or you ask “what is the penalty if the acquirer fails to hold a shareholders’ meeting”, and the AI cannot connect the two relevant pieces of information: “failure to hold a shareholders’ meeting will lead to the termination of the agreement” and “the penalty clause for termination of the agreement”.
These seemingly simple questions expose the fundamental flaws of the current AI retrieval system. Neither semantic nor keyword searches can handle complex multi-step reasoning, negative queries, or problems that require integration of information across documents.
It is against this backdrop that a startup called ZeroEntropy has raised $4.2 million in seed funding for an approach it claims will revolutionize the way AI retrieves data. What interests me even more is that the company’s CEO is a post-00s founder from Morocco.
The real challenge of AI retrieval: not just finding information
As I delved into the problems that ZeroEntropy was solving, I realized that the complexity of this field is far beyond imagination. Most people think that AI search is all about finding relevant documents in the knowledge base, but in reality this is just the tip of the iceberg. The real challenge is that AI needs to understand the deep meaning of a problem like a human and extract the most relevant information from the chaotic data with precision.
I found that current retrieval systems have three Achilles heels. The first is handling negative semantic queries. When you ask, “Which EV articles don’t contain any references to Elon Musk,” traditional semantic and keyword searches instantly retrieve EV articles that specifically discuss Musk, exactly the opposite of the user’s intent. This mistake is not only frustrating but can also lead to completely wrong business decisions.
The second problem is multi-step reasoning queries. For example, “What is the penalty if the acquiring company fails to hold a shareholders’ meeting?” To answer correctly, the system needs to first find the paragraph on the consequences of not holding a shareholders’ meeting, discover that this leads to termination of the agreement, and then search for the penalty clause for termination of the agreement. A simple semantic search, however, only returns paragraphs about shareholders’ meetings and assorted penalty clauses without correctly connecting the two. This missing capability makes AI extremely limited in complex business scenarios.
The third challenge is fuzzy filtering queries. When you ask “what are the diagnostic methods for early-stage cancer in papers with sample sizes over 2,000”, the problem is that sample size information usually appears at the beginning of the paper, while specific diagnoses may be deep in the article. Traditional search systems cannot ensure that both conditions are met at the same time, and may eventually recommend papers with good diagnostic methods but substandard sample sizes.
The root of these problems is that most current AI systems rely on basic semantic search, which is like giving a computer a simple keyword-matching tool and expecting it to understand complex query intent like a human. The ZeroEntropy founding team realized that achieving truly intelligent retrieval required more than just better embedding models: it required a new training method and evaluation system. Instead of incrementally improving existing technology, they developed solutions that fundamentally rethink how AI is trained to understand the relevance between queries and documents.
ZeroEntropy’s Technological Innovation: Redefining AI Training with Chess’s ELO Rating System
What really struck me was the training methodology used by the ZeroEntropy team. Instead of taking the traditional route, they borrowed the ELO rating system from chess to train their reranking model. The ingenuity of this approach is that it avoids the “false negatives” problem of traditional training. As I delved into their technical paper, I discovered that this was not just a clever analogy but a fundamental rethinking of the information retrieval training paradigm.
The traditional way to train a reranking model goes like this: the model is given some manually labeled positive samples (query-document pairs that humans confirm are relevant), and some randomly selected documents serve as negative samples. But this approach has a fatal flaw: as your “hard negative mining” technique gets better, your dataset fills up with false negatives, i.e., documents that are actually more relevant than the manually labeled positives but are incorrectly labeled as negative. Imagine asking “Who won the 2017 Nobel Prize in Physics?” A traditional system might flag a document like “Gravitational waves were first observed by the LIGO gravitational wave detector in September 2015” as a negative sample because it does not directly answer the question. In fact, this discovery is precisely why the 2017 Nobel Prize in Physics was awarded, making it extremely relevant information.
ZeroEntropy’s solution is elegant. They abandoned absolute scoring in favor of pairwise comparisons. Instead of asking “Is this document a 7 or an 8 in relevance to the query?”, they ask “Which of these two documents is more relevant to the query?” This approach greatly reduces noise and improves labeling consistency. Even on complex technical questions, humans and AI reach 96% agreement on pairwise comparisons, compared with only 60-70% consistency for traditional absolute scoring. What does this difference mean in practice? It means the quality of the training data is fundamentally improved: the model learns genuinely meaningful relevance signals rather than scores full of random noise.
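To make the contrast concrete, here is a minimal sketch of what pairwise labeling might look like. The `judge` callable stands in for an LLM prompted to pick the more relevant of two documents; the function names and interface are my own illustration, not ZeroEntropy’s actual API:

```python
from typing import Callable, List, Tuple

def pairwise_labels(
    query: str,
    docs: List[str],
    judge: Callable[[str, str, str], int],  # returns 0 if the first doc wins, 1 if the second
) -> List[Tuple[int, int]]:
    """Collect (winner, loser) index pairs for every document pair.

    Each judge call answers "which of these two documents better
    answers the query?" -- the pairwise question that replaces
    absolute 0-10 relevance scoring.
    """
    results = []
    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            if judge(query, docs[i], docs[j]) == 0:
                results.append((i, j))
            else:
                results.append((j, i))
    return results
```

The output is a list of match results rather than noisy absolute scores, which is exactly the form the ELO fitting step described below consumes.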
What’s even more exciting is the application of the ELO rating system. They treat documents as chess players and the results of pairwise comparisons as match outcomes, then use maximum likelihood estimation to fit an ELO score for each document. The score is not a simple 0 or 1 but a continuous number reflecting the document’s relative importance. A document may rank second under a query and still be relevant, just slightly inferior to the top document. This fine-grained distinction allows AI to better understand the hierarchy and relative importance of information.
The mathematical basis of this method is rigorous. For n documents, they use σ(e_i − e_j) to represent the probability that document i beats document j, where σ is the sigmoid function and e_i and e_j are the corresponding ELO scores. By optimizing the negative log-likelihood loss, they can compute optimal ELO scores for the document shortlist under each query. In practice, however, running inference on all n² document pairs per query is computationally infeasible. They cleverly used graph-theoretic methods to fit near-optimal ELO scores with only O(n) inferences: specifically, they sampled 4 random cycle graphs per query, about 400 inferences in total, to achieve the desired effect.
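Based on that description, the fit can be sketched as a toy implementation. The plain gradient-ascent loop and function names are my own illustration under stated assumptions, not ZeroEntropy’s pipeline; what it demonstrates is the sigmoid win probability σ(e_i − e_j), maximizing the log-likelihood of observed comparisons, and sampling pairs along random cycles so only O(n) judgments are needed:

```python
import math
import random

def cycle_pairs(n_docs, n_cycles=4, seed=0):
    """Sample document pairs along random cycles: each cycle has
    exactly n_docs edges, so n_cycles cycles cost O(n) judge calls
    instead of the O(n^2) cost of comparing all pairs."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_cycles):
        order = list(range(n_docs))
        rng.shuffle(order)
        pairs.extend(zip(order, order[1:] + order[:1]))
    return pairs

def fit_elo(n_docs, outcomes, lr=0.1, epochs=300):
    """Fit ELO scores by gradient ascent on the log-likelihood of
    pairwise outcomes, where P(i beats j) = sigmoid(e_i - e_j).
    `outcomes` is a list of (winner, loser) index pairs."""
    elo = [0.0] * n_docs
    for _ in range(epochs):
        for w, l in outcomes:
            p_win = 1.0 / (1.0 + math.exp(-(elo[w] - elo[l])))
            # d(log P)/d(e_w) = 1 - p_win; scores only matter up to a
            # constant shift, so the symmetric update keeps their sum fixed
            elo[w] += lr * (1.0 - p_win)
            elo[l] -= lr * (1.0 - p_win)
    return elo
```

On a transitive set of outcomes such as `[(0, 1), (1, 2), (0, 2)]` the fitted scores recover the ordering e_0 > e_1 > e_2, and with a shortlist of 100 documents, 4 cycles give exactly the 400 comparisons the article mentions.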
But the story doesn’t end there. They also address cross-query bias. For a query with no relevant results, even the best result is bad, yet in the ELO system it receives a high score; for a query with many relevant results, even a fairly good result receives a low score. To solve this, they introduced cross-query comparisons, allowing the model to learn to distinguish the “absolute relevance” levels of different queries. Cross-query comparison is an “apples vs. oranges” comparison, technically harder, but it calibrates the scoring scale across queries. They used specialized prompt engineering techniques that allowed large language models to reach consensus on these complex comparisons.
The innovation of this training method is that it translates synthetic pairwise judgments into an ELO-based ranking model for the first time. As far as I know, this is the first scalable pipeline capable of such a transformation. Their technical report shows that this method, combined with reinforcement learning, further improves model performance. In the end, they trained the zerank-1 model, which outperformed comparable models from Cohere and Salesforce on both public and private retrieval benchmarks, improving NDCG@10 by 18% in some domains. Even more impressively, their model also outperformed Gemini 2.5 Flash used as a reranker, with a 12% gain in accuracy.
From Morocco to Silicon Valley: Technological breakthroughs from diverse backgrounds
There is another dimension to the ZeroEntropy story that touches me deeply: the personal experience of founder Ghita Houir Alami. Born and raised in Morocco, she left home at 17 to study engineering at the prestigious École Polytechnique in Paris, an elite institution with military roots and a strong focus on mathematics. There, she discovered her love for machine learning.
She came to UC Berkeley two years ago to complete her master’s degree in mathematics, further deepening her interest in building intelligent systems. Before founding ZeroEntropy, she tried to build an AI assistant, which made her deeply aware of the importance of providing the right context and information for large language models, which partly inspired her to create ZeroEntropy.
I think this diverse background is important for technological innovation. Houir Alami’s experience in different cultures and education systems has allowed her to see problems from a unique perspective. Her background in mathematics has allowed her to gain a deep understanding of the mathematical principles of the ELO scoring system, while her engineering training has helped her translate theory into practical technical solutions.
In a field often criticized for its lack of diversity, 25-year-old Houir Alami has become one of the few female CEOs building deep infrastructure for one of AI’s toughest problems. But she hopes that won’t be true for long. “There aren’t many women in the field of developer tools or AI infrastructure,” she said. “But I want to say to any young woman interested in technical problems: don’t let this situation stop you. If you’re drawn to complex technical problems, don’t let anyone make you feel you’re incapable of pursuing them. You should try it.”
She also maintains a connection to her hometown by giving presentations at high schools and universities in Morocco, aiming to inspire more young girls to pursue their studies in STEM fields. This commitment to social responsibility reflects that she not only pays attention to technological breakthroughs, but also cares about how to get more people to participate in technological innovation.
Notably, ZeroEntropy’s team is made up of mathematicians and competitive programmers from the International Mathematical Olympiad, the International Informatics Olympiad, and the International Physics Olympiad. This deep mathematical foundation provides solid theoretical support for their technological innovation. CTO Nicholas Pipitone, who has a strong background in theoretical mathematics and computer science, dropped out of Carnegie Mellon University to pursue entrepreneurship and previously worked as CTO or lead developer at five different startups.
Revolutionizing Assessment Systems: From Reactive Response to Proactive Discovery
When delving into ZeroEntropy’s technology, I found that they not only have breakthroughs in training methods, but also have deep insights into the evaluation system. This reminds me of a frustrating reality: most businesses have no clear idea of how bad their AI search systems really are. They can usually only vaguely perceive system performance through user “like” feedback, but this lagging, subjective evaluation method does not help developers pinpoint the problem at all.
The ZeroEntropy team spent weeks debugging the retrieval pipeline, drilling down into the data points of thousands of real user queries to clearly classify and determine where semantic search would crash and provide false or hallucinatory results. They found that when AI applications have problems, developers often can’t figure out whether it’s user experience issues, large language model hallucinations, retrieval system failures, or the corpus itself lacking the correct information. This diagnostic difficulty leaves most teams resolving issues by manually reviewing queries, a process that is time-consuming, inconsistent, and simply impractical in large-scale applications.
The bigger challenge is the difficulty of building evaluation benchmarks. Evaluating large language models requires only input-output pairs, but evaluating retrieval systems requires queries, snapshots of the entire corpus at a specific point in time, and ground-truth labels marking what the correct search results should be, often spanning multiple documents. Building such a benchmark is extremely difficult, and I think this is one reason most companies lack a clear understanding of their retrieval systems’ true performance.
ZeroEntropy is building an open-source benchmark creation framework, believing that large language models can and should be used to autonomously define and build benchmarks that compute deterministic metrics such as recall, precision, and mean reciprocal rank. This automated assessment capability matters for the whole industry, because it lets more developers truly understand the performance boundaries of their systems and make targeted improvements.
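Once a benchmark supplies ground-truth labels, these deterministic metrics are simple to compute. A minimal sketch using the standard textbook formulas (my illustration, not ZeroEntropy’s zbench code):

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of all relevant docs that appear in the top-k results."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & set(relevant)) / len(relevant)

def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k results that are relevant."""
    return len(set(retrieved[:k]) & set(relevant)) / k

def mean_reciprocal_rank(runs):
    """Average of 1/rank of the first relevant result per query.
    `runs` is a list of (retrieved_ids, relevant_ids) pairs."""
    total = 0.0
    for retrieved, relevant in runs:
        for rank, doc in enumerate(retrieved, start=1):
            if doc in relevant:
                total += 1.0 / rank
                break
    return total / len(runs)
```

The hard part, as the article notes, is not the arithmetic but producing the `relevant` label sets across a snapshot of the corpus.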
I particularly agree with their emphasis on evaluation. “Evaluating retrieval is a critical step in building useful and reliable AI products,” they wrote, “but it is hard to do.” This frankness reminds me of a common dilemma in enterprise AI transformation: the technology looks advanced but doesn’t work well in practice, and no one knows what the problem is. By providing clear evaluation frameworks and diagnostic tools, ZeroEntropy lets businesses scientifically measure and improve their AI retrieval systems.
Business model and ecological layout: strategic thinking from tools to infrastructure
When I analyzed ZeroEntropy’s business strategy, I found that they were very cleverly positioned. Unlike search products for corporate employees, such as Glean, ZeroEntropy is strictly positioned as a developer tool. Founder Houir Alami compares her startup to “Supabase of search”, which is a fitting analogy. Just like Supabase automates most of the database management work, ZeroEntropy automates the entire retrieval process of ingestion, indexing, reordering, and evaluation.
The subtlety of this positioning is that it sidesteps direct competition with incumbent enterprise search solutions and instead becomes an underlying infrastructure provider. Developers can integrate ZeroEntropy’s reranking capabilities in minutes via APIs without having to build agent infrastructure from scratch. As one customer put it: “It gives us faster time to market, full control over AI behavior, and action coverage across products without having to rebuild anything. It fits perfectly into our existing infrastructure, standardizing the way we design our APIs and allowing us to focus on core product development while ZeroEntropy handles the heavy lifting of AI.”
In terms of pricing, ZeroEntropy uses a very competitive model: $0.025 per million tokens, half the price of Cohere’s latest reranking model. At the same time, they released the fully open-source zerank-1-small model (Apache 2.0 license) on Hugging Face, an open strategy that aids adoption and reflects a commitment to the open-source community. For enterprise users, they offer the service with enterprise-grade terms through their partner Baseten.
I noticed that ZeroEntropy’s customer distribution is interesting. They have served more than 10 early-stage companies across the healthcare, legal, customer support, and sales verticals. This cross-industry adoption is a testament to the universal need for high-quality retrieval. In tests on private data, zerank-1 showed significant improvement in the real-world application scenarios of different customers. This diverse customer base not only reduces business risk but also gives ZeroEntropy valuable cross-industry experience.
From the perspective of the competitive landscape, ZeroEntropy faces competitors such as MongoDB’s VoyageAI and fellow Y Combinator alumni such as Sid.ai. But I think their technical advantage is clear, and investor feedback confirms it: Zoe Perret, partner at Initialized Capital, said, “We’ve seen a lot of teams built around RAG, but Ghita and Nicolas’ model goes above and beyond everything we’ve seen.” This recognition from professional investors reflects ZeroEntropy’s leading position at the technical level.
What’s more, ZeroEntropy is building not just a product, but a new standard in the entire search landscape. Their development of the zbench evaluation framework, innovative ELO training methodology, and upcoming open-source benchmark creation tools have the potential to become industry standards. This platform-based thinking reminds me of early cloud computing companies that not only provided services but also defined technical specifications and best practices across the industry. If ZeroEntropy can successfully promote their technology and standards, they will establish a strong technical moat in this critical area of AI retrieval.
Market validation and future outlook: AGI requires more than just better models
ZeroEntropy’s market validation came faster than I expected. Since its launch in January, they have processed nearly 100 million documents and nearly 1 million queries. More than 10 early-stage companies have used ZeroEntropy to build AI agents in the healthcare, legal, customer support, and sales verticals. These numbers reflect both the urgent need for high-quality retrieval and the real-world effectiveness of ZeroEntropy’s technology.
Their customer stories show impressive improvements. In private-data tests, the zerank-1 model shows significant performance gains across domains. For example, when using BM25 as the first retrieval stage, zerank-1 increased NDCG@10 by 28%; when using OpenAI’s text-embedding-3-small as the first stage, the improvement still reached 18%. These are not just theoretical gains but substantial improvements that affect the operational efficiency of enterprises. What’s more, the improvement is cross-domain: significant results appear in finance, STEM, and other professional fields alike.
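NDCG@10, the metric behind these percentages, rewards placing the most relevant documents near the top, with gains discounted logarithmically by rank. A standard implementation of the formula (a sketch with graded relevance, not vendor code):

```python
import math

def ndcg_at_k(retrieved, relevance, k=10):
    """NDCG@k: DCG of the actual ranking divided by the DCG of the
    ideal ranking. `relevance` maps doc id -> gain (missing ids = 0)."""
    dcg = sum(
        relevance.get(doc, 0.0) / math.log2(rank + 1)
        for rank, doc in enumerate(retrieved[:k], start=1)
    )
    ideal_gains = sorted(relevance.values(), reverse=True)[:k]
    idcg = sum(
        gain / math.log2(rank + 1)
        for rank, gain in enumerate(ideal_gains, start=1)
    )
    return dcg / idcg if idcg > 0 else 0.0
```

Putting the single relevant document first yields a score of 1.0, while burying it lowers the score sharply, which is why a reranker’s handling of the top few positions moves NDCG@10 so much.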
Investor reaction is also a testament to the market’s appetite for this technology. Zoe Perret, partner at Initialized Capital, said: “We’ve seen a lot of teams built around RAG, but Ghita and Nicolas’ model goes above and beyond anything we’ve ever seen. Retrieval is undoubtedly a key breakthrough on the next frontier of AI, and ZeroEntropy is building it.” The $4.2 million seed round was led by Initialized Capital, with participation from Y Combinator, Transpose Platform, 22 Ventures, a16z Scout, and others, as well as well-known angel investors from OpenAI, Hugging Face, and Front. Such an investor lineup provides not only financial support but, more importantly, industry resources and strategic guidance.
I particularly agree with the ZeroEntropy team’s understanding of AGI. They believe artificial general intelligence needs not only better large language models but also context models that can retrieve information as seamlessly as human memory. “Artificial general intelligence is about developing AI that can learn new information on the fly, like the humans you hire and train,” they wrote in their blog. “Although large language models score in the top 10% in almost every STEM subject, we still haven’t reached AGI. Why? Because we need better context models that can connect to knowledge bases and retrieve information as seamlessly as human memory.”
This point of view touches me deeply. While current AI systems excel in specific tasks, they are still inadequate in scenarios that require the integration of multiple information sources and complex reasoning. Most AI products—whether question-answering bots or AI agents—rely on retrieval systems to provide relevant context from the knowledge base. But the reality is that the vast majority of these systems rely on underlying semantic or hybrid search methods, which still often fail. These errors lead to inaccurate responses and hallucinations in large language models, frustrating both developers and end-users.
ZeroEntropy is building not just a better search engine, but the infrastructure that AI really needs: not just powerful models, but truly intelligent and stateful systems capable of reasoning based on context, capable of leveraging the full breadth of knowledge they have accumulated over time. Their vision is clear: to provide instant, accurate access to relevant data for both humans and AI agents. In a sense, ZeroEntropy is building the infrastructure that AGI really needs.
From the perspective of technological development trends, I believe that retrieval quality will become a key factor in differentiating the performance of different AI systems. As the capabilities of large language models gradually converge, the real competitive advantage will come from the ability to accurately and quickly obtain and understand relevant information. That’s why ZeroEntropy’s work is so important – they’re not improving model reasoning capabilities, they’re solving the more basic but equally critical problem of information acquisition.
Looking ahead, I believe the technological direction ZeroEntropy represents will be key to the development of AI. As AI agents become more sophisticated, the need for precise, fast retrieval will grow exponentially, and the companies that can provide truly intelligent search capabilities will become indispensable infrastructure providers in the AI ecosystem. ZeroEntropy’s $4.2 million round is just the beginning; as more companies recognize the decisive role of retrieval quality in the success or failure of AI applications, the field is bound to see greater investment and growth. They are hiring engineers who love hard technical problems, a sign that they are ready to accelerate team expansion, further advance their cutting-edge retrieval models, and scale their infrastructure so that more developers can focus on building transformative products rather than struggling with retrieval challenges.