In the wave of digital transformation, document processing has always been a pain point faced by enterprises. Despite technological advancements, how to efficiently and accurately handle massive amounts of unstructured documents remains a difficult problem for many enterprises to overcome. This article will introduce a startup called Extend that has made impressive achievements in the field of document processing.
Have you ever wondered why PDF document processing is still one of the biggest pain points for enterprises in this era of cloud computing and AI?
Imagine a scenario where a loan application document containing hundreds of pages lies in the banking system, waiting for manual review, while the applicant waits days or even weeks to know the outcome.
At the same time, medical records in the hospital are still being printed out on a printer and then passed on to the next doctor by hand. These seemingly ordinary scenarios actually expose a huge problem facing modern enterprises: 80% of enterprise data is trapped in unstructured documents and cannot be effectively utilized.
As I delved deeper into the market, I discovered an interesting phenomenon: shortly after I analyzed Reducto, a document processing company that received a $24.5 million investment from Benchmark, another company called Extend also made a breakthrough in this space.
The document processing-focused AI startup has just closed a $17 million seed and Series A round of financing, led by Innovation Endeavors, with participation from well-known investors such as Y Combinator, Homebrew, and Character, as well as heavyweight angel investors such as former Adobe Chief Strategy Officer Scott Belsky and Vercel CEO Guillermo Rauch.
What surprised me even more was that, with a team of only about 5 people and less than $2 million in prior financing, they had reached positive cash flow and annual recurring revenue that exceeded the total amount raised. This result made me want to understand what exactly they were doing and what makes Extend unique compared to other players in the market.
Looking at the round more closely, lead investor Innovation Endeavors was joined by existing investors Y Combinator, Homebrew, and Character, all of which chose to participate again.
The participation of these investors itself shows that the market has a high level of recognition of document processing AI technology. What interests me even more is that the founding team of Extend is not new to building companies.
Founders Kushal Byatnal and Rahul Bhattacharya previously co-founded Slintel, a sales intelligence platform that received $25 million in financing, had more than 300 customers and more than 100 employees, and was eventually acquired by 6sense in 2021.
Coupled with more than a decade of AI/ML expertise brought by Chief AI Officer Anirudh Badam from Microsoft’s Seattle headquarters, this entrepreneurial and technical experience gives confidence in their new project.
Why is document processing so difficult?
In my opinion, the document processing problem has not been well solved mainly because most people underestimate its complexity.
Many people think that with OCR technology, they can easily extract data from PDFs, but the reality is far more complex than imagined.
I’ve experienced this pain firsthand: in a project where we had to deal with a lot of invoices and receipts, what we thought would take a few API calls turned into a fight with a lot of strange real-world documents. Some were scanned upside down, some were blurry, some were covered in coffee stains, and some had handwritten tip amounts.
For critical business scenarios that require more than 99% accuracy, such as financial institutions processing loan applications or healthcare institutions processing patient records, even small errors can lead to serious consequences.
The consequences are concrete: families are unable to move into their new homes on time because the operations team needs to review more than 100 pages of PDF loan applications; patients do not receive the care they need when medical records are printed and passed from one provider to another; small businesses are unable to receive payments on time because invoice details need to be manually reviewed and entered.
The experience of Extend founder Kushal Byatnal at Brex perfectly illustrates the nature of the problem. His team wanted to create a magical experience for users by parsing receipts and automatically matching them to expense reimbursements.
Sounds simple, right?
But the reality is that there are countless edge situations to consider for user-uploaded receipts.
In addition to millions of different merchant formats, receipts can be upside down, blurry, crumpled, stained with coffee, have handwritten tip amounts, and more.
They tested almost every solution on the market, from traditional OCR vendors to specialized machine learning approaches. These looked promising on a handful of examples, but the team had enough experience to know that demos are easy to build and quick to inspire, while the real problems wait at production scale.
Despite their best efforts, with multiple members of the team painstakingly labeling hundreds of complex examples and edge cases they expected in production, they never achieved a level of accuracy that they could put in front of their customers.
As an ambitious company building modern software from the ground up (they even built the entire card processing infrastructure on their own!), they realized they needed complete control and flexibility over the end-user experience.
Many of their requirements, such as low latency, multilingual support, and custom data fields, simply cannot be met by a “one-size-fits-all” approach to off-the-shelf solutions.
Eventually, they had to build most of the functionality themselves, which quickly became one of their most complex engineering projects, taking months of implementation, iteration, and maintenance.
They used a variety of techniques, combined machine learning models with custom code, and even built an in-house system of regex rules and heuristic scoring to create the necessary underlying data architecture. Only with proper regression testing and performance monitoring in place could they begin to build user experiences on top of it.
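To make the idea of regex rules plus heuristic scoring concrete, here is a minimal Python sketch of my own. It is not Brex's actual system: the rule names, base scores, and the "largest amount on the receipt" bonus are all assumptions chosen purely for illustration, and it ignores thousands separators and currencies.

```python
import re
from dataclasses import dataclass

# Hypothetical pattern for a money amount such as 42.50 or 42,50.
AMOUNT = r"(\d{1,6}(?:[.,]\d{2}))"

# Hypothetical rules: (name, regex, base score). Higher score = more trusted.
RULES = [
    ("total", re.compile(r"\btotal\b[^\d\n]*" + AMOUNT, re.IGNORECASE), 0.9),
    ("amount_due", re.compile(r"\bamount due\b[^\d\n]*" + AMOUNT, re.IGNORECASE), 0.8),
    ("subtotal", re.compile(r"\bsubtotal\b[^\d\n]*" + AMOUNT, re.IGNORECASE), 0.3),
]


@dataclass
class Candidate:
    value: float
    score: float
    rule: str


def to_float(raw: str) -> float:
    # Treat a decimal comma the same as a decimal point.
    return float(raw.replace(",", "."))


def score_candidates(text: str) -> list[Candidate]:
    """Return candidate totals for a receipt, best first."""
    amounts = [to_float(a) for a in re.findall(AMOUNT, text)]
    largest = max(amounts, default=None)
    candidates = []
    for name, pattern, base in RULES:
        for match in pattern.finditer(text):
            value = to_float(match.group(1))
            score = base
            # Heuristic: the grand total is usually the largest amount printed.
            if value == largest:
                score = min(1.0, score + 0.1)
            candidates.append(Candidate(value, score, name))
    return sorted(candidates, key=lambda c: c.score, reverse=True)


sample_receipt = "Subtotal 37.50\nTip 5.00\nTOTAL: $42.50"
best = score_candidates(sample_receipt)[0]
```

Even this toy version hints at the maintenance burden: every new merchant format or edge case tends to mean another rule or another score adjustment.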
While they did eventually release that envisioned magical feature, and it has been one of the main reasons customers have given Brex a 10/10 NPS rating for years, the amount of code they had to write and maintain still left Kushal with lingering fears six years later.
The pain never really went away: they had to add new models to keep up with the growing flow of data, adjust heuristics every time a new edge case arose, and deprioritize new feature requests when engineers simply didn’t have the bandwidth.
This example perfectly illustrates the real challenge of document processing: while transformer models have greatly raised the floor of what is possible, building production-grade document workflows still requires a significant investment in “document processing infrastructure.”
If you talk to companies that have these mission-critical, in-product document use cases, you’ll hear a consistent story: they’ve assigned teams of 5-10 engineers to this problem, worked for over a year, and built a lot of tools around VLM and OCR models.
This includes:
- annotation tools for in-house subject matter experts;
- human-in-the-loop workflows for handling odd document edge cases;
- reinforcement learning and fine-tuning workflows that learn from and adapt to user behavior;
- evaluation workflows that ensure more than 99% accuracy;
- workflow orchestration for coordinating complex directed acyclic graphs that combine disparate models, including optical character recognition, document segmentation, document extraction, document classification, and more;
- logic for handling the myriad edge cases in documents, such as handwriting, signatures, strikethroughs, and large tables.
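The orchestration piece is easier to picture with a small example. The sketch below is my own illustration, not any vendor's API: the step names, the stub functions, and the dependency graph are hypothetical, and each stub stands in for what would be a real model call. It uses Python's standard `graphlib` module to run the steps in dependency order.

```python
from graphlib import TopologicalSorter


# Each step reads from a shared context dict and returns new keys to add.
# These are stand-ins for real OCR, classification, and extraction models.
def ocr(ctx: dict) -> dict:
    return {"text": ctx["raw"].strip()}


def classify(ctx: dict) -> dict:
    is_invoice = "invoice" in ctx["text"].lower()
    return {"doc_type": "invoice" if is_invoice else "unknown"}


def split(ctx: dict) -> dict:
    # Assume pages are separated by form-feed characters.
    return {"pages": ctx["text"].split("\f")}


def extract(ctx: dict) -> dict:
    return {"fields": {"doc_type": ctx["doc_type"], "page_count": len(ctx["pages"])}}


STEPS = {"ocr": ocr, "classify": classify, "split": split, "extract": extract}

# Hypothetical DAG: each step maps to the set of steps it depends on.
DEPENDS_ON = {
    "ocr": set(),
    "classify": {"ocr"},
    "split": {"ocr"},
    "extract": {"classify", "split"},
}


def run_pipeline(raw: str) -> dict:
    """Run every step once, in an order that respects the dependencies."""
    ctx = {"raw": raw}
    for name in TopologicalSorter(DEPENDS_ON).static_order():
        ctx.update(STEPS[name](ctx))
    return ctx


result = run_pipeline("INVOICE #1\fpage two")
```

A production system would add retries, per-step monitoring, and branching on the document type, which is where much of the engineering effort described above goes.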
What strikes me deeply is that foundation models actually expose the problem rather than solve it. They give teams the experience of “programming documents like programming APIs” until the teams finally realize that the model layer is just the tip of the iceberg and that high-quality document processing is actually a systems engineering problem.
In fact, I find that for many companies—including some well-known portfolio companies—document processing is the biggest bottleneck to their product roadmaps and revenue growth goals. This is exactly why solutions like Extend are so valuable.
Extend Innovations:
I delved into Extend and found that their clever approach lies in recognizing that document processing is actually a systems engineering problem, not just a model problem.
Raw OCR or foundation models don’t completely solve the problem. What you need is a unified infrastructure and toolset that allows cutting-edge teams to handle all their messy documents in one place.
Extend is not just another document API. It’s a complete end-to-end platform that combines cutting-edge large language models with all the other developer primitives needed to reliably handle complex documents.
What impressed me was Extend’s full-stack approach to this problem.
They provide an advanced parsing engine to handle messy handwriting and tables, evaluation tools to ensure reliability, and orchestration capabilities for deploying production-ready pipelines.
This means that the team avoids months of research and development to address edge cases and accuracy gaps, and can quickly convert from raw PDFs to structured, validated, production-ready data – with over 95% accuracy on everything from clean documents to degraded scans.
More importantly, Extend’s product design philosophy reflects a deep understanding of the actual needs of enterprises. They don’t try to be a “black box solution” or restrict software engineers in any way.
In fact, a lot of the value that Extend provides is allowing software engineers to try, test, and evaluate different AI models or processing strategies.
Extend accelerates and enhances your internal build, rather than limiting it, which is an absolute must for serving the most complex companies and teams.
From their customer feedback, this approach really solves the real problem.
Pedro Franceschi, CEO of Brex, said: “Extend performed best among every solution we tested, including other vendors, open source, and even foundation models. It now powers the critical document workflows of our 30,000 customers, helping us build the smartest and most modern financial platform.” This feedback seems to me to represent the highest recognition a customer can give a technical solution: it not only solves the problem, but also becomes a competitive advantage.
I particularly appreciate two key decisions made by Extend in product design.
First, they focus on serving complex software teams building document-based workflows, rather than back-office labor automation or RPA. Think Brex’s bill payment or Flatiron Health’s electronic health record data ingestion. These workflows are extremely valuable when done right, but they are also the hardest to get right.
Second, Extend allows software engineers to experiment, test, and evaluate different AI models or processing strategies, a flexibility crucial for serving the most complex companies and teams. As Davis Treybig of Innovation Endeavors puts it: “Extend reimagines document intelligence with a full-stack approach, combining cutting-edge LLMs with all the other developer primitives needed to reliably handle complex documents. The product is so powerful that many Extend customers are able to not only automate existing workflows, but also introduce new capabilities that drive competitive differentiation.”
I particularly note how unique Extend is in terms of empowering engineering and product teams.
Their self-service platform allows teams to get started right away, which is valuable for businesses that want to quickly validate their use cases or verify accuracy on real-world examples. Sandbox mode provides a fast, frictionless way to experience the full capabilities of Extend, and teams can message the Extend team from within the product if they need help.
This product experience design reflects their emphasis on developer-friendliness, in contrast to many traditional document processing vendors.
Extend vs Reducto: Two paths to the document processing track
Interestingly, when I compared Extend with Reducto, which I had previously analyzed in depth, I found different strategies for both companies to solve the same problem.
This contrast is enlightening, showcasing the complexity and diverse needs of the document processing market.
Founded by MIT alumni and funded by Benchmark, Reducto’s approach focuses more on creating “magical” analytical accuracy, emphasizing the highest accuracy that can be achieved with a single pass.
Extend, on the other hand, takes a more full-stack and systematic approach, focusing on providing developers with a complete document processing cloud infrastructure.
From the perspective of the technical path, Reducto’s Agentic OCR framework adopts a multi-pass approach, automatically reviewing output, catching errors, and correcting them through an agent-based system, similar to a human-in-the-loop review process.
This approach achieves near-perfect parsing accuracy when working with complex documents.
Extend’s approach focuses more on workflow orchestration and end-to-end production-ready solutions, not only solving parsing problems, but also providing a complete toolchain such as classification, segmentation, validation, and manual review.
From the perspective of customer groups, both companies serve high-end enterprise customers, but the focus is different.
Reducto’s customers include Airtable, Scale AI, and many FAANG companies, who value ultra-high-precision document parsing capabilities. Extend’s customers, such as Brex, Square, Checkr, Flatiron Health, and others, need a complete solution that can be quickly deployed to production.
This reflects two different needs in the market: professional applications that require extreme precision, and enterprise-grade applications that require rapid integration and deployment.
In terms of business model, Reducto focuses on pay-per-page API services and provides cost optimization for simple pages, cutting the cost of processing simple pages in half with zero loss of accuracy. Extend, on the other hand, offers a more diverse service model, including a self-service platform, custom configuration generation, and a complete manual intervention workflow.
This difference reflects the two companies’ different understandings of market positioning: Reducto is more like a “high-precision engine” for document processing, while Extend is more like a “complete operating system for document processing.”
I think both approaches have value and reflect the maturity of the document processing market. For teams that already have technical staff in place and primarily need to solve accuracy issues, Reducto’s high-precision parsing may be a better choice. For teams looking to quickly build end-to-end document processing capabilities and reduce engineering effort, Extend’s full-stack solution is more attractive. From an investor’s perspective, Benchmark is betting on Reducto’s precision advantages, while Innovation Endeavors is investing in Extend’s platform capabilities. Both bets recognize that this huge market can support different development paths.
A new paradigm for document processing in the AI era
I think we are at an inflection point in the world of document processing. Over the past few years, advancements in large language models have finally made a real impact in this field.
We now live in a world where you can directly call the OpenAI API to classify and extract quite complex documents. These improvements are so significant that many people now consider document processing to be a commodity.
But upon closer inspection, it is actually not that simple.
If you’re just building document search or some kind of RAG-based document system, you may not need more than the foundation model offers, as these are use cases where 80-90% accuracy is usually sufficient.
However, this level of accuracy is not sufficient for many of the most valuable document use cases.
Imagine you’re uploading payslips to a fintech service that uses the extracted data to approve or reject your loan application – in this case, accuracy and reliability really matter.
Errors are costly, but the value of automating this workflow is enormous: users get an answer right away instead of waiting 24-48 hours for a human review. LLMs can get you started here, but on their own they can’t get you to the reliability you need.
In other words, while transformer models have greatly raised the floor of what is possible in document processing, building production-grade document workflows still requires a significant investment in “document processing infrastructure.”
That’s why Extend’s approach is so valuable.
They recognize that foundation models actually expose the problem rather than solve it. The models give teams the experience of “programming documents like programming APIs” until the teams finally realize that the model layer is just the tip of the iceberg and that high-quality document processing is actually a systems engineering problem. As Extend demonstrates with its automated configuration generation, one of the biggest bottlenecks in document processing is the manual time teams spend tweaking schemas, crafting prompts, and debugging edge cases to improve accuracy.
Simply upload a few sample documents, and Extend will generate a custom schema optimized for document structure.
Soon, Extend will integrate this experience with the evaluation set and deploy an agent that continuously runs optimization loops in the background, so that your accuracy will improve even when you sleep.
Judging by the new features introduced by Extend, they are moving toward a smarter, more autonomous, and more agentic product that continuously optimizes the accuracy, speed, and reliability of document flows. Their North Star is simple: remove every bottleneck that stands between teams and their unstructured data so they can focus on what makes their business unique (rather than wrestling with PDFs).
The world already has cloud platforms for storage, computing, and collaboration, but until now, no one has built a true document processing cloud – a purpose-built full-stack system to handle the complexity, chaos, and nuance of large-scale, real-world documents.
Think about the impact of this technology on different industries.
- In the real estate industry, Extend is helping businesses move families into new homes faster, automating real estate transactions in all 50 states with a fleet of agents.
- In the fintech sector, they enable customers to parse financial documents in real time through embedded agents, so payments are made and received faster.
- In HR and payroll platforms, they help employees onboard and get approved for jobs faster through agents that verify education and employment documents.
- In procurement platforms, they use agents to ingest sales documents to surface data insights and stay ahead of the competition.
- In healthcare, they surface medical insights through agents trained by specialized nurses, driving better patient outcomes.
These use cases demonstrate the wide applicability and great potential of document processing technology.
I was particularly interested in Extend’s workflow and human intervention capabilities, which demonstrated their deep understanding of the complexities of production environments.
In a real production environment, 100% accuracy is not guaranteed. Blurry scans, ambiguous data, and model errors can lead to erroneous outputs, with serious downstream consequences.
Extend includes built-in human intervention tools to catch and correct these issues.
You can configure review triggers at any step:
- Confidence threshold (e.g., flag the document if the confidence for total_amount is below 0.95)
- Validation failures (e.g., line item totals do not add up)
- External system checks (e.g., customer ID not found in your database)
- Unexpected document type (e.g., customer uploaded an invalid document type)
Flagged documents are routed to Extend’s built-in review UI, where team members can edit any extracted values, reclassify documents, approve or reject runs, and feed corrections back into the evaluation set.
It’s not just a safety net – it’s a tight feedback loop that improves your model over time.
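The review triggers listed above can be sketched in a few lines of Python. This is a hypothetical illustration of the logic, not Extend's configuration format or API: the `Extraction` class, the field names, and the reason strings are all assumptions of mine.

```python
from dataclasses import dataclass


@dataclass
class Extraction:
    doc_type: str
    fields: dict       # field name -> extracted value
    confidence: dict   # field name -> model confidence in [0, 1]


def review_reasons(result, expected_types, known_customer_ids, min_confidence=0.95):
    """Return the list of reasons a document should go to human review."""
    reasons = []

    # Trigger: unexpected document type.
    if result.doc_type not in expected_types:
        reasons.append("unexpected_document_type")

    # Trigger: confidence threshold, checked per field.
    for name, conf in result.confidence.items():
        if conf < min_confidence:
            reasons.append(f"low_confidence:{name}")

    # Trigger: validation failure (line items must add up to the total).
    line_items = result.fields.get("line_items", [])
    total = result.fields.get("total_amount")
    if total is not None and line_items and abs(sum(line_items) - total) > 0.005:
        reasons.append("line_items_do_not_sum_to_total")

    # Trigger: external system check (here, a plain set stands in for a database).
    customer_id = result.fields.get("customer_id")
    if customer_id is not None and customer_id not in known_customer_ids:
        reasons.append("customer_id_not_found")

    return reasons


doc = Extraction(
    doc_type="invoice",
    fields={"total_amount": 20.0, "line_items": [10.0, 5.0], "customer_id": "C-9"},
    confidence={"total_amount": 0.91, "customer_id": 0.99},
)
flags = review_reasons(doc, expected_types={"invoice"}, known_customer_ids={"C-1"})
```

An empty list means the document can flow straight through; anything else routes it to a reviewer, and the reviewer's correction becomes a new labeled example.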
This evolutionary path from human review to full automation is particularly interesting, showing the gradual maturity of AI systems.
In the case of HomeLight, they initially reviewed almost every document. But after a month of near-perfect accuracy and zero corrections, they removed the manual intervention entirely.
The pattern generalizes. When launching a mission-critical use case, teams enable human review to catch early issues and accelerate iteration. Then, as models and configurations improve, the need for review drops dramatically, and some workflows eventually reach full automation.
This progressive approach to automation is more pragmatic and reliable than pursuing full automation in the first place.
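Here is a rough Python sketch of how a team might quantify that progression. Both functions are my own assumptions for illustration: field-level accuracy measured against a labeled evaluation set, and a simple rule that only relaxes review after a full window of days with zero corrections, loosely mirroring the HomeLight story. Real policies would likely be more nuanced.

```python
def field_accuracy(predictions: list[dict], labels: list[dict]) -> float:
    """Share of labeled fields whose predicted value matches exactly."""
    correct = 0
    total = 0
    for predicted, gold in zip(predictions, labels):
        for name, expected in gold.items():
            total += 1
            if predicted.get(name) == expected:
                correct += 1
    return correct / total if total else 0.0


def can_reduce_review(daily_corrections: list[int], window_days: int = 30) -> bool:
    """True only if the last `window_days` days exist and had no corrections."""
    recent = daily_corrections[-window_days:]
    return len(recent) == window_days and sum(recent) == 0


predictions = [{"total": 42.5, "vendor": "Acme"}, {"total": 10.0}]
labels = [{"total": 42.5, "vendor": "ACME Corp"}, {"total": 10.0}]
accuracy = field_accuracy(predictions, labels)  # 2 of 3 fields match
```

The point of the window check is that a short streak of clean days is not enough evidence; review is relaxed only after a sustained period without reviewer corrections.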
The technical depth of the document processing cloud
Taking a closer look at Extend’s technical architecture, I found that their concept of “document processing cloud” is far more complex than it seems.
They build not just an API, but a complete ecosystem including:
- VLM parsing engine to handle complex edge situations – across images, tables, handwriting, signatures, and more;
- LLM context management techniques, such as semantic chunking or table header continuation;
- Data labeling and evaluation tools to measure performance and improve it;
- Pipeline orchestration for classification, segmentation, and extraction for better accuracy;
- Annotation tools for domain experts;
- Reinforcement learning and feedback loops to improve systems with more data;
- and human intervention tools to flag and escalate low-confidence edge situations.
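Of the items above, context management is the one I find least obvious, so here is a tiny Python sketch of what table header continuation could mean in practice. This is my own interpretation, not Extend's implementation: when a long table is split into chunks for an LLM, the header row is repeated at the top of every chunk so later rows keep their column meaning. The function name and the plain-text row format are assumptions.

```python
def chunk_table(header: str, rows: list[str], rows_per_chunk: int) -> list[str]:
    """Split table rows into chunks, repeating the header in each chunk."""
    if rows_per_chunk < 1:
        raise ValueError("rows_per_chunk must be at least 1")
    chunks = []
    for start in range(0, len(rows), rows_per_chunk):
        body = rows[start:start + rows_per_chunk]
        # Without the repeated header, a later chunk would just be bare values.
        chunks.append("\n".join([header, *body]))
    return chunks


table_chunks = chunk_table(
    header="item | qty | price",
    rows=["widget | 2 | 4.00", "gadget | 1 | 9.50", "gizmo | 5 | 1.25"],
    rows_per_chunk=2,
)
```

Each chunk can then be sent to a model independently without the later chunks losing track of which column is which.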
What impressed me was the technical background and experience of the Extend team.
Chief AI Officer Anirudh Badam brings more than a decade of AI/ML expertise from Microsoft’s Seattle headquarters, while founding AI engineer Vijay Sagar spent a decade in Google’s Silicon Valley office, developing machine learning models. This deep technical background allows them to build true full-stack solutions rather than simply encapsulating existing APIs.
The backgrounds of other team members are just as striking: Ishaan briefly held the world record for solving the Rubik’s Cube blindfolded and, despite being a self-taught software engineer, is a top competitive programmer.
Gus created one of the world’s largest benchmarks for tabular data and was one of the founding engineers of AWS SageMaker.
This technical depth allows Extend to solve complex problems that other companies cannot.
From an engineering perspective, the problem solved by Extend is essentially a data and systems engineering problem, and raw OCR or foundation models do not completely solve it.
As Kushal Byatnal said: “OCR is dead. The question is no longer ‘can we extract text from a PDF?’ That is the basic requirement.
Instead, the question becomes: ‘How can we effectively teach AI models with PhD-level intelligence the complexity of our documents, business, and workflows so that they can drive business impact?’” This deep understanding of the nature of the problem allows Extend to provide solutions that truly address the underlying problem.
I particularly note Extend’s innovation in handling document complexity.
Documents are complex, varied, and full of ambiguity. Equipping teams with the right toolset and empowering non-technical domain experts to work with engineers is a way to drive impact quickly, rather than spending months iterating and dealing with edge cases.
AI systems are uncertain and can fail in unexpected ways. Proper safeguards, explainability, and human oversight are necessary for confident deployment to production.
Data complexity, driven by customer demand and growing data flows, only increases over time. A self-improvement system of continuous learning and adaptation is the only way to keep up.
In terms of automated configuration generation, Extend demonstrates a deep understanding of user pain points.
One of the biggest bottlenecks in document processing is the manual time teams spend tweaking schemas, crafting prompts, and debugging edge cases to improve accuracy.
They released a beta version of the auto-configuration generation to reduce this burden. Simply upload a few sample documents, and Extend will generate a custom schema optimized for document structure.
Soon, Extend will integrate this experience with the evaluation set and deploy an agent that continuously runs optimization loops in the background, so that your accuracy will improve even when you sleep. This concept of continuous optimization reflects the self-evolution ability that software should have in the AI era.
A catalyst for digital transformation of enterprises
In my opinion, Extend represents not only the advancement of document processing technology, but also an important milestone in the digital transformation of enterprises. Unstructured data trapped in documents is the last major frontier of untapped data — and the most painful.
Extend’s mission is to make this data accessible, accurate, and actionable.
When businesses can process this data efficiently, it unlocks immense value. As they say, their mission is simple: to remove every bottleneck that holds back the team and their unstructured data so they can focus on what makes their business unique (rather than wrestling with PDFs).
From a business perspective, Extend’s success also illustrates an important trend: the best infrastructure companies not only solve engineering problems, but also enable their customers to deliver novel product experiences they wouldn’t otherwise be able to build.
As Davis Treybig, Partner at Innovation Endeavors, puts it, “The product is so powerful that many Extend customers are able to not only automate existing workflows, but also introduce new capabilities that drive competitive differentiation.” This feedback seems to me to represent the highest recognition for infrastructure companies: not only solving problems, but also creating new possibilities.
Notably, Extend has achieved impressive commercial results with a very lean team. They quickly reached multiple millions of dollars in annual recurring revenue, exceeding the total amount raised in the seed round, while also maintaining positive cash flow.
All of this, including work with clients such as Zillow, Flatiron Health, Brex, Opendoor, Square, and more, was achieved with a team of about 5 people and, until recently, less than $2 million raised.
In fact, when they raised their Series A, Extend’s ARR actually exceeded the funding amount! This efficiency reflects their precise grasp of product-market fit and the effectiveness of their technical solutions.
Judging by customer feedback, Extend really solves actual business pain points.
Their clients include Brex, Square, Checkr, Flatiron Health, and several Fortune 500 companies, all teams that rely on Extend’s platform to process millions of documents with precision and reliability. Beyond the customer quotes I mentioned earlier, the customer feedback gathered about Extend was reportedly among the best ever heard for a Series A company.
A significant number of customers say they see Extend as a great competitive advantage. “It confuses me that these guys aren’t a Series C or D company yet” is a representative comment. This level of customer satisfaction speaks to the real value of the Extend solution.
I believe that as large language models continue to improve, they will begin to understand the most complex unstructured data in ways that even humans cannot.
Organizations will have hundreds (or thousands) of professional agents looking at every piece of unstructured data, connecting the dots, and surfacing insights we didn’t even know we were looking for. This will revolutionize the way businesses process information, moving from passive data storage to proactive intelligent analytics.
As more companies seek ways to accelerate their work using AI, the Extend team is looking forward to launching a user-friendly interface that anyone can use to automate data processing and pipeline building.
Ultimately, I think companies like Extend are building not just document processing tools, but infrastructure for enterprise intelligence.
They are creating a world where programming against documents is no different from calling APIs. This shift will allow businesses to make the most of all of their data assets, not just those that are already structured. This will be a huge competitive advantage for the businesses that adopt this technology first.
As Extend demonstrates, the future is here, and it’s full of possibilities. In this new era of document processing cloud, successful businesses will be those that can turn chaotic PDFs into valuable insights.