The AI infra track sees another large raise of $30 million: redefining the future of multimodal data when data processing meets AI

Daft is built around the concept of “tools adapt to data”: it natively supports multimodal data processing and offers seamless scaling and deep optimization for AI workflows, aiming to solve the core problems of multimodal data processing. This article delves into Eventual’s solution, its technological innovations, and the industry significance behind them, analyzing the current state and future of multimodal data processing.

Have you ever wondered why the smartest AI engineers spend 80% of their time fixing data infrastructure instead of building AI applications that could truly change the world? This seemingly unreasonable phenomenon is exactly the pain Eventual founders Sammy Sidhu and Jay Chia experienced while working in Lyft’s autonomous driving department. Autonomous vehicles generate massive amounts of multimodal data every day – 3D scans, photos, text, audio – but no single tool can understand and process all these different types of data at once. Engineers could only piece together various open-source tools, a process that was lengthy and unreliable. Even more frustrating, these top PhD-level talents, who should have been focusing on building core applications, were forced to devote most of their energy to maintaining infrastructure.

This phenomenon is not unique to Lyft. With the explosion of generative AI, every company building AI applications needs to process large amounts of text, image, document, and video data, yet the tools they use are still traditional systems designed to handle web clicks and banking transactions. This mismatch is not only inefficient; it systematically slows the pace of innovation across the industry. It was this deeply felt pain point that led Sidhu and Chia to build a completely new solution: Eventual, which has now raised $30 million in funding.

How deep is the pain point of multimodal data processing?

Before diving into Eventual’s solution, I want to help you understand the fundamental limitations of existing data processing tools when dealing with multimodal AI workloads. Traditional data processing engines, such as Apache Spark, were originally designed to handle structured data – think of organized tabular data like bank transactions and user clickstreams. These systems excel at processing numbers and text, but asking them to process images, videos, or LiDAR scans is as unrealistic as asking a car designed for urban roads to climb Mount Everest.

When I talk to engineers at AI companies, I find they often face the same dilemma: to get Spark to process image data, they have to write a lot of adaptation code to convert images into a format Spark can understand, and then convert them back. This process is not only cumbersome but also extremely brittle. A failure rate of 0.1% may be acceptable in a test environment, but when you’re dealing with millions of files in production, that failure rate becomes a disaster. To make matters worse, modern AI workloads also need to run custom models, call external APIs, and handle a wide variety of data types, far beyond what traditional data processing engines were designed for.

Sidhu mentioned a thought-provoking observation in an interview: “We see all these great PhDs, great minds in the industry developing autonomous vehicles, but they spend about 80% of their time dealing with infrastructure issues rather than building their core applications.” The extent of this misallocation of resources is staggering. Imagine the impact on the rate of innovation if a top scientist at a pharmaceutical company had to spend 80% of their time repairing lab equipment and only 20% on drug development.

This problem became even more serious after the release of ChatGPT. With the popularity of generative AI, more and more companies are starting to use images, documents, and videos in their applications. But they soon discovered that the existing data infrastructure was completely incapable of handling this multimodal data. Sidhu observed: “The outbreak of ChatGPT has made us see a lot of other people starting to build AI applications with different modalities. Then everyone started using things like images, documents, and videos in their apps. This is exactly where we are seeing a dramatic increase in usage.”

Eventual’s revolutionary solution

Eventual’s core innovation lies in having built, from the ground up, a data processing engine specifically designed for multimodal AI workloads: Daft. This is not an improvement or adaptation of existing tools, but a completely new architecture that treats the inherent complexity of multimodal data as a feature rather than a flaw. In my opinion, this shift in design philosophy is revolutionary: instead of trying to force complex multimodal data into a framework designed for simple tabular data, build a system that natively understands and processes the various data types.

The power of Daft lies in its Python-native, open-source data processing engine, specifically designed to quickly process data across modalities, from text to audio and video. Sidhu says their goal is for Daft to have the same transformative impact on unstructured data infrastructure that SQL had on tabular datasets. This comparison is instructive: the advent of SQL allowed anyone to query and analyze structured data without needing to delve into the underlying database internals. Daft wants to bring the same simplicity and power to multimodal data. I deeply understand the importance of this vision, as dealing with multimodal data today is as painful as directly operating a file system was in the era before SQL. Each data type requires specialized processing logic, and each project has to reinvent the wheel, which wastes time and invites errors.

From a technical implementation perspective, Daft is built using Rust, which ensures performance comparable to DuckDB and Polars on a single node, while easily scaling to distributed clusters without any code changes. This design allows developers to develop and test code on laptops and then deploy directly to large-scale clusters in production. I think this seamless scalability is especially important for AI workloads, as data volumes typically grow from a few MB in the development phase to petabytes in production. The implementation principle of this scalability is clever: Daft abstracts the computational logic into a task graph that can be executed sequentially on a single machine or in parallel on a cluster, and the developer only needs to change a line of configuration code. This eliminates the most painful part of traditional distributed system development – the huge gap between local development and cluster deployment.
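As a rough illustration of that idea (a toy sketch, not Daft’s actual internals), the same “plan” of independent tasks can be executed sequentially on one machine or fanned out to a worker pool, with nothing but a single flag changing between the two modes:

```python
from concurrent.futures import ThreadPoolExecutor

def build_plan(paths):
    """A 'logical plan': one independent task per input partition."""
    return [lambda p=p: len(p) * 2 for p in paths]  # stand-in for real work

def execute(plan, distributed=False):
    """Run the same plan locally or on a pool -- only this flag changes."""
    if distributed:
        with ThreadPoolExecutor(max_workers=4) as pool:
            return list(pool.map(lambda task: task(), plan))
    return [task() for task in plan]

plan = build_plan(["a.parquet", "bb.parquet", "ccc.parquet"])
# Identical results whichever execution backend is chosen.
assert execute(plan) == execute(plan, distributed=True)
```

The point of the sketch is the separation of concerns: the plan describes *what* to compute, and the executor decides *where*, which is what lets a laptop prototype move to a cluster unchanged.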

What impressed me even more was Daft’s in-depth optimization of AI workflows. It not only supports traditional analysis operations such as grouping, joining, and aggregation, but also allows developers to write arbitrary Python code as user-defined functions (UDFs). This means you can perform data cleaning, feature extraction, model inference, and results analysis in the same data processing pipeline without having to switch between multiple tools. This all-in-one design solves a problem that has long plagued AI engineers: the fragmentation of the toolchain. Traditionally, you may need to use pandas for data cleaning, Spark for large-scale processing, Ray for distributed training, and different tools for model deployment. The transfer of data between each tool is a potential point of failure, and maintaining such a toolchain requires a lot of operational effort.
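To make the “one pipeline” point concrete, here is a minimal dependency-free sketch (not Daft’s actual API, which exposes dataframes and decorated UDFs) in which cleaning, an arbitrary user-defined function, and aggregation live in a single flow rather than three separate tools:

```python
from collections import Counter

def classify(text):
    """User-defined function: arbitrary Python logic, e.g. a model call."""
    return "long" if len(text) > 10 else "short"

rows = [{"text": "hello"}, {"text": None}, {"text": "a much longer string"}]

# One pipeline: cleaning, UDF application, and aggregation -- no tool switching.
cleaned = [r for r in rows if r["text"] is not None]              # data cleaning
labeled = [{**r, "label": classify(r["text"])} for r in cleaned]  # UDF step
counts = Counter(r["label"] for r in labeled)                     # aggregation

assert counts == Counter({"short": 1, "long": 1})
```

In a fragmented toolchain, each of those three steps might run in a different system, with a serialization hop (and a potential failure point) between each pair.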

Daft’s UDF system in particular deserves an in-depth discussion. It supports not only ordinary Python functions but also asynchronous UDFs designed specifically for AI workloads. In their demonstration case, using asynchronous UDFs increased GPU inference throughput by 5-6x. The rationale behind this improvement is that traditional synchronous inference leaves the GPU idle while it waits for I/O operations, whereas asynchronous UDFs can process one request while waiting on others, making full use of the GPU’s computing power. This optimization is crucial for large-scale AI inference, as GPUs are costly and any idle time means wasted resources. What’s more, Daft plans to optimize this process further, including support for streaming UDFs, so that results can be returned as soon as they are generated rather than after the entire batch completes.
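The mechanism can be sketched with plain asyncio, where a sleep stands in for the network or I/O wait that would otherwise leave the GPU idle (illustrative timings, not a Daft benchmark):

```python
import asyncio
import time

async def infer(request_id):
    """Simulated inference call: the sleep stands in for I/O wait."""
    await asyncio.sleep(0.05)
    return request_id * 2

async def sequential(n):
    # Each call waits for the previous one -- the 'GPU' idles during I/O.
    return [await infer(i) for i in range(n)]

async def concurrent(n):
    # All waits overlap, so total time approaches one request's latency.
    return await asyncio.gather(*(infer(i) for i in range(n)))

start = time.perf_counter()
asyncio.run(sequential(10))
t_seq = time.perf_counter() - start

start = time.perf_counter()
results = asyncio.run(concurrent(10))
t_conc = time.perf_counter() - start

assert list(results) == [i * 2 for i in range(10)]
assert t_conc < t_seq  # overlapping the waits recovers the idle time
```

The sequential version takes roughly ten times the single-request latency here, while the concurrent one takes roughly one; real speedups depend on how much of each request is overlappable wait versus actual compute.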

In their demo, engineers were able to build an AI recruitment platform in two days, processing 15,000 GitHub repositories and 33 million commits and running AI code review for 250,000 developers – a speed unimaginable with traditional tools. Let’s break down the technical details of this case: they first used the GitHub API to collect 15,000 popular repositories, then cloned all the repositories and parsed git logs to extract commit information. This process can take weeks with traditional tools, as you deal with complex issues like API limits, storage management, and error recovery. But with Daft, they only needed to wrap the API calls in UDFs, and the system automatically handled concurrency, retries, and result storage. The subsequent data cleaning and aggregation steps – grouping commit records by developer and deduplicating them – required only a few lines of code in Daft but might demand complex MapReduce jobs in traditional tools. The most impressive part was the AI code review phase: they used large language models to evaluate 250,000 developers in just 3 hours, an efficiency achieved through Daft’s asynchronous UDF support and intelligent batching policies.

Eventual’s commercialization strategy is also sensible. They started by building the powerful open-source core Daft, which already handles petabytes of multimodal data in mission-critical workloads from companies like Amazon, CloudKitchens, and Together AI. Now they are building Eventual Cloud on top of this open-source engine – the first production-ready platform built from the ground up for multimodal AI workloads. In this way, they have not only built a strong technical moat but also cultivated an active open-source community, laying a solid foundation for commercializing their products. This open source + cloud service strategy is clever because it allows users to experience the power of Daft through the open source version first, and then naturally upgrade to a cloud service that offers more enterprise-level features. This strategy is particularly effective in the field of data infrastructure, where businesses are very careful when selecting critical infrastructure, and they need to verify the reliability and performance of the technology first.

Why now is the critical time for multimodal data processing

I think the reason why Eventual is favored by investors is not only because they solve a real technical pain point, but also because they have seized a huge market opportunity. According to management consulting firm MarketsandMarkets, the multimodal AI industry will grow at a compound annual growth rate of 35% between 2023 and 2028. This growth rate reflects the urgent need for multimodal AI applications and signifies a significant demand for infrastructure dedicated to multimodal data.

The explosive growth in data volumes provides deeper support for this demand. “Annual data generation has grown 1,000x over the past 20 years, and 90% of the world’s data was generated in the last two years, with the vast majority of data being unstructured, according to IDC,” noted Astasia Myers, general partner at Felicis. These numbers are shocking, but more important is the meaning behind them: we are in an era of fundamental shifts in the nature of data. In the past, data was mostly numbers and text; now it is increasingly images, video, audio, and sensor data.

This shift poses unprecedented challenges to existing data infrastructure. Traditional tools not only struggle technically; they also become unsustainable in terms of cost. Imagine how inefficient it would be to process millions of hours of video content for training a computer vision model using a system designed for banking transactions. Not only do you need a lot of adaptation work, but you also face extremely high compute and storage costs, not to mention system reliability issues.

From an investor perspective, Myers said she discovered Eventual through market mapping when she was looking for data infrastructure that could support a growing number of multimodal AI models. Eventual stands out not only because they are a pioneer in this field, but also because the founders have experienced this data processing problem firsthand. This firsthand experience ensures that the solutions they build truly solve practical problems, not theoretical ideas in ivory towers.

I particularly agree with Myers’ observation of macro trends: “Daft adapts to the huge macro trend of building generative AI around text, images, video, and speech. You need a multimodal native data processing engine.” This statement points to the key: instead of retrofitting existing tools to fit new requirements, build tools that natively support them. This shift in mindset is crucial for the development of technological infrastructure.

Deep thinking behind technological innovation

When studying Eventual’s technical architecture, I found that their innovation is not only reflected at the functional level but, more importantly, in a breakthrough of design philosophy. Traditional data processing systems follow the logic of “data adapts to tools” – you need to convert data into a format the tool can process. Daft, on the other hand, implements the logic of “tools adapt to data” – the system natively understands various data types without forced conversion. The significance of this philosophical shift runs far deeper than meets the eye.

The implications of this difference are far-reaching. In traditional systems, whenever you need to process a new type of data, you need to write a lot of pre- and post-processing code. This not only increases development complexity but also introduces potential sources of error. More seriously, this approach is essentially reinventing the wheel for each data type. In Daft’s architecture, the system is designed to be multimodal native from the ground up, and the support for new data types becomes a configuration problem rather than a development problem. I think this shift in design philosophy is similar to the leap from process-oriented to object-oriented programming, which fundamentally changes the way we think and solve problems.

From a performance perspective, the advantages brought by this design are even more obvious. Traditional systems have a lot of serialization and deserialization overhead when dealing with multimodal data, and Daft avoids these unnecessary transformations with native support. In their demo case, it took only 30 minutes to process 15,000 repositories and 33 million commits, a performance improvement not only from the distributed architecture, but also from the native optimization of multimodal data. This performance difference is magnified several times in a mass production environment, meaning that enterprises can do more with fewer resources or process larger scale data in the same amount of time.

I especially appreciate Daft’s support for asynchronous processing. In the demo, engineers increased the throughput of GPU inference by 5-6 times by using asynchronous UDFs. This optimization is crucial for AI workloads, as model inference is often a bottleneck across the pipeline. With native support for asynchronous operations, Daft allows developers to take full advantage of the parallel processing power of modern hardware without having to manage complex concurrent logic themselves. The ingenuity of this design is that it abstracts complex asynchronous programming patterns into simple APIs, allowing data scientists and AI engineers to focus on business logic without needing to be experts in distributed systems.

Let me delve into the technical details of the AI recruitment platform case just mentioned, as it perfectly demonstrates Daft’s technical advantages. The entire project was completed in two days, and the processing pipeline included four stages: data collection, cleaning, AI inference, and result presentation. During the data collection phase, they first searched for popular repositories through the GitHub API, which is a challenge in itself because the API has strict rate limits. Traditional methods may require writing complex retry mechanisms and queue systems, but in Daft, they only needed to wrap the API calls in UDFs, and the system automatically handled concurrency control and error recovery. When they hit API limits, they cleverly switched to cloning repositories directly and parsing git logs, a flexibility that is difficult to achieve in traditional data processing frameworks, where you would have to switch between different tools.
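What the engine spares you from writing by hand looks roughly like this: a hypothetical retry-with-exponential-backoff wrapper around a simulated rate-limited call (all names here are illustrative, not Daft or GitHub API code):

```python
import time

def with_retries(fn, max_attempts=5, base_delay=0.01):
    """Wrap a flaky call (e.g. a rate-limited API) with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RuntimeError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error
            time.sleep(base_delay * (2 ** attempt))  # back off, then retry

calls = {"n": 0}

def flaky_fetch():
    """Fails twice with a simulated rate-limit error, then succeeds."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("rate limited")
    return {"repo": "example", "stars": 42}

result = with_retries(flaky_fetch)
assert result["stars"] == 42 and calls["n"] == 3
```

Multiply this boilerplate by every API, every failure mode, and every pipeline stage, and the appeal of an engine that handles it automatically becomes clear.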

The data cleaning phase revealed another interesting technical issue: developers may commit code under multiple email addresses, leading to the same person being identified as multiple different contributors. This is a typical entity resolution problem that may require complex MapReduce jobs or specialized data cleaning tools elsewhere. But in Daft, they solved it with simple grouping and aggregation operations. This simplicity not only enhances development efficiency but also reduces the likelihood of errors. More importantly, this approach demonstrates Daft’s strength in data quality management – it makes data cleaning a natural continuation of data analysis rather than a separate process.
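The grouping-and-aggregation idea can be sketched in a few lines; the canonical key used here (the author name) is a deliberate simplification of real entity resolution, and the data is made up:

```python
from collections import defaultdict

# Commits attributed to several emails that belong to the same people.
commits = [
    ("Ada Lovelace", "ada@work.com", 120),
    ("Ada Lovelace", "ada@gmail.com", 30),
    ("Grace Hopper", "grace@navy.mil", 200),
]

# Group by a canonical key and aggregate -- in a dataframe engine this is
# a one-line groupby, not a separate MapReduce job.
totals = defaultdict(int)
for name, _email, count in commits:
    totals[name] += count

assert dict(totals) == {"Ada Lovelace": 150, "Grace Hopper": 200}
```

Production entity resolution would need a smarter key (name normalization, email-domain heuristics, or a matching model), but the pipeline shape, group then aggregate, stays the same.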

The AI inference stage is the part that best reflects the advantages of Daft’s technology. They need to conduct code reviews on 250,000 developers, which requires hundreds of thousands of calls to large language models. Traditional methods may require complex task queue systems to handle issues such as failed retries, load balancing, and cost control. But with Daft’s asynchronous UDF, they only need to wrap LLM calls into a function, and the system automatically handles concurrency control and resource optimization. Even more ingeniously, they use Pydantic models to standardize the LLM’s output format, ensuring that the results can be integrated directly into Daft’s data framework. This design demonstrates Daft’s deep thinking in AI workflow integration – it is not only a data processing engine, but also a platform for building AI applications.
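The demo uses Pydantic for this; the same schema-validation pattern can be sketched with a stdlib dataclass so the example stays dependency-free (the field names are hypothetical, not from the demo):

```python
import json
from dataclasses import dataclass

@dataclass
class Review:
    """Schema the LLM's JSON output must satisfy (Pydantic plays this
    role in the demo; a stdlib dataclass stands in for it here)."""
    developer: str
    score: int
    summary: str

def parse_review(raw):
    data = json.loads(raw)
    review = Review(**data)            # rejects missing or extra fields
    if not 0 <= review.score <= 10:    # lightweight value validation
        raise ValueError("score out of range")
    return review

raw = '{"developer": "octocat", "score": 8, "summary": "solid tests"}'
review = parse_review(raw)
assert review.score == 8
```

Forcing free-form model output through a schema at the pipeline boundary is what lets the validated records flow straight back into ordinary columns for downstream grouping and ranking.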

In terms of performance optimization, the 5-6x performance improvement of asynchronous UDFs is particularly worthy of in-depth analysis. The principle of this improvement is to take full advantage of the parallelism of GPU and network I/O. In synchronous mode, each inference request needs to wait for the previous one to complete, causing the GPU to be idle while waiting for network I/O. Asynchronous mode allows the system to process other requests while one is waiting, maintaining high GPU utilization. This optimization is valuable in large-scale AI inference, as GPUs are the most expensive resource, and improved utilization directly translates into cost savings. I estimate that in large-scale deployments, this optimization could save businesses millions of dollars in GPU costs.
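The back-of-the-envelope arithmetic behind a figure in that range, using hypothetical per-request timings chosen purely for illustration:

```python
# Hypothetical per-request cost: a slice of GPU compute plus a longer,
# overlappable wait on network I/O.
gpu_ms = 10   # time the GPU actually computes per request
io_ms = 50    # network / queueing wait, which async can overlap

sync_throughput = 1000 / (gpu_ms + io_ms)   # req/s; GPU idles during I/O
async_throughput = 1000 / gpu_ms            # req/s; waits overlap, GPU stays busy

speedup = async_throughput / sync_throughput
assert round(speedup, 1) == 6.0  # consistent with the reported 5-6x range
```

In other words, the achievable speedup is bounded by the ratio of total request latency to pure compute time; the more of each request is overlappable wait, the more async execution recovers.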

Even more exciting is Daft’s roadmap. They plan to expand multimodal support for new data types such as video and variants, provide better primitives for AI workloads, including streaming and asynchronous UDFs, continue to invest in data catalogs such as Iceberg and Unity, and build a next-generation distributed execution engine codenamed “Flotilla”. This roadmap shows their clear vision for the future of multimodal data processing, not only addressing current pain points but also preparing for future needs. In particular, the support of streaming UDF will enable real-time AI applications, which is of great significance for scenarios such as autonomous driving and real-time recommendation systems. The new distributed execution engine, “Flotilla,” hints at their continued innovation in the underlying architecture, potentially leading to even greater performance breakthroughs.

Another notable feature of Daft is its deep integration with data catalogs. Modern enterprise data is distributed across various systems, including cloud storage, data lakes, data warehouses, and more. Daft’s support for data catalogs such as Iceberg and Unity means it can seamlessly access a company’s existing data assets without data migration. This capability is crucial for enterprise adoption of new tools, as data migration is often the biggest hurdle. By removing this barrier, Daft has significantly lowered the barrier to adoption, which is an important reason it has been able to quickly win large customers such as Amazon and CloudKitchens.

Implications for the entire AI infrastructure industry

The success of Eventual has important implications for the entire AI infrastructure industry. I think we are witnessing a second revolution in AI infrastructure. The first revolution was the shift from general-purpose computing to AI-specific hardware, such as GPUs and TPUs. Now we are experiencing a second revolution: from software architectures designed for structured data to software architectures natively designed for multimodal AI workloads.

This shift has far-reaching implications for the entire tech stack. At the storage level, we need systems that can efficiently store and retrieve various data types. At the computational level, we need engines that can handle multimodal data natively. At the application level, we need frameworks that can seamlessly integrate various AI models and tools. Eventual’s innovations at the computational level point the way for the evolution of the entire stack.

From a business perspective, Eventual’s success also validates the market demand for specialized infrastructure tools. In the past, businesses may have opted for generic solutions and accepted some performance loss, but as the complexity and scale of AI applications continue to grow, specialized tools become indispensable. This creates a huge opportunity for startups focused on specific technology areas and explains why investors are willing to invest heavily in companies like Eventual.

I have observed that more and more AI companies are beginning to realize the importance of data infrastructure. Data infrastructure, which used to be seen as a supporting function, is now becoming a core competitive advantage. Companies that can process multimodal data faster and more reliably have a significant advantage when building AI applications. This shift in perception drives the need for specialized data processing tools and creates a broad market space for companies like Eventual.

From a technical talent perspective, Eventual’s team composition is also very instructive. They bring together developers from projects such as Databricks Photon, GitHub Copilot, Pinecone Vector Database, Render, and AWS PartiQL, which are builders of large-scale systems. This talent allocation shows that building next-generation AI infrastructure requires deep experience in distributed systems and a deep understanding of AI workloads, not just AI algorithm knowledge.

Challenges and future prospects

Despite Eventual’s breakthroughs in multimodal data processing, I think they still face some important challenges. The first is the construction of ecosystems. While Daft is technologically advanced, it also requires a complete toolchain, documentation, training resources, and community support to be adopted by more developers. Most data engineers are familiar with Spark and pandas, and switching to a new tool requires learning costs.

I am also concerned about the standardization of multimodal data processing. Different AI models and applications have different data format requirements, and how to establish some level of standardization while maintaining flexibility will be a long-term challenge. Eventual needs to support a variety of data formats while driving industry best practices and common standards.

From a competitive perspective, Eventual has a clear advantage as a first-mover, but this space is likely to become crowded. Large cloud service providers may launch their own multimodal data processing solutions, and traditional database companies may also increase their investment in this direction. Eventual needs to continue to stay ahead of the curve while rapidly expanding its market share.

Cost optimization is also an important consideration. While Daft has an advantage in performance, multimodal data processing itself is resource-intensive. How to help customers control costs while achieving better performance will be the key to the success of Eventual’s commercialization. They need to provide clear proof of ROI that makes businesses willing to pay for better tools.

Despite these challenges, I am confident in the future of Eventual. They are solving a real and growing problem, with a strong technical team and sufficient financial support. What’s more, as AI applications become more popular, the demand for multimodal data processing will only continue to grow. Eventual is not just building a product, it’s defining a new technology category.

From a broader perspective, I believe Eventual represents the direction of AI infrastructure evolution. We are moving from “adapting AI to existing infrastructure” to “making infrastructure natively support AI”. This shift will unlock the true potential of AI technology, allowing more businesses to build powerful AI applications rather than being constrained by infrastructure limitations. Ultimately, companies like Eventual will be the infrastructure provider for the AI era, just as AWS provides infrastructure for the cloud computing era. Their success will not only drive the growth of their own business, but also accelerate the development of the entire AI industry.
