At the inaugural YC AI Startup School, held at the Moscone Center in San Francisco, Andrej Karpathy, a founding member of OpenAI and former Director of AI at Tesla, delivered his latest talk. He walked through the three revolutions software has undergone in the AI era, the new paradigm of large language models (LLMs), and how programming and interaction are being redefined, offering plenty of food for thought for practitioners and enthusiasts alike.
The following is the full text of Andrej Karpathy's talk.
01 Software's three revolutions: Software 3.0 is eating the old code
I'm excited to be here today to talk about software in the age of AI. I'm told many of you are undergraduates, master's students, PhD students and so on, about to enter the industry, and I think this is a uniquely interesting time to do so. The fundamental reason is that software is changing again.
I say "again" because I've given a version of this talk before. The problem is that software keeps changing, so I always have new material, and I think this change is fundamental. Roughly speaking, software didn't change much at a basic level for 70 years, and then in the last few years it has changed quite rapidly, twice. So there is a huge amount of work to do right now, a huge amount of software to write and rewrite.
So let's take a look at the territory of software. If we think of it as a map of software, there's a neat tool that renders exactly that kind of map of GitHub.
It's like a collection of all the software that has been written: instructions to a computer for carrying out tasks in digital space. Zoom in and you see all kinds of different repositories; this is all code that has already been written.
A few years ago I observed that software was changing: a new kind of software was emerging, and I called it Software 2.0 at the time. The idea is that Software 1.0 is the code you write for the computer, while Software 2.0 is essentially neural networks, specifically the weights of a neural network. You don't write that code directly; you curate a dataset and run an optimizer to produce the network's parameters. At the time, neural networks were mostly treated as just another kind of classifier, like a decision tree, so I think this framing is more apt.
Now we even have a GitHub equivalent in the Software 2.0 world. I think Hugging Face is basically the GitHub of Software 2.0, and Model Atlas lets you visualize all the "code" written there, if you're curious. By the way, that huge circle, the dot in the middle, is the parameters of the Stable Diffusion image generator. Whenever someone fine-tunes a LoRA on a Stable Diffusion model, they are essentially creating a Git commit in this space, a new kind of image generator. So: Software 1.0 is code that programs a computer; Software 2.0 is weights that program a neural network.
This is an example of an AlexNet image recognition neural network.
Until recently, the neural networks we were familiar with were like fixed-function computers, for example image classifiers. What has changed, and what I think is fundamental and quite unique, is that neural networks have become programmable through large language models (LLMs).
This is a new kind of computer, so it deserves a new name: Software 3.0. Essentially, your prompt is now the program that programs the LLM. Remarkably, these prompts are written in English, so it's a very interesting programming language.
02 Programming "people spirits" in English: the prompt is the new code
Maybe the difference can be summarized like this. Take sentiment classification: you can write some Python code by hand, or train a neural network, or prompt a large language model. Here a short prompt does the job, and you can imagine changing that prompt as a slightly different way of programming the computer.
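To make the contrast concrete, here is a minimal sketch in Python of the same task done three ways. The llm() argument in the 3.0 version is a hypothetical stand-in for any chat-completion API, and the weights object in the 2.0 version stands in for a trained network; neither is a real library call.

```python
# A minimal sketch contrasting the three paradigms on sentiment classification.

def sentiment_1_0(text: str) -> str:
    """Software 1.0: hand-written rules in explicit code."""
    positive = {"great", "love", "excellent", "good"}
    negative = {"terrible", "hate", "awful", "bad"}
    words = set(text.lower().split())
    score = len(words & positive) - len(words & negative)
    return "positive" if score >= 0 else "negative"

def sentiment_2_0(text: str, weights) -> str:
    """Software 2.0: the 'program' is the weights of a trained network.
    `weights` would come from running an optimizer over a labeled dataset."""
    logits = weights.forward(text)  # placeholder for a real model's forward pass
    return "positive" if logits > 0 else "negative"

def sentiment_3_0(text: str, llm) -> str:
    """Software 3.0: the program is an English prompt given to an LLM."""
    prompt = (
        "Classify the sentiment of the following review as exactly one word, "
        f"'positive' or 'negative'.\n\nReview: {text}\nSentiment:"
    )
    return llm(prompt).strip().lower()  # llm() is a hypothetical API wrapper

print(sentiment_1_0("I love this product, it is excellent"))  # -> positive
```

The point is not the specific code but where the "program" lives: in the rules, in the weights, or in the English sentence.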
So we have Software 1.0 and Software 2.0, and I think we now see a lot of code on GitHub that isn't just code: it's interspersed with a lot of English. I think this is a growing, new category of code. Not only is it a new programming paradigm; remarkably, it's written in our native language, English.
This struck me a few years ago, and I tweeted about it, which caught a lot of attention. My pinned tweet at the moment is: remarkably, we are now programming computers in English.
Now, when I was at Tesla, we were building the driver-assistance system, trying to make the car drive. I used to show a slide with the car's sensor inputs at the bottom, flowing through a software stack to produce steering and acceleration signals. At the time I observed that there was a huge amount of C++ code in the autonomy system, which was Software 1.0 code, plus some neural networks doing image recognition.
What I observed over time, as we improved the system, was that the neural networks grew in capability and size. In addition, much of the C++ code was deleted: capabilities originally written in 1.0 were migrated to 2.0. For example, stitching together information across camera images and across time is now done by a neural network, and we were able to delete a lot of code across the whole autonomy stack. I found that remarkable at the time, and I think we're seeing the same thing again now: a new kind of software is eating through the stack.
We have three completely different programming paradigms, and if you're entering the industry I think it's a very good idea to be fluent in all three, because they each have pros and cons. Should a given feature be explicit code, a trained neural network, or a prompt to an LLM? We all have to make these decisions, and potentially switch fluidly between the paradigms.
So now I want to dig in. In this first part I'll talk about LLMs: how to think about this new paradigm and its ecosystem, what this new kind of computer looks like, and what the ecosystem around it looks like.
03 We are in the "mainframe era" of LLMs: cloud time-sharing, before the personal computing revolution
Many years ago, a sentence from Andrew Ng, who I think is speaking after me, stuck with me: "AI is the new electricity."
I do think it captures something real, because LLMs today genuinely have the feel of a utility. LLM labs such as OpenAI, Gemini and Anthropic spend capital expenditure to train LLMs, which is a bit like building out a power grid; then there is operational expenditure to serve that intelligence to all of us through APIs. Access is metered: we pay per million tokens or so, and we have very utility-like demands on the API, such as low latency, high availability, and consistent quality.
In an electrical system you would have a transfer switch that flips between sources: grid, solar, battery, or generator. For LLMs we have tools like OpenRouter that make it easy to switch between the different LLMs that exist. And because LLMs are software, they don't compete for physical space; it's fine to have six "electricity providers" and switch among them, because they don't compete in that direct physical way.
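As a rough illustration of that transfer-switch idea, here is a small sketch of provider failover. The provider names and the ask callables are invented stand-ins, not real SDK clients; in practice a router such as OpenRouter puts many models behind one API so this switch is a dropdown rather than code you write.

```python
# Sketch of a "transfer switch" across LLM providers: try them in priority
# order and fall over to the next one if a provider is down or rate-limited.
from typing import Callable, Dict

def route(prompt: str, providers: Dict[str, Callable[[str], str]]) -> str:
    last_error = None
    for name, ask in providers.items():      # priority order
        try:
            return ask(prompt)               # first healthy provider wins
        except Exception as err:             # outage, rate limit, timeout...
            last_error = err
            print(f"{name} unavailable ({err}), switching over")
    raise RuntimeError("all providers are down") from last_error

# Usage sketch (client_a / client_b are hypothetical wrappers):
# route("hello", {"provider_a": client_a.ask, "provider_b": client_b.ask})
```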
I find it very interesting that just the other day several LLMs went down and people were stuck, unable to work. When the state-of-the-art LLMs go down, it's like an intelligence "power outage" across the world: like when grid voltage sags, the whole planet gets a little dumber. Our reliance on these models is already significant, and I think it will keep growing.
But LLMs don't only have utility properties; you could argue they also have some of the properties of semiconductor fabs, because the capital expenditure required to build them is genuinely huge. It's not like building a single power station, right? The technology tree is deep and growing fast, and the R&D secrets are concentrated inside the LLM labs.
That said, the analogy gets muddier because, as I said, this is software, and software is less defensible: it is very easy to copy and modify. So it's an interesting thing to think about rather than a perfect analogy.
04 Technology diffusion in reverse: boiling eggs gets AI before missiles do
You can draw a number of analogies here. A cluster with a certain maximum FLOPS is perhaps like a 4nm process node. If you use NVIDIA GPUs and only do the software, not the hardware, that's a bit like the fabless model; if, like Google, you build your own hardware and train on TPUs, that's a bit like the Intel model of owning your own fab. So there are some plausible analogies.
But actually, I think the most apt analogy is that LLMs are very much like operating systems. This isn't just electricity or water, not a commodity that flows out of a tap. LLMs are now increasingly complex software ecosystems, and interestingly, the ecosystem is taking shape in a very similar way.
You have a few closed-source providers, like Windows or macOS, and then an open-source alternative, like Linux; for LLMs, I think the LLaMA ecosystem is currently the closest approximation to something that could grow the way Linux did. It's still early, because these are still fairly simple LLMs, but we're starting to see them become much more complex: it's not just the LLM itself, but all the tool use, the multimodality, and how it all fits together.
When this occurred to me a while ago, I tried to sketch it out. In my view the LLM is like a new operating system: the LLM itself is a new kind of computer, playing a role a bit like the CPU; the context window is a bit like memory; and the LLM orchestrates memory and compute to solve problems using all of these capabilities. Seen this way, it really does look like an operating system.
Let me give a few more parallels. Say you want to download an app: I go to the VS Code site, download it, and I can run that same VS Code on Windows, Linux, or Mac.
In the same way, you can take an LLM application such as Cursor and run it on GPT, Claude, or Gemini; picking a model is just a dropdown menu. The situations are very similar.
Another analogy that struck me is that we're in something like the 1960s: LLM compute is still very expensive for this new kind of computer.
That forces LLMs to be centralized in the cloud. We're all just thin clients interacting with them over the network, and none of us can use these computers to their full extent. So time-sharing makes sense: we're each just one slot in the batch dimension as these computers run in the cloud.
This is very much what computing looked like in that era: the operating system sits in the cloud, everything is streamed back and forth, and there's batch processing. The personal computing revolution hasn't happened yet, because it doesn't make economic sense.
But some people are trying. Mac Minis, for example, turn out to be quite good for running some LLMs, because inference at small batch sizes is mostly memory-bound, and that actually works out.
These may be early signs of personal computing with LLMs, but it hasn't really happened yet, and it's not clear what it will look like. Maybe some of you will invent it, or define how it works and what it should be.
05 LLM cognitive deficits: encyclopedic memory, but anterograde amnesia
One more analogy: whenever I talk to ChatGPT or any LLM directly in text, I feel like I'm talking to an operating system through a terminal. It's text, direct access to the OS. I don't think a general-purpose graphical user interface (GUI) for LLMs has really been invented yet.
For example, should ChatGPT have a GUI that's more than text bubbles? Some of the applications we'll discuss later do have GUIs, but there is currently no GUI that works across all tasks.
LLMs also differ from operating systems and from early computing in some unique ways. One property I've written about that feels genuinely different this time: LLMs flip the usual direction of technology diffusion.
With most transformative technologies, such as electricity, cryptography, computing, flight, the internet, and GPS, governments and corporations were the first users, because the technology was new and expensive, and only later did it diffuse to consumers. With LLMs, I think it's reversed.
Early computers were largely about ballistics and military uses; LLMs are about things like "how do I boil an egg." That really is most of my usage.
It fascinates me: we have a magical new computer and it's helping me boil eggs, not helping the government with something crazy like missile calculations. Corporations and governments are actually lagging behind all of us in adoption. It's completely upside down, and I think that says something about how this technology will be used and who will use it first.
To sum up so far: "LLM labs" and LLMs as utilities and fabs are fair descriptions, but the best analogy is that LLMs are complex operating systems, and we are at roughly the 1960s stage of that history, replaying the story of computing. Right now they are accessed through time-sharing and distributed like a utility.
What's new is that, unlike before, they are not in the hands of a few governments and corporations but in the hands of all of us, because everyone has a computer and this is all just software. ChatGPT was beamed down to billions of people's computers practically overnight. That's crazy. It's incredible to me, and it's an incredible time to enter the industry and program these computers. I think it's remarkable.
Before we program LLMs, though, we need to take some time to think about what these things are. I especially like to talk about their "psychology". I like to think of LLMs as "people spirits".
They are stochastic simulations of people, and the simulator happens to be an autoregressive Transformer. A Transformer is a neural network that operates at the level of tokens, going chunk by chunk, with roughly the same amount of compute spent on each token. The simulator is ultimately just a set of weights, which we fit to all the text on the internet and similar data. The result is this kind of simulator.
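For intuition, here is a toy version of that autoregressive loop: one token comes out per step, and each step costs about the same amount of compute. The transition table below is made up; a real Transformer would sample from the softmax over its learned logits instead.

```python
# Toy autoregressive "simulator": emit one token at a time until <eos>.
import random

NEXT = {
    "<s>": ["the"], "the": ["cat", "dog"], "cat": ["sat"], "dog": ["sat"],
    "sat": ["down", "<eos>"], "down": ["<eos>"],
}

def generate(max_tokens: int = 8) -> list[str]:
    tokens = ["<s>"]
    for _ in range(max_tokens):                    # one fixed-cost step per token
        candidates = NEXT.get(tokens[-1], ["<eos>"])
        nxt = random.choice(candidates)            # a real model samples from its logits
        if nxt == "<eos>":
            break
        tokens.append(nxt)
    return tokens[1:]

print(" ".join(generate()))  # e.g. "the cat sat down"
```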
Because it is trained on human data, it ends up with an emergent, human-like psychology. The first thing you'll notice is that LLMs have encyclopedic knowledge and memory; they can remember far more than any individual person, because they have read so much.
It reminds me of the movie "Rain Man", which I really recommend; it's a wonderful film and I like it very much. Dustin Hoffman plays an autistic savant with near-perfect memory who can read a phone book and remember all the names and numbers.
LLMs are very similar: they can memorize SHA hashes and all sorts of things with ease. So they do have superpowers in some respects, but they also come with a series of cognitive deficits: they hallucinate quite a bit, they make things up, and they lack a sufficiently good internal model of what they actually know. This has improved, but it's not solved.
They also exhibit jagged intelligence: superhuman in some problem-solving domains, yet making mistakes essentially no human would make, such as insisting that 9.11 is greater than 9.9, or that there are two "r"s in "strawberry". These are famous examples, but they show there are rough edges you can trip over, and I think that's unique too.
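Part of what makes those failures feel so jagged is that they are one-liners in Software 1.0:

```python
# The two famous trip-ups above are trivial for explicit code.
print(9.11 > 9.9)                # False: 9.11 is less than 9.9
print("strawberry".count("r"))   # 3, not 2
```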
They also suffer from anterograde amnesia. A new colleague who joins your organization will, over time, learn the context, go home, sleep, consolidate knowledge, and gradually develop expertise. LLMs don't natively do this, and I don't think this has really been solved in LLM research yet.
So the context window really is like working memory, and you have to program that working memory fairly directly, because they don't just "get smarter" by default. I think a lot of people are misled by the analogy to a human coworker here.
In popular culture I'd recommend two films: "Memento" and "50 First Dates". In both, the protagonists' "weights" are fixed and their "context window" is wiped every morning, and it's deeply problematic to hold a job or build relationships in that state. LLMs are in that state all the time.
I'd also point out a security-related limitation of working with LLMs: they are quite gullible, they are susceptible to prompt-injection risks, they can leak your data, and so on. There are many other safety-related considerations. In short, you have to keep all of this in mind as you work with them.
06 Don't build robots, build Iron Man suits: let humans decide how high the AI flies
So we have to hold both ideas at once: this is a thing with superpowers and a bunch of cognitive deficits, and it is extremely useful. How do we program these systems? How do we work around their flaws while enjoying their superpowers? So now I want to talk about opportunities: how do we use these models, and where are the biggest openings?
These are some of the things I find interesting, not an exhaustive list.
The first thing I'm most excited about is what I call partially autonomous apps. Take coding. You could of course go straight to ChatGPT and start copy-pasting code and error messages back and forth, but why would you do that? Why talk to the operating system directly? It makes much more sense to build an application for the purpose. I think many of you use Cursor; I do too, and Cursor is the kind of tool you want here.
You don't want to go straight to ChatGPT. Cursor is a good example of an early LLM application, and it has properties I think will be useful across all LLM apps. In particular, you'll notice it keeps a traditional interface where you can do everything manually as before, but it also integrates LLMs so you can work in bigger chunks of a task. A few useful properties shared by LLM apps are worth pointing out:
First, these apps handle a lot of the context management for you.
Second, they orchestrate multiple calls to LLMs. In Cursor's case, there are embedding models to index your files, chat models, and models that apply diffs to the code, and it's all orchestrated for you.
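As a rough sketch of the orchestration such an app hides from you (the embed, chat, and apply_diff callables here are hypothetical stand-ins, not Cursor's actual internals):

```python
# Sketch of a coding-assistant pipeline: retrieve context, draft a diff, apply it.
def assist(task: str, repo_files: dict[str, str], embed, chat, apply_diff) -> dict[str, str]:
    # 1. Context management: rank files by relevance to the task and keep the top few.
    ranked = sorted(repo_files, key=lambda path: embed(task, repo_files[path]), reverse=True)
    context = {path: repo_files[path] for path in ranked[:5]}

    # 2. Ask a chat model for a proposed change, given only that context.
    diff = chat(f"Task: {task}\n\nRelevant files:\n{context}\n\nPropose a unified diff.")

    # 3. Apply the diff; the GUI then renders it in red/green for a human to accept.
    return apply_diff(repo_files, diff)
```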
Another point that I think is very important and probably under-appreciated is the application-specific GUI, because you don't want to talk to an operating system purely through text.
Text is hard to read, parse, and act on, so it's much better to see a diff rendered in red and green and review what changed interactively.
It's much easier to press Cmd+Y to accept or Cmd+N to reject than to type instructions out in text. The GUI lets a human audit the work of these fallible systems and move faster. I'll come back to that.
The last property I want to point out is what I call the autonomy slider. In Cursor, you can stick to tab completion, where you're mostly in control; you can select a block of code and press Cmd+K to change only that block; you can press Cmd+L to change the whole file; or you can press Cmd+I and let it roam freely across the entire repo, which is the fully autonomous agent mode. You are in control of that autonomy slider.
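One way to picture the slider is as the scope of change you authorize per request; the mapping below is only an illustration (the keybindings are the ones mentioned above):

```python
# Illustration of the "autonomy slider" as increasing scope per request.
from enum import Enum

class Autonomy(Enum):
    TAB_COMPLETE = 1    # Tab: suggest a few tokens, human steers everything
    EDIT_SELECTION = 2  # Cmd+K: change only the highlighted block
    EDIT_FILE = 3       # Cmd+L: change the whole file
    AGENT = 4           # Cmd+I: roam the entire repository

SCOPE = {
    Autonomy.TAB_COMPLETE: "a few tokens",
    Autonomy.EDIT_SELECTION: "one selected code block",
    Autonomy.EDIT_FILE: "one file",
    Autonomy.AGENT: "any file in the repo",
}

print(SCOPE[Autonomy.EDIT_SELECTION])  # "one selected code block"
```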
Depending on the complexity of the task at hand, you can tune how much autonomy you're willing to hand over. Another fairly successful LLM application, Perplexity, has very similar properties to what I pointed out in Cursor.
It packages up a lot of context, it orchestrates multiple LLMs, and it has a GUI that lets you audit some of its work: it cites sources you can inspect. It also has an autonomy slider: quick search, research, or deep research where you come back ten minutes later for the result. These are just different degrees of autonomy you grant the tool.
So my question is: I feel a lot of software will become partially autonomous, and I'm thinking about what that looks like. For those of you who maintain products and services, how will you make them partially autonomous?
Can an LLM see everything a human user can see? Can an LLM act in all the ways a human user can act? Can humans supervise and stay in the loop of this activity? Because, again, these systems are not yet perfect and are error-prone, and what does a "diff" even look like in something like Photoshop?
In addition, a lot of traditional software today has all of its switches and knobs designed for humans; all of that has to change and become accessible to LLMs as well. I want to stress one thing about these LLM apps that I don't think gets enough attention: we are now cooperating with AIs. Usually they do the generation and we humans do the verification.
It is in our interest to make that loop spin as fast as possible, so we get a lot of work done. I think there are two main ways to do that. First, speed up verification dramatically. GUIs are extremely important here, because a GUI exploits the "computer vision GPU" in our heads: reading text is laborious and dull, looking at pictures is easy, and visual information is a highway into the brain. So GUIs are very useful for auditing these systems and for visual presentation.
Second, we must keep the AI on a leash. I think a lot of people are far too excited about AI agents. Receiving a 1,000-line diff against my repo is not useful to me, because I'm still the bottleneck: even if those 1,000 lines were generated instantly, I still have to make sure they don't introduce bugs, that they do the right thing, that there are no security problems, and so on. So basically we need to make both halves of the loop, generation and verification, run very, very fast, and we have to rein in the AI, because it gets far too eager. Something like that.
This is how it feels when I do AI-assisted coding. If I take small steps, everything is fine; if I actually want to get real work done, an overreactive agent doing everything at once is not so great. (This slide isn't great, sorry.) I think many of you, like me, are developing your own ways of using these agents in your coding workflow and your own work.
I'm always afraid to get diffs that are too big. I work in small incremental chunks, I make sure everything is good, and I keep that loop spinning very, very fast, focused on small pieces of a concrete thing. I think many of you are probably developing similar ways of using LLMs, and I've also seen blog posts trying to codify best practices for working with them.
07 The generation-verification loop: keeping both AI and humans from going off the rails
Here's one I read recently that I thought was quite good. It discusses a number of techniques, some of which are about how to keep the AI on a leash. For example, if your prompt is vague, the AI may not do exactly what you wanted, verification fails, you ask for something slightly different, and you have to spin the loop again. Spending a bit more time making your prompt concrete and specific raises the probability that verification succeeds and lets you keep moving. I think many of us will end up converging on techniques like this.
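A sketch of that generation-verification loop, with the "leash" expressed as a small, concrete ask; generate() stands in for the LLM call and verify() for whatever check you trust, such as running tests or a quick human review:

```python
# Generation-verification loop: small, concrete requests, mechanically checked.
def converge(task: str, generate, verify, max_rounds: int = 5) -> str:
    prompt = f"{task}\nMake the smallest change that accomplishes this; touch one function only."
    for round_num in range(1, max_rounds + 1):
        candidate = generate(prompt)          # AI generates
        ok, feedback = verify(candidate)      # tests / human verify
        if ok:
            return candidate                  # accept the small, checked chunk
        # Feed the failure back; a more concrete prompt raises the odds that
        # the next round passes verification.
        prompt += f"\nAttempt {round_num} failed verification: {feedback}. Fix only that."
    raise RuntimeError("did not converge; shrink the task and try again")
```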
It's the same in my own work. What I'm interested in right now is what education looks like now that we have AI and LLMs.
A lot of my thinking is about how to keep the AI on a leash. I don't think just going to ChatGPT and saying "hey, teach me physics" works, because the AI gets lost. So for me it's really two separate apps: one app for a teacher to create a course, and another app that takes that course and serves it to students. In both cases we now have an auditable intermediate artifact, the course itself.
We can check it for quality and consistency, and the AI is constrained to a particular syllabus and progression of projects. So this is one way of keeping the AI on a leash; I think it has a much higher chance of working, and the AI doesn't get lost.
Let me draw one more analogy. I'm no stranger to partial autonomy: I worked at Tesla for about five years, and this is exactly what I worked on. Tesla's self-driving is also a partially autonomous product with many of the same properties, such as the GUI on the dashboard showing me what the neural network sees.
It also has an autonomy slider, and over my tenure we handled more and more autonomous tasks for the user. A brief story: the first time I rode in a self-driving vehicle was 2013.
A friend of mine worked at Waymo at the time and invited me for a test drive around Palo Alto. I took this photo with Google Glass; many of you are too young to even know what that is, but it was all the rage back then. We got in the car and drove around Palo Alto for about thirty minutes, highways, local streets, and so on.
That drive was flawless, zero interventions. That was 2013, more than a decade ago. It struck me, because after that perfect demo I felt autonomous driving was imminent; it clearly already worked.
It's incredible, but more than a decade later we are still working on autonomy. We are still working on driving agents. Even now we haven't fully solved it. You may see Waymos on the road that look driverless, but there is still a lot of teleoperation, and humans are still in the loop for much of the driving. So we haven't even declared success yet, though I think it will certainly succeed eventually; it's just taking a long time.
So software really is tricky, in the same way that driving is tricky. That's why, when I see statements like "2025 is the year of agents," I get very worried.
I tend to think of this as the decade of agents: it will take quite a while. We need humans in the loop, and we need to proceed carefully. This is software, and we should take it seriously.
Another analogy I keep returning to is the Iron Man suit. I've always loved Iron Man, and I think it gets a lot about technology and how it develops right. What I like about the Iron Man suit is that it is both an augmentation and an agent: in some of the films it is highly autonomous and flies around looking for Tony. That's the autonomy slider again: we can build augmentations, or we can build agents, and we want some of both. But at this stage, working with fallible LLMs, I'd say it's much more about building Iron Man suits than Iron Man robots.
08 The vibe coding carnival: programming becomes plain speech, and everyone is a programmer
It's less about building flashy demos of autonomous agents and more about building partially autonomous products. These products have custom GUIs and UX, designed so that the human generation-verification loop spins very, very fast, while never losing sight of the fact that the work could in principle be automated.
Your product should have an autonomy slider, and you should think about how that slider moves toward more autonomy over time. I think there is a lot of opportunity in this category of product.
Now let me switch topics a little and talk about another dimension that I think is very unique. Not only is there a new kind of programming language that enables this autonomy in software; as I said, it is programmed in English, a natural interface. Suddenly everyone is a programmer, because everyone speaks a natural language like English.
That makes me extremely optimistic, and it is genuinely unprecedented. It used to take five to ten years of study before you could do anything in software. Now it's different. I don't know if any of you have heard of vibe coding.
The tweet that introduced the term is, I hear, now a big internet meme. Funny story: I've been on Twitter for about 15 years and I still can't tell which tweet will go viral and which will go unnoticed. I assumed this one would go unnoticed.
I don't even know if it was more than a passing thought, but it became a meme across the whole internet. I can't fully explain why; I guess it gave a name to something everyone was feeling but couldn't quite articulate. Now there's even a Wikipedia page and everything. Crazy, right?
Yes, so this is now apparently one of my major contributions. Thomas Wolf of Hugging Face shared this beautiful video that I love: kids vibe coding.
I think this video is full of positive energy. How can you feel pessimistic about the future after watching it? The future is bright. I actually think this will end up being a gateway into software development, and I'm not at all pessimistic about this generation. Yes, I love that video.
Because it's so much fun, I tried vibe coding myself. Vibe coding is great when you want to build something hyper-custom that doesn't seem to exist, and you just want to wing it on a Saturday.
I built an iOS app. I don't know how to program in Swift, but I was shocked that I could make a super basic app. It was really simple, but I loved that it took just a day and was running on my phone the same day. It's amazing that I didn't have to spend five days learning Swift first.
I also vibe coded an app called MenuGen, which you can try at menugen.do. My pain point: looking at a menu in a restaurant, I don't recognize many of the dishes and I want picture references, but no such service exists. So I thought: let's vibe code it. The flow is: go to menugen.do, take a photo of a menu, MenuGen generates pictures of the dishes, and you get $5 of free credit when you sign up.
As a result, this is a major cost center in my life. It's a negative-revenue app at the moment, and I'm losing a lot of money on MenuGen. But the funny thing about MenuGen is that the vibe-coded part was actually the easy bit. The real difficulty was making it real: adding authentication, adding payments, configuring the domain, deploying on Vercel, and so on.
None of that was vibe coding; it was me manually clicking through things in a browser. It was extremely tedious chore-work, and it took another week. Interestingly, I had a MenuGen demo working on my laptop within a few hours, and then it took a week to turn it into a real deployment.
The reason: it was just so annoying.
09 Building for agents: AI is a new kind of user on the internet
For example, adding Google sign-in to a web page, which I know is supposed to be simple, means following integration guides like the one for the Clerk library. It's crazy! It instructed me: go to this URL, click this dropdown, choose this, go to that page, click that... A computer is telling me, a human, exactly which actions to take. Then why don't you just do it yourself? Why am I the one doing this? I had to follow the instructions, and it felt ridiculous.
So the last part of my talk is about this: can we build directly for agents? I don't want to do these chores; can the agent do them for me, please?
OK, in short: a new category of consumer and manipulator of digital information has emerged. It used to be only humans, through graphical interfaces, or computers, through APIs. Now there is something new: agents. They are computers, but they are people-like spirits.
There are people spirits on the internet now, and they need to interact with our software infrastructure. Can we build for them? For example, you can put a robots.txt file on your domain to tell web crawlers how to behave on your site.
Similarly, you can create an llms.txt file that tells LLMs, in simple Markdown, what your domain is about. That is far easier for an LLM to read than parsing your web page HTML, which is error-prone and painful. It's worth speaking to LLMs directly.
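For illustration, a hypothetical llms.txt might look something like this (the site, paths, and exact conventions here are invented; the format is still settling):

```
# Acme Analytics

> Acme Analytics is a hosted dashboarding service. All data access goes
> through the REST API; the web UI is only for humans.

## Docs
- [Quickstart](https://acme.example/docs/quickstart.md): create a project and send your first event
- [API reference](https://acme.example/docs/api.md): every REST endpoint, in plain Markdown
```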
The vast majority of documentation is written for humans, full of lists, bold text, and pictures, formatting an LLM cannot directly act on. I'm seeing providers such as Vercel and Stripe start to offer their docs specifically for LLMs, in Markdown, which is extremely LLM-friendly. One example I love: 3Blue1Brown, who makes beautiful animations on YouTube, built the Manim library, and I wanted to make an animation of my own. Rather than study the lengthy Manim documentation, I copied it into an LLM, described what I wanted, and it wrote the animation for me. It was amazing. Making documentation legible to LLMs unlocks an enormous amount and should be encouraged.
I also want to emphasize that you can't stop there. It isn't enough to convert to Markdown (that part is easy); you also need to change the content. For example, every "click here" in your docs is an instruction an LLM agent can't follow, and Vercel is replacing every "click" with an equivalent curl command that an agent can execute, which is kind of funny.
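To make that concrete, here is an invented before/after pair; the endpoint, fields, and token are illustrative, not any provider's real API:

```
Before (for humans): "Click 'New Project', pick a region from the dropdown, then press Create."

After (for agents):
curl -X POST https://api.example.com/v1/projects \
  -H "Authorization: Bearer $TOKEN" \
  -d '{"name": "my-project", "region": "us-east-1"}'
```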
And of course there's Anthropic's Model Context Protocol (MCP), a protocol for talking directly to agents as this new class of consumer and manipulator of digital information. I'm very optimistic about it. I also like the various small tools that ingest data into LLM-friendly formats.
For example, when you visit the nanoGPT repository on GitHub, you can't feed that page to an LLM directly; it's a human interface. But change the URL to the GitIngest equivalent and it concatenates all the files into a single text along with a directory structure.
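Concretely, the swap is just a URL edit (as I understand the tool at the time of writing, and worth double-checking):

```
https://github.com/karpathy/nanoGPT      <- repo page, built for humans
https://gitingest.com/karpathy/nanoGPT   <- same repo flattened into one LLM-pasteable text dump
```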
The result can be pasted straight into an LLM. More powerful tools, like Devin, go further: they don't just concatenate the raw files, they analyze the GitHub repository and generate a dedicated documentation page for it, which is very helpful as LLM input. Tools that make content LLM-legible with nothing more than a URL change are extremely useful, and there should be many more of them.
It should also be noted that in the future LLMs will be able to navigate these human interfaces directly; even today they can click around. But proactively making your information easy for agents to access is still valuable, because direct UI operation is expensive and still hard. A long tail of software, inactive repos and old infrastructure, will never be adapted, and we'll need tools to bridge that gap; for everything else, it's worth meeting the agents halfway. In short, it's an amazing time to be entering the industry.
We need to rewrite a huge amount of code, and professionals will write a huge amount of code. LLMs are like utilities, a bit like fabs, but most of all like operating systems, operating systems circa the 1960s. They are fallible "people spirits" we have to learn to collaborate with, and the infrastructure needs to adjust for them.
When building LLM applications, I've described ways of working with LLMs effectively and iterating quickly toward partially autonomous products. And beyond that, a lot of code will need to be written for the agents themselves. To return to the Iron Man suit metaphor: over the next decade the slider will move to the right. I'm full of anticipation and can't wait to build that future with all of you. Thank you!
From stunning demos to products at scale, putting technology into practice takes patience. But at this moment we stand at the front of a wave more turbulent than the computing revolution of the 1960s: when English becomes a new programming language and the LLM becomes a new operating system, everyone who dares to reshape the digital world with language is a creator of the new paradigm.