Andrej Karpathy recently responded on X to a viewpoint from Thariq Shihipar, an engineer on Anthropic's Claude Code team: when asking a large language model a question, simply append a sentence such as "Please present the answer in HTML structure" to the end of the prompt, then open the generated file in a browser. The results are often very good. Karpathy added that he had also tried asking an LLM to format answers as slides, with similarly good results.
(Anthropic engineer: HTML is the best output format for Claude Code, not Markdown)
From plain text to HTML: AI output is shifting from “readable” to “visualizable”
This remark extends a discussion that has been running in the AI developer community in recent days: whether HTML is a better AI output format than Markdown. In his earlier article "Using Claude Code: The Unreasonable Effectiveness of HTML," Shihipar argued that for AI coding agents like Claude Code, HTML is not just a formatting option; it is an output interface that can upgrade AI answers from linear text into interactive documents.
Karpathy broadened the issue to the evolution of human-facing AI input and output interfaces. In his view, for most LLMs today the default output is still stuck at the Markdown stage. Compared with raw text, Markdown already improves the reading experience through headings, bold, italics, tables, and more, but at its core it is still linear text.
In his categorization, AI output formats roughly form an evolutionary path: the first stage is raw text, with the highest reading cost; the second stage is Markdown, the default format for most current AI products; the third stage is HTML. Although HTML is still programmatic output built from tags and structure, it enables far more flexible graphics, layout, and styling, and even interactive elements.
Markdown makes AI answers feel easier to read, but HTML may turn them into documents that are browsable, operable, and visually understandable.
This is also the core reason Shihipar previously argued that HTML beats Markdown: HTML can support SVG charts, color encoding, CSS styling, warning blocks, in-page anchors, interactive components, and side-by-side comparison tables. In scenarios such as technical documentation, vulnerability analysis, data visualization, and instructional explanations, HTML can transform text that readers would otherwise have to digest slowly into visual documents whose hierarchy, risks, and relationships are immediately clear.
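As an illustrative sketch (not taken from Shihipar's article), a fragment combining several of the features listed above might look like this: a CSS-styled warning block, a heading that doubles as an in-page anchor, and an inline SVG chart with color-encoded severity levels. All names and figures in it are hypothetical.

```html
<!-- Hypothetical sketch of the kind of HTML an agent could emit. -->
<section id="risk-summary">
  <!-- The heading links to itself, giving readers an in-page anchor -->
  <h2><a href="#risk-summary">Risk summary</a></h2>

  <!-- A CSS-styled warning block, more salient than plain Markdown text -->
  <div style="border-left: 4px solid #d9534f; background: #fdecea; padding: 8px;">
    <strong>Warning:</strong> two endpoints accept unauthenticated requests.
  </div>

  <!-- An inline SVG bar chart: finding counts, color-encoded by severity -->
  <svg width="220" height="70" role="img" aria-label="Findings by severity">
    <rect x="10" y="10" width="150" height="20" fill="#d9534f"></rect>
    <text x="165" y="25">High: 3</text>
    <rect x="10" y="40" width="100" height="20" fill="#f0ad4e"></rect>
    <text x="115" y="55">Medium: 2</text>
  </svg>
</section>
```

None of this requires a separate rendering pipeline; as Karpathy's tip suggests, the model's output can simply be saved to a file and opened in a browser.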
Karpathy: for AI, humans prefer voice as input but visuals as output
Karpathy’s new perspective is not just about HTML—it’s about the future of AI interfaces.
He pointed out that, on the input side, humans may prefer interacting with AI by voice, because speaking is a natural, low-cost form of expression. On the output side, however, what humans actually prefer is visual information, including images, animations, and video.
His reason is that the human brain devotes about one-third of its processing capacity to visual information. Therefore, as AI capabilities improve, AI should not only package answers as text, but gradually move toward higher-density, more intuitive visual output.
This makes HTML's importance clearer. HTML is not the end point; it may be a transitional stage as AI moves from text output toward visual output. It can express images, layout, and interactivity better than Markdown, yet it is more stable and controllable than videos or simulations generated entirely by neural networks.
Karpathy further speculated that although the relevant technology doesn’t exist yet, in the long run, the endpoint of AI output might be some kind of interactive video or simulated content generated directly by diffusion models.
In other words, future AI may not just answer with a block of text, nor hand you an HTML document, but directly generate interactive, explorable, dynamically changing visual scenarios. Users could operate within them, watch changes unfold, and understand causal relationships, as if instructional videos, interactive simulations, and real-time generated interfaces were combined into one.
However, Karpathy also admitted that many open questions remain. In particular, there is still no mature answer for how to combine precise, verifiable, programmatic "Software 1.0" outputs from traditional software engineering, such as interactive simulations, frontend components, and mathematical models, with the imagery, animations, or video generated by diffusion models.
This article Karpathy: AI should not stop at Markdown! HTML is the future; the endgame is an interactive, explorable scenario first appeared on Lianxin ABMedia.