Karpathy's LLM Wiki: Your Second Brain, Maintained by the Machine

A few months ago Andrej Karpathy dropped a GitHub gist that coined a term I haven’t stopped thinking about: the LLM Wiki. The pitch is simple enough to fit on a napkin. Obsidian is the IDE, the LLM is the programmer, and the wiki is the codebase.

Sit with that for a second. It reframes your second brain as something the model builds and maintains, not something you query. When I walked through the history of the second brain, this was the corner I promised to come back to.

Compile, Don’t Retrieve

The usual move with a big pile of notes is query-time RAG. You ask a question, a vector search over embeddings finds the closest chunks, and the model stitches together an answer on the spot. It works, but the knowledge never gets organized. You’re re-deriving structure every single time you ask.

Karpathy flips it. Instead of retrieving at query time, the LLM compiles the knowledge ahead of time and keeps it current. The result is a persistent, cross-linked markdown wiki. The model doesn’t get bored, doesn’t skip the boring summary, doesn’t forget to update the index. It just keeps the thing tidy.

Google landed in a similar spot with their Open Knowledge Format (OKF), which formalizes the same idea as a curated markdown bundle with an open spec. So this isn’t one person’s hot take. The pattern is in the water.

The Three Components

Karpathy’s wiki has three parts, and the separation is the whole point.

Raw sources. Your curated collection of source documents. These are read-only. The LLM reads them but never edits them. He recommends a raw/ directory, with subdirectories for non-text files. This is your ground truth, and keeping it untouchable is what keeps the rest honest.

The wiki. A directory of LLM-generated markdown: summaries, entity pages, concept pages, comparisons, overviews. The model owns this entirely. It creates the pages, maintains the cross-references, and enforces consistency. You’re not in here hand-editing.

The schema. A document like CLAUDE.md or AGENTS.md that tells the LLM how the wiki is structured, what the conventions are, and how the workflows run. It evolves as you use the system. Think of it as the contract between you and the machine.
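To make the separation concrete, here is a minimal sketch that scaffolds the three parts on disk. The raw/ directory and the CLAUDE.md name come from the gist as described above. The wiki/ and assets/ directory names, the scaffold function, and the stub schema text are my own placeholders.

```python
from pathlib import Path

# Hypothetical starter text; a real schema grows as you use the system.
SCHEMA_STUB = """# Wiki schema
- raw/ is read-only source material. Never edit it.
- wiki/ is LLM-owned: summaries, entity pages, concept pages.
- Workflows: ingest, query, lint.
"""


def scaffold(root: Path, schema_name: str = "CLAUDE.md") -> list[Path]:
    """Create the three components under root and return their paths."""
    raw = root / "raw"
    wiki = root / "wiki"  # assumed directory name for the generated pages
    schema = root / schema_name
    raw.mkdir(parents=True, exist_ok=True)
    (raw / "assets").mkdir(exist_ok=True)  # assumed name for non-text files
    wiki.mkdir(exist_ok=True)
    if not schema.exists():  # don't clobber a schema that has already evolved
        schema.write_text(SCHEMA_STUB, encoding="utf-8")
    return [raw, wiki, schema]
```

Running it twice is safe: the directories are created only if missing, and an existing schema file is left alone.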

The Three Operations

The classic Second Brain CODE method (Capture, Organize, Distill, Express) has four steps. Karpathy’s version lands on three operations, which I appreciate.

Ingest. You add a source. The LLM reads it, writes a summary page, and updates the index. While it’s in there, it can also touch up related entity and concept pages and log the event. One new document ripples outward into everything it connects to.
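Here is roughly what the mechanical part of ingest could look like. This is a sketch under assumptions, not the gist’s implementation: the summarize callable stands in for the LLM call, and I’m assuming index.md and log.md live inside wiki/ and that log dates are ISO formatted. The ripple into related entity and concept pages is left out, since that part is the model’s judgment.

```python
from datetime import date
from pathlib import Path
from typing import Callable


def ingest(root: Path, source: Path, summarize: Callable[[str], str]) -> Path:
    """Write a summary page for one raw source, then update index and log."""
    wiki = root / "wiki"
    wiki.mkdir(parents=True, exist_ok=True)
    text = source.read_text(encoding="utf-8")  # raw/ is read, never written

    summary = summarize(text)  # stand-in for the LLM call
    page = wiki / f"{source.stem}.md"
    page.write_text(
        f"# {source.stem}\n\nSource: raw/{source.name}\n\n{summary}\n",
        encoding="utf-8",
    )

    # One-line summary for the catalog: first line of the model's output.
    first_line = summary.strip().splitlines()[0] if summary.strip() else ""
    with (wiki / "index.md").open("a", encoding="utf-8") as f:
        f.write(f"- [[{source.stem}]]: {first_line}\n")
    with (wiki / "log.md").open("a", encoding="utf-8") as f:
        f.write(f"## [{date.today().isoformat()}] ingest | {source.stem}\n")
    return page
```

Note that this version appends to the index on every call, so re-ingesting the same source would leave a duplicate entry. In the real pattern the model rewrites the index rather than blindly appending.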

Query. You ask a question against the wiki. The model finds the relevant pages, reads, and synthesizes an answer as markdown, tables, whatever fits. Here’s the part I like: a useful exploration can be filed as a new page. So the act of asking a good question makes the next answer easier to produce. The knowledge base gets smarter by being used.
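A rough sketch of the two mechanical halves of a query: picking pages from the index, and filing a good answer back. In the actual pattern the model reads index.md and chooses pages itself, so the keyword overlap below is only my stand-in for that judgment. The function names, the [[wikilink]] index entries, and the "filed answer" label are all assumptions.

```python
import re
from pathlib import Path


def pick_pages(index_text: str, question: str, limit: int = 3) -> list[str]:
    """Rank index entries by how many question words appear in their line."""
    words = set(re.findall(r"[a-z0-9]+", question.lower()))
    scored = []
    for line in index_text.splitlines():
        match = re.search(r"\[\[([^\]|#]+)", line)
        if not match:
            continue  # skip headings and blank lines
        overlap = len(words & set(re.findall(r"[a-z0-9]+", line.lower())))
        if overlap:
            scored.append((overlap, match.group(1).strip()))
    scored.sort(key=lambda pair: -pair[0])  # stable sort: ties keep index order
    return [name for _, name in scored[:limit]]


def file_answer(wiki: Path, title: str, answer: str) -> Path:
    """File a useful answer back into the wiki as a new page."""
    page = wiki / f"{title}.md"
    page.write_text(f"# {title}\n\n{answer}\n", encoding="utf-8")
    with (wiki / "index.md").open("a", encoding="utf-8") as f:
        f.write(f"- [[{title}]]: filed answer\n")
    return page
```

The second function is the interesting one. Once an answer is filed and indexed, the next related question finds it as just another page.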

Lint. A periodic health check. The model hunts for contradictions, stale claims, orphan pages, broken references, and missing concepts. The gaps. Then it suggests new questions to ask and new sources to chase. It’s the maintenance pass you’d never do yourself.
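Two of those checks, orphan pages and broken references, are purely mechanical and don’t need a model at all. A minimal sketch, assuming a flat wiki/ directory, Obsidian-style [[wikilinks]], and index.md and log.md as bookkeeping files that don’t count as inbound links. Contradictions, stale claims, and missing concepts still need the LLM to actually read the pages.

```python
import re
from pathlib import Path

# Captures the target of [[page]], [[page|alias]], and [[page#heading]].
WIKILINK = re.compile(r"\[\[([^\]|#]+)")
SKIP = {"index", "log"}  # assumed bookkeeping files, not content pages


def lint_links(wiki: Path) -> dict[str, list[str]]:
    """Report broken references and orphan pages in a flat wiki directory."""
    pages = {p.stem: p.read_text(encoding="utf-8") for p in wiki.glob("*.md")}
    broken, linked = set(), set()
    for name, text in pages.items():
        for target in WIKILINK.findall(text):
            target = target.strip()
            if target not in pages:
                broken.add(f"{name} -> {target}")
            elif name not in SKIP and target != name:
                linked.add(target)  # a link from the index doesn't rescue an orphan
    orphans = set(pages) - linked - SKIP
    return {"broken": sorted(broken), "orphans": sorted(orphans)}
```

A top-level overview page that nothing links to will show up as an orphan here, which is arguably a useful nudge rather than a false positive.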

The Supporting Cast

A few files make the whole thing run:

  • index.md is the content catalog. Every page, a link, a one-line summary, and some metadata, grouped by category. The model reads this first on a query.
  • log.md is an append-only chronological log with greppable entries like ## [DATE] operation | title.
  • qmd is a local markdown search engine (BM25 plus vector plus an LLM re-rank), available as a CLI or MCP server, for when the wiki outgrows a plain index lookup.
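The log format is regular enough to parse with one regular expression. Here is a small sketch that writes and reads entries in the ## [DATE] operation | title shape. The ISO date format and the function names are my assumptions, since the format above only says DATE.

```python
import re
from datetime import date
from typing import Optional

# Matches "## [2025-01-02] ingest | Some title"; the ISO date is an assumption.
ENTRY = re.compile(r"^## \[(\d{4}-\d{2}-\d{2})\] (\w+) \| (.+)$")


def format_entry(day: date, operation: str, title: str) -> str:
    """Render one log heading in the ## [DATE] operation | title shape."""
    return f"## [{day.isoformat()}] {operation} | {title}"


def parse_log(text: str, operation: Optional[str] = None) -> list[tuple[str, str, str]]:
    """Return (date, operation, title) per entry, optionally filtered by operation."""
    entries = []
    for line in text.splitlines():
        match = ENTRY.match(line)
        if match and operation in (None, match.group(2)):
            entries.append(match.groups())
    return entries
```

Because every entry starts with the same "## [" prefix, a plain grep for that prefix over log.md gets you the same list without any code, which is presumably the point of making it greppable.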

And because this lives in Obsidian, you get some nice perks for free. The Web Clipper turns a browser tab into markdown straight into raw/. Graph View is a visual lint: hubs and orphan pages pop right out. Dataview pulls frontmatter into dynamic tables. And Marp spits out slides directly from your wiki pages.

Why This Clicks for Me

The thing I keep coming back to is the read-only raw/ boundary. So much of the anxiety around AI touching your notes is “what if it mangles something I care about.” Splitting sources from the generated wiki means the model can be as aggressive as it wants in the wiki layer, and your source of truth never moves. The worst case is you regenerate a summary page. No harm done.
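If you want the boundary to be more than a convention in the schema file, one option is to strip write permission from everything under raw/. This is my own sketch, not something the gist prescribes, and the lock_raw name is made up. It’s a guardrail on POSIX-style permissions, not a guarantee: an agent with shell access could change the mode back.

```python
import stat
from pathlib import Path

WRITE_BITS = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH


def lock_raw(raw: Path) -> int:
    """Strip write permission from every file under raw/; return the count."""
    count = 0
    for path in raw.rglob("*"):
        if path.is_file():
            mode = path.stat().st_mode
            path.chmod(mode & ~WRITE_BITS)  # keep read/execute bits as they were
            count += 1
    return count
```

You’d rerun it after each batch of new clippings lands in raw/, since new files arrive writable.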

I haven’t fully committed my own vault to this yet, but the shape is right. A knowledge base that maintains itself, where asking good questions leaves the place better than you found it. I’ll probably keep poking at it.

Sources

  • Andrej Karpathy, LLM Wiki: A Pattern for AI-Maintained Knowledge Bases — the three components (raw/wiki/schema), the three operations (ingest/query/lint), index.md, log.md, qmd, and the Obsidian tooling (Web Clipper, Graph View, Dataview, Marp).
  • Sam McVeety & Amir Hormati, “Introducing the Open Knowledge Format” (Google Cloud Blog, June 12, 2026) — Google’s OKF as a curated markdown bundle with an open spec, explicitly formalizing the same LLM-wiki pattern.
  • Tiago Forte, Building a Second Brain (2022) — the CODE method (Capture, Organize, Distill, Express), the “classic” four-step workflow this post lines up against Karpathy’s three operations.
  • Roger Montti, “Google Cloud Announces The Open Knowledge Format” (Search Engine Journal, June 15, 2026) — independent news coverage of OKF, including the quote linking it back to Karpathy’s LLM Wiki gist.
  • Cecilia Meis, “Google Launches Open Knowledge Format, an AI Standard” (Semrush Blog, June 23, 2026) — news coverage framing OKF as a vendor-neutral markdown spec for AI agent knowledge.

