LLM SEO: How AI Decides What to Cite

LLM SEO is the work of getting your content cited, referenced, or recommended inside AI answers: ChatGPT, Perplexity, Google's AI Overviews, Gemini, Copilot.

Most guides on LLM SEO read as a checklist: write conversationally, add FAQ schema, structure your content, build authority, keep it fresh.

That list is not wrong. But a list of ten tactics with no mechanism underneath it gives you no way to decide what matters when you can only do three of them.

This post is about the mechanism. If you understand how a model actually selects the sources it cites, the checklist reorganises itself, and one lever moves to the front: freshness.

Not because freshness is a nice-to-have, but because of where citations come from and what AI systems do with the dates they find. This is the "how it works" piece that sits under everything else we publish, so it is worth getting the foundation right.

LLM SEO is optimising to be the source, not the result

Traditional SEO optimises a page to rank, to be one of ten blue links a person chooses between. LLM SEO optimises a page to be the source a model uses when it writes the answer the person reads instead of those links.

The difference is not cosmetic. There is no page two in an AI answer. Your content is either incorporated or it is not, and the reader rarely sees what was passed over.

You will see this idea labelled several ways, and the acronyms are worth untangling because one of them is not quite a synonym. Generative engine optimisation (GEO) and LLMO are used more or less interchangeably with LLM SEO. Answer engine optimisation (AEO) is subtly broader.

AEO is the older discipline of being the extractable direct answer across every answer surface (featured snippets, People Also Ask, voice results, and Google's AI Overviews), and it predates the chatbots, having grown out of Google's answer features. LLM SEO is the narrower, newer slice of that aimed specifically at conversational models like ChatGPT, Perplexity, Gemini, and Claude.

The distinction matters more than it first looks, because those models retrieve sources through a different mechanism than the one Google uses to pick a snippet, and as we will see, that mechanism is exactly what puts freshness at the centre. So this post is about LLM SEO specifically, not AEO in general.

The behaviour underneath it is simple to state: a buyer asks a question inside an AI tool, the tool synthesises an answer from a handful of sources, and the brands named in that answer earn a kind of visibility the ones left out never see.

The practical question for anyone running a blog is how the model chooses what to pull from. Everything else follows from the answer to that.

Citations come from retrieval, and retrieval is the gate

Here is the part most checklists skip, and it is also where LLM SEO genuinely parts ways from broader AEO.

Optimising for a Google featured snippet is, at bottom, about ranking: the snippet is drawn from a page that already ranks well. LLM citation works differently.

When an AI answers from its training data alone, it generally does not cite anyone, because it has no specific source to point to. Citations appear when the model searches the live web mid-answer, reads what it finds, and links the pages it used.

That retrieval step, usually some form of RAG (retrieval-augmented generation), is where almost all citations are minted.

But that raises an obvious question: what is the search the model runs? For most of the systems named above, it is a conventional search index under the hood. AI Overviews draw on Google's index; ChatGPT and Perplexity lean on search backends and their own crawls.

Which means traditional ranking has not stopped mattering. It is what gets you into the candidate set the model reads from in the first place. If you are nowhere in the index for a query, you are usually nowhere in the retrieval, and a page that is never retrieved can never be cited.
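The gate described above can be sketched in a few lines. This is a toy illustration only, not how any named AI product works: the `Page` class, the word-overlap scoring, the `example.com` URLs, and the cutoff `k` are all assumptions made up for the example. The point it shows is structural: a page that does not make the retrieved set has no path to being cited.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    text: str


def retrieve(query: str, index: list[Page], k: int = 3) -> list[Page]:
    """Toy retrieval (assumption): rank pages by shared query words."""
    terms = set(query.lower().split())

    def overlap(page: Page) -> int:
        return len(terms & set(page.text.lower().split()))

    ranked = sorted(index, key=overlap, reverse=True)
    # Only pages matching at least one term, and only the top k, get through.
    return [p for p in ranked if overlap(p) > 0][:k]


def citable_urls(query: str, index: list[Page], k: int = 3) -> list[str]:
    """A page can only be cited if it was retrieved first."""
    return [p.url for p in retrieve(query, index, k)]


# Hypothetical index of three pages.
index = [
    Page("https://example.com/a", "llm seo guide to citations"),
    Page("https://example.com/b", "recipe for sourdough bread"),
    Page("https://example.com/c", "how citations work in llm answers"),
]
cited = citable_urls("llm citations", index, k=2)
```

Here page `b` never matches the query, so it never enters the candidate set, however good its content is.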

So this is not SEO versus LLM SEO. The two stack. Ranking gets you considered; it earns your place among the handful of pages the model actually pulls. What changes is what happens next. Being retrieved is no longer the same as being chosen, and among retrieved pages, freshness is a large part of what does the choosing. The contest has simply moved one step further down the funnel, from "do I rank" to "once I am in the room, am I the source that gets named."

This single fact reorders everything. To be cited, you first have to be retrieved, surfaced by the search the model runs before it writes. Most major AI tools now run that live search for anything time-sensitive, factual, or specific, which is most commercial queries.

So the real contest is not "is my content good"; it is "does my content get pulled into the small set of pages the model actually reads for this query."

Retrieval is the gate. Everything you do for LLM SEO is, in effect, an attempt to get through it. And the strongest signal operating at that gate is how current your content looks.

Why freshness is the lever: AI systems have a measurable recency bias

Freshness is not one bullet among ten. It is the signal that most directly governs whether you make it through retrieval, and unlike most of the others, the effect has been measured rather than asserted.

Ahrefs analysed 17 million citations across six AI search platforms and found that AI assistants tend to cite fresher content than traditional search results do. The recency skew shows up across platforms and runs stronger in the AI answers than in the underlying search rankings. That is the pattern at scale.

The mechanism behind it has been isolated in the lab, too. A controlled study on recency bias injected artificial publication dates into otherwise identical passages and watched the rankings move.

Adding nothing but a fresher date shifted the top ten's mean publication year forward by years and moved individual passages by as many as 95 positions. Between two passages of genuinely equal relevance, the model's preference flipped around a quarter of the time on the strength of the date alone.
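To make the effect concrete, here is a minimal sketch of how a date alone can flip a ranking. It is not the study's method and not any vendor's ranking function: the exponential decay, the `half_life_days` value, the `recency_weight` blend, and the sample passages are all assumptions chosen for illustration.

```python
from datetime import date


def recency_score(published: date, today: date, half_life_days: float = 365.0) -> float:
    """Exponential decay: 1.0 if published today, 0.5 after one half-life."""
    age_days = max((today - published).days, 0)
    return 0.5 ** (age_days / half_life_days)


def rerank(passages, today: date, recency_weight: float = 0.3):
    """Sort (text, relevance, published) tuples by a blended score."""
    def blended(passage):
        _, relevance, published = passage
        recency = recency_score(published, today)
        return (1 - recency_weight) * relevance + recency_weight * recency

    return sorted(passages, key=blended, reverse=True)


today = date(2025, 1, 1)
# Two hypothetical passages with identical relevance and different dates.
passages = [
    ("older passage", 0.80, date(2021, 1, 1)),
    ("fresher passage", 0.80, date(2024, 10, 1)),
]
top = rerank(passages, today)[0]
```

With any non-zero `recency_weight`, the fresher passage sorts first even though the two are equally relevant. Setting the weight to zero leaves the original order untouched, which is the control condition in this toy setup.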

Read those two findings together and the takeaway is hard to dodge. A model deciding what to cite is staking its own credibility on being correct as of today, so a source that reads as current is a safer bet than one that reads as old, even when the two say the same thing.

Recency is not a tiebreaker the systems apply reluctantly; it is baked into how they rank what they retrieve. Which means the date and currency of your content are not housekeeping. They are a primary input to whether you are cited at all.

Freshness is necessary, not sufficient, and that is the good news

It would be dishonest to claim freshness is the only thing that matters, and overclaiming is exactly how you write a post that ages badly.

Some citation research finds brand authority to be the single strongest predictor of being cited, and finds that backlinks, the currency of traditional SEO, correlate weakly or not at all with AI visibility.

Structure matters independently too: self-contained, clearly bounded passages get pulled into answers far more readily than the same facts buried in long unbroken prose, because a model extracts snippets, not whole pages.
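A rough sketch of what "clearly bounded" means in practice: if a page splits cleanly into passages that each carry their own heading, every passage can stand alone when extracted. The heading heuristic below (a single-line block with no closing punctuation) and the sample page are assumptions for illustration; a real pipeline would more likely rely on the page's HTML heading tags.

```python
def split_into_passages(text: str) -> list[dict]:
    """Group paragraphs under the nearest preceding heading.

    Assumption: a heading is a single-line block that does not end in
    sentence punctuation.
    """
    passages = []
    heading = ""
    for block in text.split("\n\n"):
        block = block.strip()
        if not block:
            continue
        is_heading = "\n" not in block and not block.endswith((".", "?", "!", ":"))
        if is_heading:
            heading = block
        else:
            passages.append({"heading": heading, "body": block})
    return passages


# Hypothetical page with two short, self-contained sections.
page = """What is freshness debt

Freshness debt is the cost of content you published and stopped maintaining.

How to reduce it

Review pages with aged statistics first."""

passages = split_into_passages(page)
```

Each resulting passage pairs a heading with a body that makes sense on its own, which is the shape a snippet extractor can lift without needing the rest of the page.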

So freshness sits alongside authority and structure, not above them. But here is why it still belongs at the front of your list: of the three, it is the one you can move this quarter.

You cannot manufacture seventeen years of domain authority before your next board meeting. You can, this week, find the pages where "this year" silently became "last year," where a cited statistic is three years old, where a screenshot shows an interface that no longer exists, and fix them.

Authority is earned slowly. Freshness is maintained continuously. That makes freshness the highest-leverage place for most teams to start, not because it outranks everything else, but because it is the lever actually within reach.

What this means for the content you already have

If freshness governs retrieval, then your existing back catalogue is where LLM SEO is won or lost, not your next post. Every page you have already published is either reading as current to the models that might cite it, or quietly reading as stale.

And staleness here is rarely the obvious kind. A wrong year in the title is easy to spot and easy to fix.

The damaging kind hides in the body: the aged statistic, the renamed tool, the price that changed, the "as of last year" aside nobody flags. Each is a freshness signal pointing the same way, and together they are the difference between being the source a model trusts and being invisible to it.
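Some of these body-level signals can be surfaced mechanically. The following is a minimal sketch, assuming plain-text page content: the function name, the two regular expressions, and the `max_age_years` threshold are illustrative choices, and it only flags candidates for a human to review. It will not catch renamed tools or changed prices.

```python
import re

# Four-digit years from 2000 to 2099, and relative phrases that age silently.
YEAR_PATTERN = re.compile(r"\b(20\d{2})\b")
RELATIVE_PATTERN = re.compile(r"\b(this year|last year)\b", re.IGNORECASE)


def find_stale_signals(text: str, current_year: int, max_age_years: int = 2):
    """Return (line_number, matched_text) pairs worth a human review."""
    findings = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for match in YEAR_PATTERN.finditer(line):
            if current_year - int(match.group(1)) > max_age_years:
                findings.append((line_number, match.group(1)))
        for match in RELATIVE_PATTERN.finditer(line):
            findings.append((line_number, match.group(1).lower()))
    return findings


# Hypothetical page body with one aged statistic and one relative date.
sample = (
    "Our 2019 survey found 40% adoption.\n"
    "As of last year, pricing starts at $10.\n"
    "Updated for 2025."
)
findings = find_stale_signals(sample, current_year=2025)
```

On this sample the scan flags the 2019 statistic and the "last year" aside, and leaves the current-year reference alone.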

This is the cost we call freshness debt: the accumulating, compounding price of content you published and then stopped maintaining. It is the bridge between LLM SEO and the older idea of content decay, which used to mean lost rankings and now also means lost citations.

If you want the deeper treatment of what content decay is and how to spot it, that is the companion piece.

But the core point for LLM SEO is this: the single most reliable way to keep getting cited is to keep your existing pages current, continuously, rather than letting them drift and hoping the next new post makes up the difference.

The takeaway

LLM SEO is not a ten-item checklist; it is one mechanism with consequences. Citations come from retrieval, retrieval rewards content that reads as current, and the recency effect is large enough to have been measured across millions of citations and reproduced in the lab.

Authority and structure matter too, but freshness is the lever you can act on continuously and immediately, and it operates directly at the gate everything else has to pass through.

The practical move is to stop thinking of your published content as finished. The pages most likely to be cited next month are the ones you keep current this month.

If you want to see how current your own blog reads to the systems deciding what to cite, you can run a free scan here and get back the pages with stale year references, broken links, and missing internal links in a few minutes, with no signup.