| Task | Best Approach | Why | Risk |
|---|---|---|---|
| Make a page extractable | Lead with the answer in the first paragraph under each H2 | AI systems lift the first complete unit they find under a matched heading | A buried answer can lead the model to lift the intro instead, or to skip the page |
| Align schema with content | Ensure every JSON-LD claim is visible on the rendered page | Trust signals degrade when schema describes invisible content | Schema-content mismatch can deprioritize the page in AI selection |
| Match headings to queries | Rephrase H2/H3 as questions or direct topic statements | AI systems match headings against query intent for section selection | Vague headings like "Overview" match no natural query |

| Option | When to Use | Strength | Limitation |
|---|---|---|---|
| Answer-first structure | Every page targeting AI citation | Strongest observed correlation with selection across Google AI Overviews, ChatGPT search, and Perplexity | Requires restructuring existing content flow |
| Schema markup alone | Pages already well-structured | Helps AI understand entity type and content category | Does not compensate for poor visible content structure |
| Definition density | Pages with technical terminology | Reduces model inference errors and increases quote accuracy | Adds length that may dilute topical focus |
Ranking vs. selection
Traditional search ranks pages against a query. Answer engines do something different: they retrieve a candidate set of strong pages, then choose which one to quote. Two pages can both rank in the top three results, yet only one is cited in the AI answer. The difference is selection.
Selection is the moment an AI system decides "this page contains the answer in a form I can use." A page that ranks high but buries its answer five paragraphs deep loses to a page that ranks slightly lower but states the answer cleanly in its second paragraph.
> Definition — Extractability is the property of a page that lets a language model lift a self-contained answer without inference, paraphrasing across paragraphs, or guessing at structure.
What "extractability" actually means
Extractability is not the same as readability. Readability is for humans. Extractability is structural: the answer to a likely question exists as a complete unit, near the top of the relevant section, and is not split across multiple paragraphs that require the model to stitch them together.
The strongest extractable units share three properties:
- The first sentence states the answer directly.
- Following sentences add context, conditions, or evidence — not the answer itself.
- The unit is preceded by a heading that matches the question being asked.
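One rough way to see what a model would lift is to pull the first paragraph under each heading and read it on its own. The sketch below assumes the page source is Markdown with `##` and `###` headings, which is an assumption about your content pipeline, and `lead_units` is a hypothetical helper name. It does not judge whether the paragraph actually answers the heading; that still needs a human read.

```python
import re

def lead_units(markdown_text):
    """Map each H2/H3 heading to the first paragraph that follows it.

    Assumes Markdown-style headings ("## " or "### "). If the same heading
    text appears twice, only the first occurrence is kept.
    """
    units = {}
    heading = None
    paragraph = []
    for line in markdown_text.splitlines():
        match = re.match(r"^(#{2,3})\s+(.*)$", line)
        if match:
            # A new heading closes any paragraph still in progress.
            if heading is not None and heading not in units and paragraph:
                units[heading] = " ".join(paragraph)
            heading = match.group(2).strip()
            paragraph = []
        elif heading is not None and heading not in units:
            if line.strip():
                paragraph.append(line.strip())
            elif paragraph:
                # A blank line ends the first paragraph under this heading.
                units[heading] = " ".join(paragraph)
                paragraph = []
    # Flush the last paragraph if the text ends without a blank line.
    if heading is not None and heading not in units and paragraph:
        units[heading] = " ".join(paragraph)
    return units
```

If the text returned for a heading does not read as a complete answer to that heading, the section is likely to need the answer moved up.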
The five selection signals AI systems use
Across observable behavior in Google AI Overviews, ChatGPT search, and Perplexity, five signals consistently correlate with selection:
- Answer-first structure. The answer appears in the opening sentence of its section, not in a conclusion.
- Heading-question alignment. H2 and H3 headings are phrased as questions or direct topic statements.
- Visible-content / schema parity. JSON-LD describes content that is also visible on the page.
- Definition density. Technical terms are defined inline, not assumed.
- Source clarity. A clear author, organization, and updated date that align with the topic.
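The visible-content / schema parity signal can be spot-checked with a short script. The sketch below is a minimal illustration, not how any engine evaluates parity: it assumes you have already extracted the JSON-LD as a string and the rendered page text as plain text, and the function names are hypothetical. It uses exact substring matching after normalizing case and whitespace, so values that are formatted differently on the page (dates, URLs) will be reported and need a manual look.

```python
import json
import re

def _normalize(text):
    # Lowercase and collapse whitespace so formatting differences do not matter.
    return re.sub(r"\s+", " ", text).strip().lower()

def _string_claims(node, path=""):
    """Yield (path, value) for every string value, skipping JSON-LD keywords such as @type."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key.startswith("@"):
                continue
            yield from _string_claims(value, f"{path}.{key}" if path else key)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from _string_claims(item, f"{path}[{index}]")
    elif isinstance(node, str):
        yield path, node

def missing_claims(json_ld_text, visible_text):
    """Return the paths of JSON-LD string values that do not appear in the visible text."""
    page = _normalize(visible_text)
    data = json.loads(json_ld_text)
    return [path for path, value in _string_claims(data)
            if _normalize(value) not in page]
```

An empty result means every string value in the schema was found somewhere in the visible text. A non-empty result is a list of fields to review by hand, not proof of a mismatch.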
Common mistakes that cause skipping
| Mistake | Why it hurts selection |
|---|---|
| Wall-of-text paragraphs | Forces the model to guess where the answer starts and ends. |
| Content hidden in tabs or accordions | Some crawlers do not see it; selection probability drops. |
| Schema describing content not visible on page | Trust signal degrades; some engines deprioritize the page. |
| Vague H2 like "Overview" or "More info" | Heading does not match any natural query. |
| Answer hidden after a long intro | The model lifts the intro instead of the answer, or skips the page. |
What to do about it
For each page that should be answer-eligible, audit three things in this order:
- Lead with the answer. Move the direct answer to the first paragraph under each H2.
- Restate the question in the heading. Replace abstract headings with question-shaped or topic-direct ones.
- Align schema with visible content. Every claim in your JSON-LD should be findable on the rendered page.
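The heading step of this audit can be triaged with a small classifier. The sketch below is a heuristic under stated assumptions: the vague list contains only the two examples named in this article, the question-word list is an assumption you should adjust for your own content, and `audit_heading` is a hypothetical name.

```python
# The two vague headings named in the article; extend for your own site.
VAGUE_HEADINGS = {"overview", "more info"}

# Assumed list of words that make a heading question-shaped.
QUESTION_WORDS = ("what", "why", "how", "when", "where", "which",
                  "who", "can", "does", "is", "should")

def audit_heading(heading):
    """Classify a heading as 'vague', 'question', or 'statement'."""
    text = heading.strip()
    lowered = text.lower().rstrip("?:. ")
    if lowered in VAGUE_HEADINGS:
        return "vague"
    first_word = lowered.split(" ", 1)[0] if lowered else ""
    if text.endswith("?") or first_word in QUESTION_WORDS:
        return "question"
    # Anything else is treated as a topic statement and left for manual review.
    return "statement"
```

Headings classified as `vague` are the first candidates for rewriting. Headings classified as `statement` may already be direct topic statements, so they warrant a read rather than an automatic change.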