Speakable Schema: How to Win Voice and Conversational AI in 2026
Speakable JSON-LD tells AI assistants which parts of your page to read aloud. It is used by Google Assistant, Alexa, Siri integrations, and Perplexity Voice. This guide covers the format and a validator checklist.
Speakable schema is JSON-LD markup that points AI engines to the parts of a page best suited for audio playback - typically the headline and the Direct Answer Block. It is used by Google Assistant, Perplexity Voice, and Alexa Skill integrations to read content aloud, and adds a 1.1x citation lift on voice queries.
Key facts
- Voice query share of total AI assistant queries: 18% in April 2026, up from 7% in 2023.
- Speakable schema adoption: 4.2% of news domains, 1.1% of B2B SaaS sites.
- Voice citation lift from Speakable: 1.1x baseline (low but cheap to ship).
- Average voice answer length: 22-28 seconds, ~70-90 words read aloud.
- Top engines using Speakable: Google Assistant, Perplexity Voice, Brave Leo Voice.
What Speakable Is For
Speakable schema (the schema.org speakable property, whose value is a SpeakableSpecification) tells AI engines which parts of a page to read aloud when answering a voice query. Google Assistant, Perplexity Voice, Brave Leo Voice, and emerging Alexa Skill integrations all read it. The lift on voice citations is modest (1.1x baseline) but the cost is trivial - five minutes per page.
In April 2026, voice share of total AI assistant queries is 18% (up from 7% in 2023, growing ~30-40% year-over-year). Average voice answer length is 22-28 seconds, or 70-90 words read aloud. Speakable tells the engine which 70-90 words to pick.
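As a quick sanity check on the growth figure, the annual rate implied by the move from 7% to 18% can be computed directly. This is only a sketch: the function name is illustrative, and it assumes the two measurements are roughly three years apart, which the figures above do not state exactly.

```python
def implied_annual_growth(start_share: float, end_share: float, years: float) -> float:
    """Compound annual growth rate that takes start_share to end_share."""
    return (end_share / start_share) ** (1.0 / years) - 1.0


# Figures from the text: 7% in 2023, 18% in April 2026.
# Assumption: the two data points are about 3 years apart.
rate = implied_annual_growth(0.07, 0.18, 3.0)
print(f"Implied growth: {rate:.0%} per year")  # prints "Implied growth: 37% per year"
```

Under that assumption the result sits inside the quoted 30-40% range.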
A Copy-Paste Template
{
  "@context": "https://schema.org",
  "@type": "WebPage",
  "speakable": {
    "@type": "SpeakableSpecification",
    "cssSelector": [
      ".article-headline",
      ".aeo-direct-answer"
    ]
  }
}
Drop this in a <script type="application/ld+json"> block alongside your Article and FAQPage schema. The CSS selectors point to:
- The H1 (.article-headline or whatever class you use)
- The Direct Answer Block (.aeo-direct-answer)
That's it. No additional content needed.
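If your pages are rendered by a script or a static site generator, the same block can be produced programmatically. The following is a minimal sketch, not a required approach; the function name speakable_script_tag is hypothetical, and the class names are the ones from the template above.

```python
import json


def speakable_script_tag(selectors: list[str]) -> str:
    """Render a SpeakableSpecification JSON-LD block for the given CSS selectors."""
    data = {
        "@context": "https://schema.org",
        "@type": "WebPage",
        "speakable": {
            "@type": "SpeakableSpecification",
            "cssSelector": selectors,
        },
    }
    # Escape "</" so a stray "</script>" inside a value cannot close the tag early.
    body = json.dumps(data, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


# Swap in whatever class names your own template uses.
tag = speakable_script_tag([".article-headline", ".aeo-direct-answer"])
print(tag)
```

Generating the tag from one function keeps the selector list in a single place, which helps when a class name changes later.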
Why "Headline + Direct Answer" Is the Right Selector Set
Voice engines have strict word budgets. They want a self-sufficient excerpt that:
- Names the topic (the headline)
- Answers the implied question (the Direct Answer Block)
- Fits in 22-28 seconds of speech (~70-90 words)
A Direct Answer Block (40-60 words) plus a short headline (~10 words) lands at 50-70 words - comfortably inside the budget. Marking entire articles as Speakable forces the engine to summarize anyway, dropping citation lift to ~1.0x.
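The word budget is easy to check before publishing. Here is a rough sketch that counts whitespace-separated words and estimates speaking time; the speaking rate is derived from the 70-90 words / 22-28 seconds figures above, and the function name and default limit are illustrative assumptions, not anything a voice engine publishes.

```python
# Midpoint of the figures in the text: ~80 words in ~25 seconds.
WORDS_PER_SECOND = 80 / 25


def speakable_budget(headline: str, answer: str, max_words: int = 90) -> dict:
    """Count words in headline + answer and estimate read-aloud duration."""
    words = len(headline.split()) + len(answer.split())
    return {
        "words": words,
        "seconds": round(words / WORDS_PER_SECOND, 1),
        "fits": words <= max_words,
    }


report = speakable_budget(
    "Speakable Schema: How to Win Voice and Conversational AI in 2026",
    "Speakable schema is JSON-LD markup that points AI engines to the parts "
    "of a page best suited for audio playback.",
)
print(report)
```

Real engines may tokenize and pace speech differently, so treat the estimate as a guide rather than a guarantee.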
Validator Checklist
- CSS selectors point to elements that actually exist on the page.
- Selected text is 50-90 words total (headline + answer block).
- No HTML inside the selected elements (or it gets stripped before reading).
- Validates at validator.schema.org.
- Validates at Google Rich Results Test.
- robots.txt allows GPTBot, Google-Extended, ClaudeBot, and PerplexityBot to fetch the page.
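The first and last checklist items can be automated with the Python standard library. This sketch makes two simplifying assumptions: it only understands plain class selectors such as .aeo-direct-answer (not full CSS), and it takes the page HTML and robots.txt as strings you have already fetched. The helper names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class ClassCollector(HTMLParser):
    """Collect every class name that appears on any element."""

    def __init__(self) -> None:
        super().__init__()
        self.classes: set[str] = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "class" and value:
                self.classes.update(value.split())


def missing_selectors(html: str, selectors: list[str]) -> list[str]:
    """Return the simple class selectors (".name") that match nothing in html."""
    collector = ClassCollector()
    collector.feed(html)
    return [s for s in selectors if s.lstrip(".") not in collector.classes]


def blocked_bots(robots_txt: str, url: str, bots: list[str]) -> list[str]:
    """Return the user agents that robots_txt forbids from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in bots if not parser.can_fetch(bot, url)]


page = '<h1 class="article-headline">Title</h1><p class="intro">Text</p>'
robots = "User-agent: GPTBot\nDisallow: /\n\nUser-agent: *\nAllow: /\n"
bots = ["GPTBot", "Google-Extended", "ClaudeBot", "PerplexityBot"]

print(missing_selectors(page, [".article-headline", ".aeo-direct-answer"]))
print(blocked_bots(robots, "https://example.com/blog/speakable", bots))
```

In this sample, the page lacks the .aeo-direct-answer class and the robots.txt blocks GPTBot, so both checks report a problem.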
Pairing With Direct Answer Blocks
Speakable is the natural pairing for Direct Answer Blocks. The 40-60 word answer you placed under the first H2 is exactly the audio-playback excerpt voice engines want.
If you have already shipped Direct Answer Blocks and given them a CSS class (e.g. .aeo-direct-answer), Speakable is a 30-second add: drop in the JSON-LD pointing to that class, and you are done.
A Micro-Sprint: Add Speakable to 20 Pages in 30 Minutes
- Pick the class. Use .aeo-direct-answer (or whatever you already use). If your headline doesn't have a stable class, add one.
- Add the JSON-LD to your blog template / layout. It applies to every page automatically.
- Validate with Google Rich Results Test on 3-5 sample pages.
- Ship.
If you have a CMS template, this is one template change applied to all pages. If you don't, it's a 30-second copy-paste per page.
What Doesn't Work
- Marking the full article body. Forces the engine to summarize. Drops lift.
- Using XPath. Works, but adds maintenance cost. Stick with CSS selectors.
- Pointing at hidden elements. Engines filter display: none and visibility: hidden. The selector must hit visible content.
- Using inline styles instead of class names. Selectors break the moment you refactor styling.
- Forgetting to update. When you change the Direct Answer Block class name, the Speakable selector must update too.
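Two of the pitfalls above (whole-page selectors and XPath) can be caught by inspecting the JSON-LD itself. The sketch below is a hypothetical lint helper, not an official validator; it does not detect hidden elements or stale class names, which require looking at the rendered page.

```python
import json

# Selectors that grab the entire page or article body.
BROAD_SELECTORS = {"body", "main"}


def lint_speakable(json_ld: str) -> list[str]:
    """Return warnings for whole-page selectors and XPath usage."""
    spec = json.loads(json_ld).get("speakable", {})
    warnings = []
    if "xpath" in spec:
        warnings.append("uses xpath; prefer cssSelector")
    selectors = spec.get("cssSelector", [])
    if isinstance(selectors, str):  # a single selector may be given as a string
        selectors = [selectors]
    for selector in selectors:
        if selector.strip().lower() in BROAD_SELECTORS:
            warnings.append(f"{selector!r} selects the whole page body")
    if not selectors and "xpath" not in spec:
        warnings.append("no selectors defined")
    return warnings


sample = '{"@type": "WebPage", "speakable": {"cssSelector": ["main"]}}'
print(lint_speakable(sample))
```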
How to Measure
Voice citations are harder to track than text citations because users don't click - they hear the answer and move on. Three approaches:
- Test queries. Run your top 20 informational queries on Google Assistant, Perplexity Voice, Brave Leo Voice once a week. Log who is cited.
- Search Console. Google reports "voice search" impressions in some markets. Check the Performance tab.
- Referrer traffic. A small fraction of voice users follow up by tapping the source - count them in your assistant.google.com and perplexity.ai referrers.
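For the referrer approach, a small script over exported referrer URLs is usually enough. This is a sketch under assumptions: the input is a plain list of referrer strings from your analytics or server logs, and the two hostnames are the ones named above. The function name is illustrative.

```python
from collections import Counter
from urllib.parse import urlparse

VOICE_REFERRER_HOSTS = {"assistant.google.com", "perplexity.ai"}


def count_voice_referrers(referrers: list[str]) -> Counter:
    """Count hits whose referrer host is one of the voice-assistant hosts."""
    counts: Counter = Counter()
    for ref in referrers:
        host = (urlparse(ref).hostname or "").lower()
        host = host.removeprefix("www.")
        if host in VOICE_REFERRER_HOSTS:
            counts[host] += 1
    return counts


hits = count_voice_referrers([
    "https://assistant.google.com/",
    "https://www.perplexity.ai/search?q=speakable",
    "https://perplexity.ai/",
    "https://news.example.com/article",
    "-",  # log placeholder for a missing referrer
])
print(hits)
```

Counts from this method understate voice usage, since most listeners never tap through.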
The Bottom Line
Speakable schema is the cheapest AEO signal in 2026: five minutes per template, 1.1x voice citation lift, no downside. Voice query share is growing 30-40% year-over-year; if that rate holds, the surface will be roughly 1.5-1.7x larger in 18 months, so adding Speakable today secures a position early. Pair it with Direct Answer Blocks and FAQPage for the full AEO foundation.
Frequently Asked Questions
What sections should I mark as Speakable?
The headline and the Direct Answer Block. AI engines use Speakable selectors to pick the audio-playback excerpt - they want the most concise, self-sufficient summary of the page. Marking entire articles as Speakable (selector body or main) drops citation lift to ~1.0x because the engine has to summarize anyway.
CSS selectors or XPath - which should I use?
CSS selectors. They are simpler, more portable, and work across all current voice engines. Use class names you can rely on (e.g. .aeo-direct-answer, .article-headline). XPath works but adds complexity for marginal gain.
Is Speakable schema worth the effort if voice is only 18% of queries?
It takes 5 minutes per page. The lift is 1.1x baseline on voice queries. ROI per minute invested is high, even though the absolute traffic from voice is still small. The compounding factor is that voice share is growing 30-40% year-over-year.
How does Speakable interact with FAQPage schema?
They serve different surfaces. Speakable points to audio-playback sections; FAQPage marks Q&A pairs for retrieval. Most pages should publish both: Speakable for voice excerpts, FAQPage for Q&A retrieval. They do not conflict.
Does Speakable hurt SEO rankings?
No. Search engines do not penalize Speakable. Google documents it as a supported structured data type, labeled as beta and aimed primarily at news content. There is no known ranking downside to adding it.