The AI accessibility story is often told as if screen readers were about to be replaced. They are not. The interesting story in 2026 is quieter: AI assistants are becoming a useful third layer on top of the two that have served low-vision web users for decades — operating-system screen readers and browser text-to-speech. This post explains what that third layer is good at, where it falls short, and how to decide whether it belongs in your daily workflow alongside tools that already work.
The 2026 accessibility tooling landscape
Low-vision assistive technology on the web in 2026 spans three layers: operating-system screen readers that linearise the DOM and read it aloud; browser-level text-to-speech that voices selected text; and AI assistants that perceive a specific page and answer questions about it. Each layer solves a different class of problem, and users who rely on assistive technology often run more than one — per WebAIM’s 2024 screen reader survey, 71.6% of respondents use more than one desktop or laptop screen reader.
- Screen readers — the foundation. NVDA (open source), JAWS (commercial), and the built-in macOS / iOS VoiceOver, Windows Narrator, and Android TalkBack. They read every element of a page in order, announce headings and landmarks, interpret form controls, and follow the ARIA spec. In WebAIM’s 2024 survey, JAWS and NVDA were nearly tied as the primary reader (41% and 38%), and JAWS with Chrome was the most common reader-browser pair. These are the tools that actually do the heavy lifting.
- Browser TTS — the shortcut. Built-in reading modes in Edge, Safari, and (via extension) Chrome voice selected text with a system synthesiser. Weaker than screen readers at navigation and weaker than cloud TTS at voice fidelity, but trivially available.
- AI assistants — the new layer. Chrome extensions and browser-integrated agents that take a screenshot of the current tab, extract the interactive DOM, and answer a natural-language question about the page with voice output and, in the best case, a visual pointer to the relevant element. Clicky sits here. So does part of what ChatGPT Atlas and Perplexity Comet do, though they pitch broader autonomy. For the category definition and who builds what, see our guide to agentic browser assistants.
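To make that third layer concrete, here is a minimal TypeScript sketch of the interactive-node extraction such assistants perform. It is a generic shape, not Clicky's actual pipeline; the `InteractiveNode` type and the `data-ai-node` attribute are invented for illustration.

```typescript
// A compact description of one interactive element, suitable for
// sending to a model alongside a screenshot of the visible tab.
interface InteractiveNode {
  selector: string; // a way to point back at the element later
  role: string;     // explicit ARIA role, or the tag name as a fallback
  name: string;     // best-effort accessible name
}

// Collect visible interactive elements into a compact list.
function extractInteractiveNodes(root: Document = document): InteractiveNode[] {
  const candidates = root.querySelectorAll<HTMLElement>(
    'a[href], button, input, select, textarea, [role="button"], [role="link"]'
  );
  const nodes: InteractiveNode[] = [];
  candidates.forEach((el, i) => {
    const rect = el.getBoundingClientRect();
    if (rect.width === 0 || rect.height === 0) return; // skip hidden elements
    // Real tools build robust selectors; a tagging attribute works for a sketch.
    el.setAttribute('data-ai-node', String(i));
    nodes.push({
      selector: `[data-ai-node="${i}"]`,
      role: el.getAttribute('role') ?? el.tagName.toLowerCase(),
      name:
        el.getAttribute('aria-label') ??
        el.textContent?.trim().slice(0, 80) ??
        '',
    });
  });
  return nodes;
}
```

A screenshot plus a list like this is essentially all the model sees, which is why long-form body content falls outside the assistant's view by design.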
Screen readers vs. AI assistants
These tools are not substitutes. Each is strongest at a different class of task.
Where screen readers win decisively
- Linearised reading of the whole page. A screen reader walks the DOM in order and voices every piece of text. AI assistants read a visible-tab screenshot plus a compact interactive-node list; they miss the long-form body content by design.
- Form filling. Screen readers announce labels, required states, error text, and associations between fields. This is the job they were built for.
- Headings and landmark navigation. Jumping to the main region, the next H2, the next link list — standard screen-reader shortcuts that AI assistants have no equivalent for.
- Correctness. A screen reader reads what is in the DOM. An AI assistant’s output can contain hallucinations, especially when a page is visually ambiguous. For anything load-bearing — legal text, medical dosages, pricing — trust the screen reader.
Where AI assistants add something new
- Contextual questions about a page. “What is this page for?” “Summarise the terms above the buttons.” “Which of these plans is the cheapest?” A screen reader cannot answer questions like these; it can only read everything in order. An AI assistant gives a targeted response in seconds.
- Finding a specific interactive element. “Where is the export button?” on a Salesforce dashboard is hard with a screen reader if the button is not labelled cleanly. An AI assistant with DOM-anchored pointing lands on the correct element directly (a naive sketch of that matching step follows this list).
- Ad-hoc summaries of structured content. Tables, long forms, dense comparison pages. AI summarisation is a good shortcut when a linear read-through would be expensive.
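As a rough illustration of the element-finding step in the second item above: a naive accessible-name matcher in TypeScript. Production tools lean on a vision-language model rather than token overlap, and the `scoreMatch` helper here is invented; the sketch only shows why a fuzzy query like “export button” can resolve even when the markup label is messy.

```typescript
interface Candidate {
  selector: string;
  name: string; // accessible name: aria-label, label text, or visible text
}

// Score a candidate by how many query words appear in its accessible name.
function scoreMatch(query: string, candidate: Candidate): number {
  const words = query.toLowerCase().split(/\s+/);
  const name = candidate.name.toLowerCase();
  return words.filter((w) => name.includes(w)).length / words.length;
}

// Return the best-scoring element for a natural-language query, if any.
function findElement(query: string, candidates: Candidate[]): Candidate | undefined {
  const ranked = candidates
    .map((c) => ({ c, score: scoreMatch(query, c) }))
    .filter((r) => r.score > 0.5)
    .sort((a, b) => b.score - a.score);
  return ranked[0]?.c;
}

// findElement('export button', nodes) can land on a control labelled
// "Export CSV" even though the label is not an exact match.
```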
The strongest setup for a partially-sighted user in 2026 is both: a screen reader for navigation and correctness, an AI assistant for targeted questions. WebAIM’s annual Million report has tracked accessibility errors across the web’s top million home pages since 2019; even with screen reader improvements, the median page still has dozens of detectable failures. AI tools will not fix those bugs, but they can route around some of them.
Why DOM-anchored pointing matters for partial sight
Not every low-vision user relies on audio alone. A large population of partially-sighted users reads the screen with screen magnifiers, high-contrast themes, or reduced-clutter reader modes. For them, the visual channel still matters — just with more friction than for a fully-sighted user.
This is where the combination of voice answer and visual halo on the relevant DOM element is materially useful. Asking “where is the reply button?” and receiving both an audio answer (“bottom-right, under the conversation”) and a halo drawn around the actual button cuts down the eye-scanning cost of finding it. Where a screen reader would read through the whole thread, and a text-only chat sidebar would tell you where the button is in prose you still have to parse, the halo anchors visual attention to a single coordinate on the page.
The other advantage of DOM-anchored pointing over pixel-based overlays: when the page reflows (zoom level changes, window resize, dynamic content), the halo follows the element because it is addressed by selector, not by pixel coordinate. A visual pointer that drifts when you zoom in is worse than no pointer at all.
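A minimal TypeScript sketch of a selector-anchored halo, assuming a `selector` string like the ones in the extraction sketch above. A real overlay also handles iframes, scroll containers, and stacking contexts; this version only shows the core idea of re-measuring the element instead of caching pixel coordinates.

```typescript
// Draw a halo that stays attached to the element a selector names.
// Geometry is re-measured from the live DOM on every relevant event.
function attachHalo(selector: string): () => void {
  const halo = document.createElement('div');
  halo.style.cssText =
    'position:fixed; pointer-events:none; box-sizing:border-box;' +
    'border:3px solid #f60; border-radius:6px; z-index:2147483647;';
  // The halo lives outside <body> so the observer below never
  // watches its own style updates (which would loop forever).
  document.documentElement.appendChild(halo);

  const reposition = () => {
    const el = document.querySelector(selector);
    if (!el) {
      halo.style.display = 'none'; // element left the page
      return;
    }
    const r = el.getBoundingClientRect(); // fresh, post-reflow geometry
    halo.style.display = 'block';
    halo.style.left = `${r.left - 4}px`;
    halo.style.top = `${r.top - 4}px`;
    halo.style.width = `${r.width + 8}px`;
    halo.style.height = `${r.height + 8}px`;
  };

  reposition();
  window.addEventListener('resize', reposition);
  window.addEventListener('scroll', reposition, true); // catch nested scrolls
  const observer = new MutationObserver(reposition);
  observer.observe(document.body, { childList: true, subtree: true, attributes: true });

  // Return a cleanup function so the halo can be dismissed.
  return () => {
    observer.disconnect();
    window.removeEventListener('resize', reposition);
    window.removeEventListener('scroll', reposition, true);
    halo.remove();
  };
}
```

Because `reposition` queries the selector every time, a zoom change or window resize that reflows the page moves the halo with the element; a cached pixel box would drift exactly as described above.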
Voice output quality in 2026
System TTS has improved — Apple’s Personal Voice, Microsoft Neural Voices, and Google’s WaveNet family are all significantly better than the robot voices of a decade ago — but cloud-hosted models from dedicated voice vendors still have the edge on naturalness, latency consistency, and multilingual range. Clicky defaults to ElevenLabs for its voice output, with browser system voice as a fallback for users who need offline audio or have policy constraints on cloud TTS.
Three things to watch when evaluating voice output for accessibility (the sketch after this list shows how the fallback and interruption pieces fit together):
- Latency from question to first syllable. Anything above a second breaks the sense that the tool is responsive. Cloud TTS can be fast or slow depending on the vendor and the region.
- Interruptibility. Can you cut the voice off mid-sentence with a keystroke? Users who have already heard what they need should not be forced to sit through the rest.
- Language and accent coverage. Screen readers are strong on major languages; cloud TTS is stronger on long-tail and regional accents. Worth checking for non-English workflows.
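A TypeScript sketch of the fallback-and-interrupt pattern. The cloud call is a placeholder (`CLOUD_TTS_URL` and its request shape are invented for illustration, not a real ElevenLabs or Clicky endpoint); the fallback path uses the standard Web Speech API that browser TTS is built on.

```typescript
const CLOUD_TTS_URL = 'https://example.com/tts'; // placeholder endpoint

let currentAudio: HTMLAudioElement | null = null;

// Speak text via cloud TTS, falling back to the browser's system voice.
async function speak(text: string): Promise<void> {
  stopSpeaking(); // a new request always cancels the old one
  try {
    const res = await fetch(CLOUD_TTS_URL, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ text }),
    });
    if (!res.ok) throw new Error(`TTS request failed: ${res.status}`);
    const blob = await res.blob();
    currentAudio = new Audio(URL.createObjectURL(blob));
    await currentAudio.play();
  } catch {
    // Offline, blocked, or policy-restricted: use the system synthesiser.
    speechSynthesis.speak(new SpeechSynthesisUtterance(text));
  }
}

// Bound to a keystroke so the user can cut the voice off mid-sentence.
function stopSpeaking(): void {
  currentAudio?.pause();
  currentAudio = null;
  speechSynthesis.cancel();
}
```

Binding `stopSpeaking` to a single key such as Escape is what gives the interruptibility the second list item asks for.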
What AI accessibility tools still cannot do
The honest part. AI in the browser has real failure modes, and pretending otherwise erodes the trust accessibility tools take years to build.
- Image description is not reliable enough for primary use. A vision model can describe a photo; it can also confidently misdescribe one. For critical visual content, a human describer — via Be My AI (with human fallback via Be My Eyes) or Aira — remains the stronger option in 2026.
- CAPTCHAs. No browser AI assistant solves visual CAPTCHAs, and most make a point of not trying. Use an audio CAPTCHA fallback or contact the site owner.
- Canvas- and WebGL-heavy apps. Figma, Miro, Google Earth — anything drawn to a canvas rather than rendered as DOM is effectively invisible to both screen readers and AI-plus-DOM assistants. Vision-only approaches can help partially, but reliability drops.
- Hallucination risk on ambiguous pages. When the page is visually dense or the question is vague, the model may invent an answer. Users should treat AI output as a hint to verify, not a source of truth.
- The last 5% of navigation. Screen readers handle weird widgets, bespoke date pickers, legacy table layouts, and nested menus better than any generalist AI model.
Where Clicky fits — and what it does not claim
Clicky is a push-to-talk Chrome extension with voice output and a DOM-anchored visual halo. For a partially-sighted or cognitive-load-sensitive user, it is a useful ad-hoc layer on top of a screen reader, not a replacement for one.
What Clicky does well, factually:
- Voice output of model answers at ElevenLabs-grade quality, not system TTS, on any web page. Browser-default voice remains available as a fallback.
- A halo drawn on the actual DOM element you asked about, anchored by selector so it survives reflows, zoom changes, and theme switches.
- Question-first interaction. Hold Alt, ask, release. No wake word, no mouse hunt for a toggle.
- Strict activeTab permission. The extension reads the page only when you explicitly invoke it; it does not scrape pages you visit in the background (see the sketch after this list).
- Runs alongside NVDA, JAWS, and VoiceOver without conflict. Clicky does not inject into the screen-reader object model; the two tools coexist.
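For the invocation model in the two items above, here is a simplified Manifest V3 background-worker sketch in TypeScript. It uses a keyboard command rather than Clicky's actual hold-Alt trigger (Chrome's `commands` API fires on press, not hold), and the `ask-about-page` command name is invented; the point is that with only `activeTab` granted, capture can happen solely in response to an explicit user gesture.

```typescript
// background.ts (Manifest V3 service worker)
// manifest.json declares: "permissions": ["activeTab"] plus a "commands" entry.

chrome.commands.onCommand.addListener(async (command) => {
  if (command !== 'ask-about-page') return; // hypothetical command name

  // activeTab access is granted by this user gesture, for this tab only.
  const [tab] = await chrome.tabs.query({ active: true, currentWindow: true });
  if (!tab?.id) return;

  // A screenshot of the visible tab: possible now, rejected if attempted
  // without the gesture above. Nothing here runs on page load or against
  // background tabs.
  const screenshot = await chrome.tabs.captureVisibleTab({ format: 'png' });
  console.log(`captured ${screenshot.length} chars of data URL for tab ${tab.id}`);
});
```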
What Clicky does not claim, and will not:
- It is not a screen-reader replacement. If you rely on a screen reader for navigation, keep it; add Clicky only as a supplement.
- It does not describe arbitrary images — no OCR, no photo captioning. Use Be My AI or Aira for that.
- It does not solve CAPTCHAs and does not attempt to.
- It is Chrome-only as of 2026. Other Chromium browsers may work unofficially.
- It requires holding a key, which excludes some motor-impaired users from the default mode. A tap-to-lock trigger is on the roadmap.
- It has not been certified to any formal accessibility standard (WCAG, Section 508, EN 301 549). Certifications are on the roadmap; claims without them would be untrue today.
For readers who want to compare before installing, our pricing page lays out what is included on each plan, and the privacy policy spells out exactly what data is captured and when.
Other tools worth knowing in 2026
If you are assembling an accessibility stack, these are worth evaluating alongside Clicky. The list is not exhaustive — it is the shortlist that most regularly comes up in accessibility communities we trust.
- NVDA — free and open source, Windows only. The most actively developed screen reader in 2026.
- JAWS — commercial, Windows only. Still the enterprise default.
- VoiceOver — built into macOS and iOS. Extremely well integrated with native Apple apps, competitive with NVDA / JAWS on the web.
- Be My Eyes — mobile app connecting blind and low-vision users to sighted volunteers, plus Be My AI for image description. Free.
- Aira — professional human visual interpreters on demand. Paid, but covered in many corporate accessibility budgets.
- RNIB (UK) and AFB (US) — non-profits that publish current guidance, technology reviews, and community resources.
The accessibility field moves slowly, deliberately, and with community oversight. Any tool that positions itself as a one-stop replacement for the above has either not shipped yet or is not paying attention. Clicky’s position is narrower and, we hope, more honest: a useful addition for specific tasks, used with the rest of the stack, not instead of it.
Frequently asked questions
Is Clicky a screen reader?
No. Clicky does not linearise the page, announce headings and landmarks, read form labels, or follow ARIA semantics. It reads model-generated answers aloud and draws a halo on a DOM element you asked about. For full-page reading, keep your screen reader.
Will Clicky conflict with NVDA, JAWS, or VoiceOver?
It has not in our testing. Clicky does not inject into the accessibility object model or intercept screen-reader keystrokes. The two tools announce their output independently, which can occasionally result in double-narration; the recommended setup is to pause one while using the other.
Does it work with keyboard-only input?
Yes. The entire Clicky interaction is keyboard-driven — hold Alt, speak, release. This aligns with WCAG 2.2 Success Criterion 2.1.3, Keyboard (No Exception). If the hold pattern is difficult for motor reasons, a tap-to-lock mode is on the roadmap.
Can it describe images on a page?
Not reliably, and we do not market it as an image-description tool. For images, Be My AI (with volunteer fallback via Be My Eyes) and Aira remain the stronger options.
What about enterprise accessibility compliance?
Clicky has not been certified to WCAG, Section 508, or EN 301 549 as of April 2026. Certifications are on the roadmap. For procurement contexts that require them today, we recommend waiting until they land — and in the interim, using Clicky as a personal supplement rather than a compliance layer.
Next up in our series: how AI Chrome extensions actually see your screen — the technical privacy explainer for anyone wondering what gets sent to the cloud.