Push-to-Talk vs Always-Listening AI: Privacy

Most voice-AI reviews in 2026 put push-to-talk and always-listening in the same feature column and treat the difference as a matter of taste. It is not. The two models differ at the engineering layer, the failure-mode layer, the power-consumption layer, and the compliance layer. This post explains each of those differences and lays out the narrow set of cases where always-listening is actually the better choice — because there are some.

Two interaction models, one microphone

Both models end at the same place: audio reaches a model that produces a response. They differ on what happens before that.

Always-listening. The microphone is active continuously. A small on-device “keyword spotter” watches the audio stream for a wake word (“Hey Siri,” “Alexa,” custom trigger phrases). When the keyword is detected, a short rolling buffer of audio is sent upstream along with whatever follows the wake word.
Push-to-talk. The microphone is off. On a user gesture — holding a key, pressing a button — the audio device is acquired, a recorder is started, and audio is streamed to the model for as long as the gesture is held. On release, the device is released.

The user experience overlaps. The privacy posture does not. For the full picture of push-to-talk specifically, see our 2026 guide to push-to-talk AI for Chrome.

The engineering distinction

Inside the browser, the two models call different APIs and hold the microphone for very different durations.

An always-listening extension has to:

Call getUserMedia once at install time (or at session start) and hold the audio stream for the entire browsing session.
Run a keyword spotter locally, in a content script or an offscreen document, on every audio frame.
Maintain a rolling buffer of the last few seconds of audio so that when the wake word fires, there is context to send.
Live with a permanent microphone indicator in the browser UI: Chrome shows a recording dot in the tab, and the user either learns to tune it out or is reminded all session that the mic is on.
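The rolling buffer in that list can be sketched as a fixed-capacity ring buffer of audio frames. This is a minimal illustration, not any vendor's implementation: the class name, the helper, and the 16 kHz / 512-sample figures are assumptions chosen for the example.

```typescript
// Hypothetical sketch of the "last few seconds" buffer an always-listening
// spotter keeps. Names and sizes are illustrative assumptions.
class RollingAudioBuffer {
  private readonly capacity: number;
  private readonly frames: Float32Array[];
  private next = 0;  // index where the next frame will be written
  private count = 0; // number of valid frames currently stored

  constructor(capacity: number) {
    if (!Number.isInteger(capacity) || capacity <= 0) {
      throw new RangeError("capacity must be a positive integer");
    }
    this.capacity = capacity;
    this.frames = new Array<Float32Array>(capacity);
  }

  // Store a frame, overwriting the oldest one once the buffer is full.
  push(frame: Float32Array): void {
    this.frames[this.next] = frame;
    this.next = (this.next + 1) % this.capacity;
    this.count = Math.min(this.count + 1, this.capacity);
  }

  // Return the stored frames oldest-first, e.g. when the wake word fires.
  snapshot(): Float32Array[] {
    const start = (this.next - this.count + this.capacity) % this.capacity;
    const out: Float32Array[] = [];
    for (let i = 0; i < this.count; i++) {
      out.push(this.frames[(start + i) % this.capacity]);
    }
    return out;
  }
}

// Number of frames needed to cover a time window (rounded up).
function framesForWindow(seconds: number, sampleRateHz: number, frameSize: number): number {
  return Math.ceil((seconds * sampleRateHz) / frameSize);
}

// Assumed figures: a 3-second window of 16 kHz audio in 512-sample frames.
const wakeWordContext = new RollingAudioBuffer(framesForWindow(3, 16000, 512));
```

The point of the sketch is that the buffer always holds recent audio, whether or not the wake word has been spoken. That is the property the rest of this post is concerned with.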

A push-to-talk extension, by contrast:

Listens for a keyboard event (Alt key-down, for Clicky).
Creates an offscreen document on key-down, calls getUserMedia, starts MediaRecorder.
Streams audio to the transcription model for the duration of the key-hold.
Tears down the offscreen document on key-up. Audio device is released.
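The four steps above amount to a two-state machine driven by key events. The sketch below models only the transitions and is not Clicky's source code. The type and function names are hypothetical, and the "acquire" and "release" effects stand in for the offscreen-document, getUserMedia, and MediaRecorder calls a real extension would make.

```typescript
// Hypothetical model of the push-to-talk lifecycle. The effects are labels:
// a real extension would map "acquire" to creating the offscreen document and
// starting the recorder, and "release" to tearing both down.
type MicState = "idle" | "recording";
type PttEvent = "keydown" | "keyup";
type Effect = "acquire" | "release" | "none";

function transition(state: MicState, event: PttEvent): { state: MicState; effect: Effect } {
  if (state === "idle" && event === "keydown") {
    return { state: "recording", effect: "acquire" };
  }
  if (state === "recording" && event === "keyup") {
    return { state: "idle", effect: "release" };
  }
  // Auto-repeated keydown while held, or a stray keyup while idle: do nothing.
  return { state, effect: "none" };
}

// Replay a sequence of key events and collect the effects that would run.
function replay(events: PttEvent[]): Effect[] {
  let state: MicState = "idle";
  const effects: Effect[] = [];
  for (const event of events) {
    const next = transition(state, event);
    state = next.state;
    if (next.effect !== "none") effects.push(next.effect);
  }
  return effects;
}
```

Keeping the transitions in a pure function makes the key property easy to check: every "acquire" is followed by exactly one "release", and nothing acquires the device without a key-down.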

The microphone lifetime in the always-listening model is measured in hours. In push-to-talk, it is measured in seconds. Both are permitted by Chrome’s permission model; only the second is bounded by a user gesture.

Where each model fails

Both models fail. The failures are different.

Always-listening: false triggers

Keyword spotters are probabilistic. A 2020 study by researchers at Northeastern University and Imperial College London exposed several smart speakers to 134 hours of TV dialogue and measured routine false activations, up to 19 per day for the worst-affected devices. Ten percent of those activations captured more than ten seconds of audio. The study was on speakers in a living room, but the keyword-spotting math is the same for any always-listening system. A browser assistant with a wake word misfires for the same reason Alexa does.

The failure is not that the vendor is lying about the trigger — it is that the trigger is not reliable. A privacy model that depends on a probabilistic component is a privacy model with a known failure rate.
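To put a number on "known failure rate", one rough approach is to treat false activations as independent events arriving at a constant rate (a Poisson assumption, which real speech patterns will only approximate). The function name below is hypothetical, and the rate is an input rather than a measured property of any browser extension.

```typescript
// Probability of at least one false activation during a listening window,
// ASSUMING activations arrive independently at a constant rate (Poisson).
// This is a back-of-envelope model, not a measurement.
function probAtLeastOneFalseActivation(ratePerDay: number, windowHours: number): number {
  if (ratePerDay < 0 || windowHours < 0) {
    throw new RangeError("inputs must be non-negative");
  }
  const expectedActivations = ratePerDay * (windowHours / 24);
  return 1 - Math.exp(-expectedActivations);
}

// Example: the 19-per-day figure quoted above, over an 8-hour working session.
const workdayRisk = probAtLeastOneFalseActivation(19, 8);
console.log(workdayRisk.toFixed(3));
```

Under these assumptions the 8-hour figure comes out at about 0.998, which is another way of saying that a misfire during a working day is close to certain at that rate.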

Push-to-talk: intentional capture only

Push-to-talk fails differently. The audio device is not acquired unless the user pressed something, so the “false trigger” problem does not apply. The real failure modes are cognitive: the user thought they were recording and were not (key-up too soon), or started recording accidentally (Alt held during a keyboard shortcut). Both are user-facing bugs, and both are fixable in the interaction design. Neither is a background data-flow concern.
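One way to handle both failure modes in the interaction design is to classify each key-hold before treating it as a capture. The sketch below is an illustration under assumptions: the field names, the verdict labels, and the 250 ms threshold are hypothetical, not values taken from any shipping extension.

```typescript
// Hypothetical description of one Alt key-hold, as recorded by the extension.
interface KeyHold {
  downAtMs: number;         // timestamp of key-down
  upAtMs: number;           // timestamp of key-up
  otherKeysPressed: number; // keys pressed while Alt was held (e.g. Tab)
}

type HoldVerdict = "capture" | "too-short" | "shortcut";

// Decide whether a hold was an intentional capture. minHoldMs is an assumed
// threshold; a real product would tune it with user testing.
function classifyHold(hold: KeyHold, minHoldMs: number = 250): HoldVerdict {
  if (hold.otherKeysPressed > 0) return "shortcut";  // Alt was part of a key combo
  if (hold.upAtMs - hold.downAtMs < minHoldMs) return "too-short"; // likely a slip
  return "capture";
}
```

A "too-short" verdict is a natural place to show a hint such as "hold the key while speaking", and a "shortcut" verdict is a reason to discard the audio without sending it.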

Battery, CPU, and bandwidth

An always-listening extension runs a keyword spotter full-time. On a laptop on battery, that is measurable power consumption. Even efficient spotters consume a non-trivial CPU budget because they process every audio frame at the sample rate, and the offscreen document or content script hosting the spotter has to stay alive for the whole session. Users who notice shorter battery life after installing an AI extension are often seeing this.

A push-to-talk extension uses the microphone only during a key-hold. Typical usage — a few seconds per invocation, a few invocations per hour — registers as essentially no CPU and essentially no battery impact. The offscreen document does not exist when the mic is off.
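The gap is easy to quantify as a duty cycle: the fraction of each hour the microphone is actually held. The function names and the usage figures below (5 seconds per invocation, 4 invocations per hour) are assumptions picked to match "a few seconds, a few times an hour", not measurements.

```typescript
const SECONDS_PER_HOUR = 3600;

// Seconds of microphone time per hour for a push-to-talk usage pattern.
function micSecondsPerHour(secondsPerInvocation: number, invocationsPerHour: number): number {
  return Math.min(SECONDS_PER_HOUR, secondsPerInvocation * invocationsPerHour);
}

// Fraction of the hour the microphone is held (1 means always on).
function micDutyCycle(secondsPerInvocation: number, invocationsPerHour: number): number {
  return micSecondsPerHour(secondsPerInvocation, invocationsPerHour) / SECONDS_PER_HOUR;
}

// Assumed usage: 5-second holds, 4 times an hour, versus a mic held all hour.
const pushToTalkDuty = micDutyCycle(5, 4);
const alwaysListeningDuty = 1;
console.log((pushToTalkDuty * 100).toFixed(2) + "% vs " + alwaysListeningDuty * 100 + "%");
```

With those assumed figures the microphone is held for 20 seconds an hour, a little over half a percent of the time an always-listening extension holds it.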

Bandwidth follows a similar pattern. Always-listening systems do not necessarily stream the full audio to the cloud, but the path from the local buffer to the remote service stays armed the whole time and opens whenever the spotter fires. Push-to-talk systems go quiet the moment you release the key.

Data residency and compliance

For users in regulated contexts, the two models have very different compliance profiles.

Medical, legal, financial workflows. Audio captured during a consultation, even incidentally, often triggers retention and logging requirements. An always-listening extension sitting in a Chrome tab during a consultation is a regulatory exposure. A push-to-talk extension that only captures when explicitly invoked is easier to scope.
Open-plan offices and shared spaces. The presence of a second party speaking in the background during a long microphone capture can trigger two-party consent rules in some jurisdictions. Push-to-talk, by bounding capture to user-intentional moments, lowers the probability of inadvertent recording.
GDPR data-minimisation. Article 5(1)(c) requires that processing be limited to what is necessary. Continuous microphone capture is hard to justify under that principle if a push-to-talk alternative exists. For Europe-based users and EU-deployed enterprises, the default should lean toward push-to-talk.

When always-listening is genuinely the right choice

There are cases where always-listening is not just acceptable but the correct design.

Severe motor impairments. A user who cannot reliably hold a modifier key needs voice activation without a physical gesture. Wake-word systems in this context are assistive technology — the trade-off of continuous listening against the alternative of no access is straightforwardly favourable.
Hands-occupied contexts. A surgeon, a mechanic, a cook — anyone whose hands are legitimately busy while they need to query the page. Push-to-talk requires a free hand or an assistive switch; always-listening does not.
Dedicated appliance modes. A browser kiosk in a controlled environment where audio capture is already expected and logged is a different regulatory context from a personal laptop. Always-listening there is consistent with the rest of the setup.

The honest answer is not “push-to-talk always wins.” It is “push-to-talk should be the default, and always-listening should be an opt-in with a clear explanation of what changes.” A voice assistant that lets users toggle between the two is shipping the right product; one that forces one model on every user is making a choice on their behalf.
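As a sketch of what "default plus explicit opt-in" could look like in a settings model, the example below refuses to switch to always-listening until the user has acknowledged the explanation. Every name here is hypothetical, and this is one possible design rather than a description of any existing product.

```typescript
type ListeningMode = "push-to-talk" | "always-listening";

interface VoiceSettings {
  mode: ListeningMode;
  // True only after the user has accepted the continuous-capture explanation.
  continuousCaptureAcknowledged: boolean;
}

// Push-to-talk is the default; nothing is acknowledged out of the box.
const DEFAULT_SETTINGS: VoiceSettings = {
  mode: "push-to-talk",
  continuousCaptureAcknowledged: false,
};

// Apply a mode change. Switching to always-listening requires an explicit
// acknowledgement; switching back to push-to-talk never does.
function selectMode(
  current: VoiceSettings,
  requested: ListeningMode,
  acknowledged: boolean,
): VoiceSettings {
  if (requested === "always-listening" && !acknowledged) {
    return current; // refuse the switch until the explanation has been accepted
  }
  return {
    mode: requested,
    continuousCaptureAcknowledged: requested === "always-listening",
  };
}
```

Resetting the acknowledgement when the user returns to push-to-talk is a deliberate choice in this sketch: re-enabling continuous capture later means seeing the explanation again.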

Why Clicky chose push-to-talk

Clicky defaults to push-to-talk: Alt key held, mic on; Alt key released, mic off. That default rests on three observations about the current browser-AI landscape.

The privacy failure mode of always-listening is unacceptable for most users in most contexts. If wake-word devices can misfire as often as 19 times a day, we do not want to ship a product that brings those misfires into a browser tab next to your email.
Chrome’s offscreen document API makes push-to-talk enforcement structural, not policy. We create the offscreen document on key-down and tear it down on key-up; there is no code path that keeps the microphone active beyond the key-hold window. Once the document is closed, Chrome itself releases the audio device, so the guarantee does not rest on a setting or a promise.
The accessibility case for always-listening is real, and we plan to support it. The roadmap includes a tap-to-lock trigger (press once to start, press again to stop) and a dedicated accessibility mode with voice activation. Those are additive to the push-to-talk default, not replacements for it. See our accessibility guide for the broader posture.

The result: in its default configuration, Clicky’s microphone is off except during the exact seconds you are holding Alt. That is the strictest available interpretation of browser voice input, and the one we believe should be the 2026 default.

Frequently asked questions

Doesn’t every browser show a microphone indicator when the mic is active?

Yes — Chrome, Firefox, and Safari all surface a recording indicator on the tab or system tray. That is a useful backstop, but it is an indicator, not an enforcement mechanism. The mic is still on during always-listening; the indicator tells you about the state, it does not change the state.

Can an always-listening extension use zero-knowledge keyword spotting so the audio never leaves the device unless the keyword fires?

Many do. But “never leaves the device” still means the device is listening. The keyword spotter is code running on your audio stream, and it is subject to the same kind of false activations as any other spotter. A purely local spotter is better than a streaming one, but not as tight as not listening in the first place.

Is there a setting in Clicky to enable always-listening mode?

Not in the 2026 default build. A tap-to-lock trigger is on the roadmap as a higher-autonomy option, and a dedicated accessibility mode for users who cannot hold a modifier key is planned. A full wake-word always-listening mode is not on the near-term roadmap.

How do I know Clicky’s push-to-talk enforcement actually works?

Chrome shows a microphone indicator whenever the audio device is in use; with Clicky it should appear only while you are holding Alt. For a deeper verification, see the post on how AI Chrome extensions see your screen, which walks through how to inspect the extension’s traffic and state in real time.
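If you log when the key was held and when the microphone indicator was visible, the check itself is a simple interval-containment test. The sketch below assumes you have collected both sets of timestamps yourself; the type, the function name, and the slack parameter are hypothetical and not part of any Clicky tooling.

```typescript
// A closed time interval in milliseconds.
interface Interval {
  startMs: number;
  endMs: number;
}

// True if every microphone-active interval lies inside some key-hold interval.
// slackMs tolerates small delays between key events and device state changes.
function micBoundedByHolds(mic: Interval[], holds: Interval[], slackMs: number = 0): boolean {
  return mic.every((m) =>
    holds.some(
      (h) => m.startMs >= h.startMs - slackMs && m.endMs <= h.endMs + slackMs,
    ),
  );
}

// Example with made-up observations: two holds, two mic intervals.
const observedHolds: Interval[] = [
  { startMs: 1000, endMs: 4000 },
  { startMs: 9000, endMs: 12000 },
];
const observedMic: Interval[] = [
  { startMs: 1050, endMs: 4020 },
  { startMs: 9040, endMs: 12010 },
];
console.log(micBoundedByHolds(observedMic, observedHolds, 100));
```

A single microphone interval that falls outside every key-hold, beyond the slack you allow, is enough to show the enforcement is not holding.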

What about the “tap to talk” pattern — push once, speak, push again?

That is a usability convenience on top of push-to-talk (same privacy profile as long as the mic is off between taps) and a reasonable option for motor reasons. We plan to ship it as an alternative to the key-hold default, not a replacement.

Next in our series: Clicky vs Sider — voice, privacy, pricing, compared honestly with current 2026 feature sets.