
Push-to-Talk vs Always-Listening AI: Privacy

Push-to-talk and always-listening voice AI differ on a privacy axis most reviews ignore. Here is the engineering distinction between the two, their failure modes, their battery and compliance implications, and when each is genuinely the right choice.

By Loïc Jané · 10 min read

Most voice-AI reviews in 2026 put push-to-talk and always-listening in the same feature column and treat the difference as a matter of taste. It is not. The two models differ at the engineering layer, the failure-mode layer, the power-consumption layer, and the compliance layer. This post explains each of those differences and lays out the narrow set of cases where always-listening is actually the better choice — because there are some.

Two interaction models, one microphone

Both models end at the same place: audio reaches a model that produces a response. They differ on what happens before that.

The user experience overlaps. The privacy posture does not. For the full picture of push-to-talk specifically, see our 2026 guide to push-to-talk AI for Chrome.

The engineering distinction

Inside the browser, the two models call different APIs and hold the microphone for very different durations.

An always-listening extension has to:

A push-to-talk extension, by contrast:

The microphone lifetime in the always-listening model is measured in hours. In push-to-talk, it is measured in seconds. Both are permitted by Chrome’s permission model; only the second is bounded by a user gesture.
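As a rough illustration of what "bounded by a user gesture" can mean in code, here is a minimal sketch of a push-to-talk microphone holder. The class name `PushToTalkMic` and the injected `acquire` function are hypothetical and not taken from any particular extension. In a browser, `acquire` would typically wrap `navigator.mediaDevices.getUserMedia({ audio: true })`; it is injected here so the lifetime logic can be read and tested on its own.

```typescript
// Hypothetical sketch: the stream exists only between press() and release().
interface MicTrack { stop(): void; }
interface MicStream { getTracks(): MicTrack[]; }
type Acquire = () => Promise<MicStream>;

class PushToTalkMic {
  private readonly acquire: Acquire;
  private stream: MicStream | null = null;
  private held = false;
  private generation = 0; // distinguishes one key press from the next

  constructor(acquire: Acquire) {
    this.acquire = acquire;
  }

  get isOpen(): boolean {
    return this.stream !== null;
  }

  // Call from the keydown handler of the talk key.
  async press(): Promise<void> {
    if (this.held) return; // ignore key-repeat events
    this.held = true;
    const gen = ++this.generation;
    const stream = await this.acquire();
    if (!this.held || gen !== this.generation) {
      // The key was released (or pressed again) while the stream was
      // opening, so this stream is stale: close it immediately.
      stream.getTracks().forEach((t) => t.stop());
      return;
    }
    this.stream = stream;
  }

  // Call from the keyup handler of the talk key.
  release(): void {
    this.held = false;
    if (this.stream !== null) {
      this.stream.getTracks().forEach((t) => t.stop());
      this.stream = null;
    }
  }
}
```

The point of the design is that no code path leaves a stream open after `release()`, including the case where the key comes up before the stream has finished opening.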

Where each model fails

Both models fail. The failures are different.

Always-listening: false triggers

Keyword spotters are probabilistic. A 2020 study by researchers at Northeastern University and Imperial College London tested eleven smart speakers on 134 hours of TV dialogue and measured routine false activations, with average rates of between 1.5 and 19 per device per day depending on the device. Ten percent of those activations captured more than ten seconds of audio. The study was on speakers in a living room, but the keyword-spotting math is the same for any always-listening system. A browser assistant with a wake word misfires for the same reason Alexa does.

The failure is not that the vendor is lying about the trigger — it is that the trigger is not reliable. A privacy model that depends on a probabilistic component is a privacy model with a known failure rate.
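To make "known failure rate" concrete, the arithmetic can be sketched as follows. The rate of 19 activations per day is the figure quoted above, measured while TV dialogue was playing. The eight-hour window and the assumption that false activations arrive independently (a Poisson model) are illustrative choices, not results from the study.

```typescript
// Expected number of false activations in a listening window,
// scaled linearly from a measured per-day rate.
function expectedFalseActivations(ratePerDay: number, hoursListening: number): number {
  return ratePerDay * (hoursListening / 24);
}

// Probability of at least one false activation in the window.
// ASSUMPTION: activations are independent events (Poisson arrivals).
function probAtLeastOne(ratePerDay: number, hoursListening: number): number {
  return 1 - Math.exp(-expectedFalseActivations(ratePerDay, hoursListening));
}

// Illustrative inputs: 19 per day, 8 hours of background speech.
const expected = expectedFalseActivations(19, 8); // about 6.3 activations
const atLeastOne = probAtLeastOne(19, 8);         // greater than 0.99
console.log(expected.toFixed(1), atLeastOne.toFixed(3));
```

Under these assumptions, a working day of background speech makes at least one unintended capture close to certain, which is why the failure rate belongs in the privacy model rather than in a footnote.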

Push-to-talk: intentional capture only

Push-to-talk fails differently. No microphone stream is open unless the user pressed something, so the “false trigger” problem does not apply. The real failure modes are cognitive: the user thought they were recording and were not (key-up too soon), or started recording accidentally (Alt held during a keyboard shortcut). Both are user-facing bugs, and both are fixable in the interaction design. Neither is a background data-flow concern.
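One possible way to handle both failure modes in the interaction design is a small gate between raw key events and the recorder. The sketch below is hypothetical: the class name `TalkKeyGate`, the outcome labels, and the 300 ms threshold are assumptions chosen for illustration. A hold that ends too soon is reported so the UI can tell the user nothing was captured, and any other key pressed while the talk key is down is treated as a keyboard shortcut and discards the capture.

```typescript
type Outcome = "start" | "commit" | "discard-too-short" | "discard-chord" | "ignore";

class TalkKeyGate {
  private readonly talkKey: string;
  private readonly minHoldMs: number;
  private downAt: number | null = null; // when the talk key went down
  private chord = false;                // another key was pressed during the hold

  constructor(talkKey = "Alt", minHoldMs = 300) {
    this.talkKey = talkKey;
    this.minHoldMs = minHoldMs;
  }

  keyDown(key: string, timeMs: number): Outcome {
    if (key === this.talkKey) {
      if (this.downAt !== null) return "ignore"; // key repeat
      this.downAt = timeMs;
      this.chord = false;
      return "start";
    }
    if (this.downAt !== null && !this.chord) {
      // e.g. Alt+Tab: the user meant a shortcut, not speech.
      this.chord = true;
      return "discard-chord";
    }
    return "ignore";
  }

  keyUp(key: string, timeMs: number): Outcome {
    if (key !== this.talkKey || this.downAt === null) return "ignore";
    const heldMs = timeMs - this.downAt;
    const wasChord = this.chord;
    this.downAt = null;
    this.chord = false;
    if (wasChord) return "ignore"; // capture was already discarded
    return heldMs < this.minHoldMs ? "discard-too-short" : "commit";
  }
}
```

A caller would map `start` to opening the microphone, `commit` to sending the captured audio, and both `discard-*` outcomes to closing the microphone and dropping the buffer, ideally with visible feedback.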

Battery, CPU, and bandwidth

An always-listening extension runs a keyword spotter full-time. On a laptop on battery, that is measurable power consumption. Even efficient spotters consume a non-trivial CPU budget at the audio sample rate, and the offscreen document or content script hosting the spotter has to stay resident for as long as the extension is listening. Users who notice shorter battery life after installing an AI extension are usually seeing this.

A push-to-talk extension uses the microphone only during a key-hold. Typical usage (a few seconds per invocation, a few invocations per hour) registers as essentially no CPU and essentially no battery impact. The offscreen document does not need to exist when the mic is off.
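The difference can be put as a duty cycle: the fraction of each hour during which the microphone and audio pipeline are active. The numbers below (5 seconds per invocation, 4 invocations per hour) are illustrative assumptions standing in for "a few seconds" and "a few invocations", not measurements.

```typescript
// Fraction of an hour the microphone is open, capped at 1 (always on).
function micDutyCycle(secondsPerInvocation: number, invocationsPerHour: number): number {
  return Math.min(1, (secondsPerInvocation * invocationsPerHour) / 3600);
}

const pushToTalk = micDutyCycle(5, 4); // 20 s per hour, roughly 0.56%
const alwaysListening = 1;             // the spotter never stops

console.log(`push-to-talk: ${(pushToTalk * 100).toFixed(2)}% of the hour`);
console.log(`always-listening runs ${Math.round(alwaysListening / pushToTalk)}x longer`);
```

Duty cycle is only a proxy for power draw, since the cost per active second differs between a lightweight spotter and a full capture pipeline, but it shows the order-of-magnitude gap in how long anything is running at all.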

Bandwidth follows a similar pattern. Always-listening systems do not always stream the full audio to the cloud, but the component that decides what to send is running continuously, so any false trigger can turn into an upload. Push-to-talk systems go quiet the moment you release the key.

Data residency and compliance

For users in regulated contexts, the two models have very different compliance profiles.

When always-listening is genuinely the right choice

There are cases where always-listening is not just acceptable but the correct design.

The honest answer is not “push-to-talk always wins.” It is “push-to-talk should be the default, and always-listening should be an opt-in with a clear explanation of what changes.” A voice assistant that lets users toggle between the two is shipping the right product; one that forces one model on every user is making a choice on their behalf.
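A settings model along these lines might look like the sketch below. All names (`VoiceSettings`, `optIntoAlwaysListening`, `effectiveMode`) are hypothetical, and this is one possible shape rather than how any shipping product stores its configuration. The idea is that always-listening only takes effect when there is a record of the user having accepted the explanation, and everything else falls back to push-to-talk.

```typescript
type CaptureMode = "push-to-talk" | "always-listening";

interface VoiceSettings {
  mode: CaptureMode;
  // ISO timestamp of when the user accepted the always-listening explanation.
  alwaysListeningAcceptedAt: string | null;
}

const DEFAULT_SETTINGS: VoiceSettings = {
  mode: "push-to-talk",
  alwaysListeningAcceptedAt: null,
};

// Switch modes only if the user explicitly accepted the explanation.
function optIntoAlwaysListening(
  current: VoiceSettings,
  explanationAccepted: boolean,
  now: Date,
): VoiceSettings {
  if (!explanationAccepted) return current;
  return { mode: "always-listening", alwaysListeningAcceptedAt: now.toISOString() };
}

// The mode the extension should actually run in. A stored
// "always-listening" value without a consent record is not honored.
function effectiveMode(settings: VoiceSettings): CaptureMode {
  const consented = settings.alwaysListeningAcceptedAt !== null;
  return settings.mode === "always-listening" && consented
    ? "always-listening"
    : "push-to-talk";
}
```

Routing every read through `effectiveMode` means a corrupted or migrated settings object cannot silently enable the more permissive mode.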

Why Clicky chose push-to-talk

Clicky defaults to push-to-talk: Alt key held, mic on; Alt key released, mic off. That default comes from three factual observations about the current browser-AI landscape.

The result: in its default configuration, Clicky’s microphone is off except during the exact seconds you are holding Alt. That is the strictest available interpretation of browser voice input, and the one we believe should be the 2026 default.

Frequently asked questions

Doesn’t every browser show a microphone indicator when the mic is active?

Yes — Chrome, Firefox, and Safari all surface a recording indicator on the tab or system tray. That is a useful backstop, but it is an indicator, not an enforcement mechanism. The mic is still on during always-listening; the indicator tells you about the state, it does not change the state.

Can an always-listening extension use zero-knowledge keyword spotting so the audio never leaves the device unless the keyword fires?

Many do. But “never leaves the device” still means the device is listening. The keyword spotter is code running on your audio stream with the same false-activation rate as any other. A purely local spotter is better than a streaming one, but not as tight as not listening in the first place.

Is there a setting in Clicky to enable always-listening mode?

Not in the 2026 default build. A tap-to-lock trigger is on the roadmap as a higher-autonomy option, and a dedicated accessibility mode for users who cannot hold a modifier key is planned. A full wake-word always-listening mode is not on the near-term roadmap.

How do I know Clicky’s push-to-talk enforcement actually works?

Chrome shows a recording indicator on the tab while the microphone is in use; it should only appear while you are holding Alt. For a deeper verification, see the post on how AI Chrome extensions see your screen, which walks through how to inspect the extension’s traffic and state in real time.

What about the “tap to talk” pattern — push once, speak, push again?

That is a usability convenience on top of push-to-talk (same privacy profile as long as the mic is off between taps) and a reasonable option for users who find holding a key difficult for motor reasons. We plan to ship it as an alternative to the key-hold default, not a replacement.

Next in our series: Clicky vs Sider — voice, privacy, pricing, compared honestly with current 2026 feature sets.