Question 1

What is a voice tracking teleprompter?

Accepted Answer

A voice tracking teleprompter listens to what you are saying and matches the script to your speech. You can speed up, slow down, pause, repeat lines, or backtrack — and the text follows you. It is the feature that lets you talk like a human instead of reading like a robot, and the single biggest upgrade over timed-scrolling teleprompters.

Question 2

Why does timed scrolling fail?

Accepted Answer

Timed scrolling rolls text at a fixed speed and forces you to keep up. It works for newsreaders reading polished scripts at a known pace — which is how broadcast teleprompters were designed around 1950 — but not for modern solo creators. You do not actually talk at a fixed speed: you pause for emphasis, slow on complex passages, speed up through familiar phrases, and sometimes redo lines. Timed scrolling fights all of that, so you either race to keep up (and sound rushed) or fall behind (and lose your place). The pace becomes the prompter's, not yours, and audiences hear it as mechanical delivery.
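To make the mismatch concrete, here is a minimal sketch of how far a fixed-rate scroll drifts from a speaker whose pace varies. The 150 words-per-minute scroll rate, the segment durations, and the function name `drift_in_words` are illustrative assumptions, not measurements from any real take.

```python
def drift_in_words(scroll_wpm, segments):
    """Return the prompter's lead over the speaker, in words, after each segment.

    segments: list of (seconds, speaker_wpm) pairs; a pause is speaker_wpm = 0.
    Positive values mean the text has run ahead of the speaker.
    """
    lead = 0.0
    history = []
    for seconds, speaker_wpm in segments:
        # The scroll advances at scroll_wpm regardless of what the speaker does.
        lead += (scroll_wpm - speaker_wpm) * seconds / 60.0
        history.append(round(lead, 1))
    return history


# Hypothetical take: on pace, a 3-second pause, a slow passage, a fast passage.
take = [(20, 150), (3, 0), (15, 120), (10, 180)]
print(drift_in_words(150, take))  # [0.0, 7.5, 15.0, 10.0]
```

In this made-up take, a single three-second pause puts the text 7.5 words ahead, and the speaker never fully catches up even after speeding up.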

Question 3

What does voice tracking do differently?

Accepted Answer

Voice tracking listens to your speech in real time and matches what you are saying against the script, updating the scroll position based on where the match lands — so the text follows your speech instead of forcing your speech to follow the text. You can pause for emphasis, speed up or slow down, repeat lines, restart sentences, or stop mid-take to sip water or glance at notes. The script stays with you instead of running off without you.
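The matching step can be sketched roughly as follows. This is a simplified illustration, not any particular product's implementation: the class name `ScriptTracker`, the window sizes, and the scoring rule are all assumptions. It compares the last few recognised words against a window of the script around the current position, tolerates a misheard word, and ignores speech that does not match the script.

```python
import re


def normalize(text):
    """Lowercase and split into words, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


class ScriptTracker:
    """Hypothetical tracker: keeps `position`, the index of the next expected word."""

    def __init__(self, script, tail_len=4, lookback=8, lookahead=12, min_score=2):
        self.words = normalize(script)
        self.position = 0
        self.tail_len = tail_len      # how many recent words to match
        self.lookback = lookback      # how far back a repeat or restart may jump
        self.lookahead = lookahead    # how far forward a skip may jump
        self.min_score = min_score    # matches required before moving at all

    def update(self, heard):
        """Move to the best match for the most recent words and return the position."""
        tail = normalize(heard)[-self.tail_len:]
        start = max(0, self.position - self.lookback)
        stop = min(len(self.words), self.position + self.lookahead)
        # Sentinel: a candidate must reach min_score to beat it.
        best = (self.min_score - 1, 0, self.position)
        for end in range(start + 1, stop + 1):
            depth = min(len(tail), end)
            # Align the tail so its last word sits at script index end - 1,
            # then count matching words; a misheard word just lowers the score.
            score = sum(tail[-k] == self.words[end - k] for k in range(1, depth + 1))
            candidate = (score, -abs(end - self.position), end)
            if candidate[:2] > best[:2]:  # higher score wins, then nearest position
                best = candidate
        self.position = best[2]
        return self.position


script = ("Welcome to the channel. Today we are going to talk about "
          "voice tracking and why it matters.")
tracker = ScriptTracker(script)
print(tracker.update("welcome to the channel"))        # 4
print(tracker.update("today we are goin to talk"))     # 10 (one misheard word)
print(tracker.update("we are going to"))               # 9  (speaker restarted the line)
print(tracker.update("sorry let me grab some water"))  # 9  (off-script, position held)
```

Limiting the search to a window around the current position is one way to keep a repeated phrase elsewhere in the script from pulling the scroll to the wrong place.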

Question 4

What is the difference between cloud and on-device voice tracking?

Accepted Answer

Cloud-based voice tracking sends your microphone audio to a remote server, such as Google Cloud Speech-to-Text, Amazon Transcribe, or (in browsers like Chrome) the recognition service behind the Web Speech API. Latency is 200–500ms per word, the session times out after about a minute of silence, and tracking stops if your internet connection wobbles. On-device voice tracking runs locally on your phone or tablet, using engines like Apple's SpeechAnalyzer, SFSpeechRecognizer with on-device recognition enabled, or Vosk on Android. Latency is 50–100ms (2–10x faster), it works offline, and properly implemented on-device tracking handles pauses without timing out. For real shoot conditions, on-device is the only version that consistently works.
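As a rough worked example, the latency figures above can be converted into how far the highlighted text trails the speaker. The 150 words-per-minute speaking rate and the function name `words_behind` are assumptions chosen for illustration; only the millisecond ranges come from the answer above.

```python
def words_behind(latency_ms, speaking_wpm=150):
    """Words spoken during the recognition delay at a steady speaking rate."""
    return latency_ms * speaking_wpm / 60000


for label, low_ms, high_ms in [("cloud", 200, 500), ("on-device", 50, 100)]:
    print(label, words_behind(low_ms), "to", words_behind(high_ms), "words behind")
# cloud 0.5 to 1.25 words behind
# on-device 0.125 to 0.25 words behind

# The "2-10x faster" range compares the closest and the furthest ends of the two ranges.
print(200 / 100, 500 / 50)  # 2.0 10.0
```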

Question 5

Why is on-device voice tracking harder to build?

Accepted Answer

Cloud speech engines handle the recognition complexity behind an API — the app developer just calls it. On-device engines require the developer to handle the entire pipeline themselves: continuous audio capture, streaming recognition, partial-result handling, silence detection, session resumption after pauses, and the fuzzy matching that maps recognised speech back to the script position. Teleprompter teams that want to ship quickly default to the cloud API. Teams that want the product to work on a real shoot day put in the months of engineering to do it on-device.
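One piece of that pipeline, silence detection, can be sketched as below. This is a minimal energy-based gate under stated assumptions: frames are lists of 16-bit PCM sample values, and the threshold of 500 and the three-frame hangover are arbitrary illustrative settings. The names `SilenceGate` and `feed` are hypothetical. A real app would typically hold the scroll position on `speech_end` and resume or restart the recognition session on `speech_start`.

```python
import math


def rms(frame):
    """Root-mean-square level of one audio frame (a list of sample values)."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


class SilenceGate:
    """Reports when speech starts and when it has been quiet long enough to count as a pause."""

    def __init__(self, threshold=500.0, hangover_frames=3):
        self.threshold = threshold              # assumed level separating speech from room noise
        self.hangover_frames = hangover_frames  # quiet frames tolerated before declaring a pause
        self.speaking = False
        self._quiet = 0

    def feed(self, frame):
        """Process one frame; return 'speech_start', 'speech_end', or None."""
        if rms(frame) >= self.threshold:
            self._quiet = 0
            if not self.speaking:
                self.speaking = True
                return "speech_start"
            return None
        if self.speaking:
            self._quiet += 1
            if self._quiet >= self.hangover_frames:
                self.speaking = False
                return "speech_end"
        return None


loud, quiet = [1000] * 160, [0] * 160
gate = SilenceGate()
frames = [quiet, loud, loud, quiet, quiet, loud, quiet, quiet, quiet, quiet]
print([gate.feed(f) for f in frames])
# [None, 'speech_start', None, None, None, None, None, None, 'speech_end', None]
```

The hangover count is what keeps a short breath between words from being treated as the end of speech; only the longer run of quiet frames triggers `speech_end`.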

Question 6

Why did Steady Cue build its own voice tracking?

Accepted Answer

Steady Cue's voice tracking is built natively, on-device, by working presenters — Andy and Josh, both Fiverr Pro top-rated sellers with 600+ and 1.8k five-star reviews between them, who also coach other spokespeople through a presenting academy. They tried every teleprompter on the market for their own paid client work, and every cloud-based voice tracking implementation failed on real shoots: long pauses dropping the session, internet wobbles breaking tracking mid-take, latency that made delivery feel out of sync. So they built the on-device version themselves.

Voice Tracking Teleprompter

Why timed scrolling fails

What voice tracking does differently

Cloud vs on-device voice tracking — the part that matters

Why on-device tracking is harder to build well

What we built and why