The STT server was crashing itself. The receiver fired up to 3 concurrent transcription requests (inflightSTT<3) into a server that can only do one at a time. The 2nd and 3rd queued behind a 14-16s inference and hit the 20s client timeout; the client destroyed the socket mid-request, the server's reply write raised BrokenPipeError and crashed the worker, and every following buffer returned empty until launchd restarted it.
That's "first utterance works, then dark": the first request hits a healthy idle server; the reply it triggers starts a storm of concurrent re-requests that wedge the server, and it never recovers within the call.
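The crash link in that chain is a reply written to a socket the client has already destroyed. A minimal Python sketch of a write that tolerates this, assuming a raw socket; safe_write is a hypothetical helper name for illustration, not the actual code in xen-stt-server.py:

```python
import socket


def safe_write(conn: socket.socket, payload: bytes) -> bool:
    """Send payload; return False instead of raising if the peer is gone.

    Hypothetical helper: the name and signature are assumptions.
    """
    try:
        conn.sendall(payload)
        return True
    except (BrokenPipeError, ConnectionResetError):
        # The client timed out and destroyed its socket. Drop this reply
        # and let the worker carry on with the next request.
        return False
```

The caller can log the False case and continue, so one abandoned request costs one reply rather than the worker process.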
xen-call-receiver.js:260 · inflightSTT<3 → <1. The server is single-threaded anyway; concurrent requests only created the timeout→crash chain. The 250ms tick still accumulates the next utterance, so nothing is lost.
xen-stt-server.py:59-74 · swallow BrokenPipeError/ConnectionResetError on both the success and error write paths. A dropped socket can never crash the worker again.
xen-call-receiver.js tee · the warm twin logged the same reply 7-14×/turn, so you heard it doubled/overlapping. Now speaks only the first clean line per inject (gated on _lastInjectAt); multi-turn replies still speak.
omnimind.js /voice-transcript · the web-voice path skipped dedup entirely, so one "Hello hello" injected 4×. Now a strict 1200ms exact-duplicate guard always applies; genuine repeats >1.2s apart still pass.
xen-call-receiver.js:125 · max(1500,durMs)+2800 → durMs+800. 4.3s (the old minimum, 1500+2800ms) back-to-back keeps the capture gate armed; ~800ms covers real echo without swallowing your next turn.
STT_RATE → 8000. The wire is genuinely 8k (all frames 320B/20ms). The 16k header doesn't cause empties; it just mangles words. Fixing it sharpens accuracy.
:319 rms>12→40, :274 gain cap 80→20. Stops amplifying the quiet Telnyx line-hiss into junk buffers that whisper returns empty for.
ECONNREFUSED errors suggest ragged restarts; confirm launchd actually supervises the STT process (it was once an orphan).
The receiver doesn't receive 3 requests; it manufactures them. It chops your continuous speech into chunks and flushes one to transcription every 250ms when it hears a pause. But each transcription takes 14-16 seconds on this hardware. So while request #1 is still inferring, the tick fires #2 and #3 for your next words. The old inflightSTT<3 allowed all three; the single-threaded server queued them, they hit the 20s timeout, and that crashed it.
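The queueing arithmetic above can be checked with a small model. This is a sketch, not the receiver's code (which is JavaScript): simulate, its parameters, the 15s inference time (midpoint of 14-16s), and the example flush times of 0s, 2s and 4s are all assumptions for illustration. It models a FIFO single-threaded server and a client gate that holds audio back while max_inflight requests are outstanding.

```python
def simulate(ready_times, infer_s=15.0, timeout_s=20.0, max_inflight=3):
    """Return (sent_at, finished_at, timed_out) for each request sent.

    ready_times: seconds at which the receiver has a buffer ready to flush.
    Assumes every request costs infer_s, including merged buffers.
    """
    server_free_at = 0.0
    leaves_gate_at = []  # when each outstanding request completes or times out
    results = []
    queue = sorted(ready_times)
    while queue:
        sent = queue.pop(0)
        leaves_gate_at = [u for u in leaves_gate_at if u > sent]
        if len(leaves_gate_at) >= max_inflight:
            # Gate closed: audio keeps accumulating until a slot frees up,
            # and everything that became ready meanwhile goes in one flush.
            sent = min(leaves_gate_at)
            queue = [q for q in queue if q > sent]
            leaves_gate_at = [u for u in leaves_gate_at if u > sent]
        start = max(sent, server_free_at)  # single-threaded: wait your turn
        finish = start + infer_s
        server_free_at = finish
        timed_out = (finish - sent) > timeout_s
        leaves_gate_at.append(min(finish, sent + timeout_s))
        results.append((sent, finish, timed_out))
    return results


old = simulate([0.0, 2.0, 4.0], max_inflight=3)
new = simulate([0.0, 2.0, 4.0], max_inflight=1)
print([r[2] for r in old])  # timeout flags with the old gate
print([r[2] for r in new])  # timeout flags with the serialized gate
```

With these assumed numbers, max_inflight=3 leaves the 2nd and 3rd requests waiting 28s and 41s, both past the 20s timeout. With max_inflight=1 the held-back audio goes out as one request at 15s and completes inside its own timeout.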
Deeper root (next optimization): 14-16s/transcription is slow. Serializing stops the crash, but truly snappy voice needs a faster STT path — tiny/int8 whisper, streaming partials, Metal/GPU acceleration, Apple on-device Speech, or a cloud STT for the call leg.
"8k-vs-16k header is the cause of empties." Refuted by a live model test — a genuine-8k buffer transcribes non-empty at both header rates. A wrong header mangles words; it doesn't blank them.
"vad_filter / no_speech_threshold rejects real speech." Refuted — vad_filter=False already; loud buffers empty because the server is crashed, not rejected.
"Permanent one-way dark latch from the echo-gate." Refuted — capture demonstrably recovers between storms. The dark is episodic per crash, not a latch. The echo-gate is an amplifier, not the root.