// xvm-xen · 0touch live dictation · qi 2026-05-23

Live dictation.
Zero touch.

Focused build prompt for xvm-xen: continuous STT, LaunchAgent auto-start, transcripts injected into the claude pane, voice replies via the VM's `say`. No clicking, no typing, no manual UI after bootstrap. Echo-loop-aware: voice replies stay deferred until mac-xen signals echo-clear.

STATUS: Bootstrap is firing in parallel from Win-Ezekiel's end via `ssh xen@xvm 'claude --dangerously-skip-permissions -p "$(cat /tmp/0touch-prompt.md)"' &`. The remote command is single-quoted, so `$(cat /tmp/0touch-prompt.md)` expands on xvm and the prompt file must already exist there. xvm-xen executes autonomously. This page is the canonical record and the paste fallback if the direct fire fails. Verify completion via the ~/.xen/state/sttvei-online marker file landing on xvm, or watch this terminal for the completion ping.
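A minimal sketch of the fire-and-poll flow from the STATUS line, assuming a bash shell on Win-Ezekiel's end (Git Bash or WSL), key-based ssh to xen@xvm, and the prompt saved locally as ./0touch-prompt.md (a hypothetical path). It copies the prompt over before firing, and it keeps the marker path literal so `~` resolves to xen's home on xvm rather than the caller's. The `fire`/`wait` subcommands and `wait_for_marker` are illustrative names, not existing tooling.

```shell
#!/bin/bash
# Sketch: fire the 0touch bootstrap on xvm, then poll for the completion marker.
# Assumptions: bash on the caller side (Git Bash or WSL), key-based ssh to xen@xvm,
# and the prompt saved locally as ./0touch-prompt.md (hypothetical path).

HOST="xen@xvm"
MARKER='~/.xen/state/sttvei-online'   # kept literal so ~ expands on xvm, not locally

fire_bootstrap() {
  # The single-quoted $(cat ...) runs in xvm's shell, so the prompt must be there first.
  scp ./0touch-prompt.md "$HOST:/tmp/0touch-prompt.md" || return 1
  ssh "$HOST" 'claude --dangerously-skip-permissions -p "$(cat /tmp/0touch-prompt.md)"' &
}

# Poll every INTERVAL seconds for up to TIMEOUT seconds; returns 0 once the marker exists.
wait_for_marker() {
  local timeout_s="${1:-3600}" interval_s="${2:-30}" waited=0
  while [ "$waited" -lt "$timeout_s" ]; do
    if ssh "$HOST" "test -f $MARKER"; then
      return 0
    fi
    sleep "$interval_s"
    waited=$((waited + interval_s))
  done
  return 1
}

case "${1:-}" in
  fire) fire_bootstrap ;;
  wait) wait_for_marker "${2:-3600}" "${3:-30}" && ssh "$HOST" "cat $MARKER" ;;
esac
```

Usage (script name hypothetical): `./fire-0touch.sh fire`, then `./fire-0touch.sh wait 3600 30` prints the marker contents once it lands.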

You are xvm-xen on macOS Sequoia 26.5 inside UTM on m4. Your job: build qi a ZERO-TOUCH LIVE DICTATION TERMINAL that runs perpetually with no keyboard, no mouse, no manual UI interaction after this bootstrap.

═══ HARD CONSTRAINTS ═══
- 0touch = no clicking, no typing, no push-to-talk, no manual play/pause
- Continuous listening — STT runs all the time
- Apple SFSpeechRecognizer PRIMARY, Whisper.cpp Metal REDUNDANCY (cloud STT BANNED per canon_stt_apple_whisper_metal_only_2026-05-22)
- LaunchAgent for auto-start on login, KeepAlive=true
- Transcripts inject directly into your own claude pane (stdin or tmux send-keys)
- Voice replies via macOS `say` on the VM's own audio output, no host involvement (the VM is the sole audio surface per canon_vm_is_audio_surface_host_is_headless_2026-05-22)
- DO NOT fire test `say` calls during bootstrap. Echo loop is still live host-side (host out -> speakers -> Safari GV mic -> FCC -> in). Build the chain only. Voice-fire test deferred until mac-xen signals "echo-clear" via your /peer endpoint.
- No Chromium. No cloud anything. Local-only on this VM.

═══ DELIVERABLES ═══
1. ~/bin/xen-stt-live (Swift CLI)
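   - Continuous mic stream via SFSpeechRecognizer + AVAudioEngine, with requiresOnDeviceRecognition = true on the recognition request so audio never leaves the VM (server-side recognition would break the cloud-STT ban)
   - Streams transcripts to stdout, one line per finished utterance (not one per partial result)
   - Whisper.cpp Metal runs in parallel for redundancy/validation
   - Grant Microphone and Speech Recognition access in System Settings > Privacy & Security when the first run prompts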
2. ~/bin/xen-dictation-loop (shell)
   - Pipes xen-stt-live output into your own claude pane via a persistent session (tmux or screen)
   - On each transcript line, format it as a user message and inject it
   - Capture claude's reply and pass its text to xen-say-vm (which speaks only when XEN_AUDIO_CLEAR=1)
3. ~/Library/LaunchAgents/com.xen.dictation.plist
   - Launches xen-dictation-loop at user login
   - KeepAlive=true so it restarts if it dies
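   - StandardOutPath/StandardErrorPath = absolute path of ~/.xen/state/dictation.log (launchd does not expand ~, so write the expanded $HOME path, and create ~/.xen/state before loading)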
4. ~/bin/xen-say-vm (shell)
   - Wrapper around macOS `say -v Ava` (or default Spoken Content voice)
   - Reads env XEN_AUDIO_CLEAR; if unset or != "1", logs message but does NOT fire `say`
   - When XEN_AUDIO_CLEAR=1, fires `say` and logs
5. ~/.xen/state/sttvei-online marker file
   - Write timestamp + verified-step list
   - This file is what Win-Ezekiel polls to confirm completion
6. Install Whisper.cpp Metal if not present:
   - git clone https://github.com/ggerganov/whisper.cpp ~/whisper.cpp
   - cd ~/whisper.cpp && cmake -B build -DGGML_METAL=1 && cmake --build build -j
   - cd ~/whisper.cpp && ./models/download-ggml-model.sh base.en (or large-v3, roughly 3 GB, if disk allows)
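For deliverable 1, a command-line binary has no app bundle, so the Microphone and Speech Recognition usage strings have to travel inside the binary itself; without them the first authorization request may crash or be denied instead of prompting. The sketch below assumes the Swift source lives at ~/src/xen-stt-live/main.swift (a hypothetical path) and that Xcode's command-line tools provide `swiftc`; it embeds an Info.plist into the `__TEXT,__info_plist` section at link time. The bundle identifier and usage strings are placeholders.

```shell
#!/bin/bash
# Sketch: compile ~/bin/xen-stt-live with an embedded Info.plist carrying the
# privacy usage strings. Assumes swiftc (Xcode CLT) and a hypothetical source dir.

SRC_DIR="${SRC_DIR:-$HOME/src/xen-stt-live}"
OUT="${OUT:-$HOME/bin/xen-stt-live}"

# $1 = output path for the Info.plist (identifier and strings are placeholders)
write_info_plist() {
  cat > "$1" <<'PLIST'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>CFBundleIdentifier</key>
  <string>com.xen.stt-live</string>
  <key>NSMicrophoneUsageDescription</key>
  <string>xen-stt-live listens continuously for local dictation.</string>
  <key>NSSpeechRecognitionUsageDescription</key>
  <string>xen-stt-live transcribes speech on-device.</string>
</dict>
</plist>
PLIST
}

build_stt_live() {
  mkdir -p "$(dirname "$OUT")"
  write_info_plist "$SRC_DIR/Info.plist"
  # -sectcreate embeds the plist in the binary so TCC can find the usage strings.
  swiftc -O "$SRC_DIR/main.swift" -o "$OUT" \
    -Xlinker -sectcreate -Xlinker __TEXT -Xlinker __info_plist \
    -Xlinker "$SRC_DIR/Info.plist"
}

case "${1:-}" in
  build) build_stt_live ;;
esac
```

Run it once with `build` as the argument; the Swift recognizer source itself is not shown.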
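For deliverable 2, a minimal sketch of the injection side of xen-dictation-loop, assuming a tmux pane `xen:0.0` (a hypothetical target) that already runs claude, and Homebrew's tmux under /opt/homebrew/bin. `tmux send-keys -l` disables key-name lookup so transcript text is typed literally, and a separate `Enter` submits it. The `[voice]` prefix is an illustrative user-message format, and the XEN_TMUX_TARGET/XEN_STT_BIN overrides are hypothetical. Reply capture and the hand-off to xen-say-vm are left out; one option is comparing `tmux capture-pane -p` output before and after each turn.

```shell
#!/bin/bash
# Sketch of ~/bin/xen-dictation-loop (injection side only).
# Assumptions: tmux pane "xen:0.0" already running claude, and
# ~/bin/xen-stt-live printing one finished utterance per line.

# LaunchAgents start with a minimal PATH; Homebrew locations are assumed here.
export PATH="/opt/homebrew/bin:/usr/local/bin:$PATH"

TARGET="${XEN_TMUX_TARGET:-xen:0.0}"          # hypothetical tmux target
STT="${XEN_STT_BIN:-$HOME/bin/xen-stt-live}"

# Normalize one transcript line into a user message; return 1 for blank lines.
format_utterance() {
  local line
  line="$(printf '%s' "$1" | tr -d '\r' | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//')"
  [ -n "$line" ] || return 1
  printf '[voice] %s\n' "$line"   # "[voice]" prefix is illustrative only
}

# Type the message literally (-l: no key-name lookup), then submit with Enter.
inject_line() {
  tmux send-keys -t "$TARGET" -l "$1"
  tmux send-keys -t "$TARGET" Enter
}

# Read utterances until xen-stt-live exits; KeepAlive then restarts this script.
run_loop() {
  "$STT" | while IFS= read -r raw; do
    if msg="$(format_utterance "$raw")"; then
      inject_line "$msg" || echo "inject failed: $msg" >&2
    fi
  done
}

# Start only when invoked under the deliverable name, so the file can also be sourced.
case "$(basename -- "$0")" in
  xen-dictation-loop) run_loop ;;
esac
```

If xen-stt-live dies, the loop ends and the script exits, which lets KeepAlive restart the whole pipeline instead of leaving a loop that reads nothing.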
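For deliverable 6, an idempotent version of the install steps that can be rerun safely. It chooses large-v3 only when free space in $HOME clears an assumed 10 GB threshold (the large-v3 ggml file alone is roughly 3 GB); `pick_model`, `free_kb`, and the threshold are illustrative. Recent whisper.cpp releases enable Metal by default on macOS, so `-DGGML_METAL=1` is kept for explicitness. Feeding Whisper from the live mic would use the `whisper-stream` example, which additionally needs SDL2 and `-DWHISPER_SDL2=ON`; that part is not shown.

```shell
#!/bin/bash
# Sketch: idempotent whisper.cpp (Metal) install plus model download.

WHISPER_DIR="${WHISPER_DIR:-$HOME/whisper.cpp}"
MIN_KB_FOR_LARGE=$((10 * 1024 * 1024))   # assumed 10 GB headroom before choosing large-v3

# $1 = free space in KB; prints the model name to download.
pick_model() {
  if [ "$1" -ge "$MIN_KB_FOR_LARGE" ]; then
    echo large-v3
  else
    echo base.en
  fi
}

# $1 = path; POSIX df output, line 2 column 4, is available space in 1K blocks.
free_kb() {
  df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

install_whisper() {
  if [ ! -d "$WHISPER_DIR/.git" ]; then
    git clone https://github.com/ggerganov/whisper.cpp "$WHISPER_DIR" || return 1
  fi
  # Subshell keeps the cd local; rebuilding an existing tree is incremental.
  ( cd "$WHISPER_DIR" &&
    cmake -B build -DGGML_METAL=1 &&
    cmake --build build --config Release -j ) || return 1

  local model
  model="$(pick_model "$(free_kb "$HOME")")"
  if [ ! -f "$WHISPER_DIR/models/ggml-${model}.bin" ]; then
    ( cd "$WHISPER_DIR" && ./models/download-ggml-model.sh "$model" ) || return 1
  fi
  echo "whisper.cpp ready with model ${model}"
}

case "${1:-}" in
  install) install_whisper ;;
esac
```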

═══ POST-BOOTSTRAP STATE ═══
- LaunchAgent loaded and running
- STT daemon listening (silent — no audio output during bootstrap)
- VEI loop active (transcripts -> claude inject -> reply captured)
- Voice fire armed but blocked (XEN_AUDIO_CLEAR=0 until the echo-clear signal arrives)
- ~/.xen/state/sttvei-online written with verified state
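A sketch of producing the LaunchAgent part of the state above, assuming Homebrew under /opt/homebrew. It bakes the expanded `$HOME` into every path (launchd does not expand `~`), creates ~/.xen/state before loading since launchd is not expected to create missing log directories, and sets a PATH that includes Homebrew because LaunchAgents start with a minimal one. Loading goes through `launchctl bootstrap gui/<uid>` after a best-effort `bootout` of any previous copy. XEN_AUDIO_CLEAR starts at 0 to match the blocked voice state.

```shell
#!/bin/bash
# Sketch: generate, lint, and load ~/Library/LaunchAgents/com.xen.dictation.plist.
# Assumption: Homebrew under /opt/homebrew (Apple silicon default).

LABEL="com.xen.dictation"

# $1 = output plist path, $2 = home directory baked in as absolute paths
write_plist() {
  local out="$1" home="$2"
  cat > "$out" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key>
  <string>${LABEL}</string>
  <key>ProgramArguments</key>
  <array>
    <string>${home}/bin/xen-dictation-loop</string>
  </array>
  <key>RunAtLoad</key>
  <true/>
  <key>KeepAlive</key>
  <true/>
  <key>EnvironmentVariables</key>
  <dict>
    <key>PATH</key>
    <string>/opt/homebrew/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin</string>
    <key>XEN_AUDIO_CLEAR</key>
    <string>0</string>
  </dict>
  <key>StandardOutPath</key>
  <string>${home}/.xen/state/dictation.log</string>
  <key>StandardErrorPath</key>
  <string>${home}/.xen/state/dictation.log</string>
</dict>
</plist>
EOF
}

install_agent() {
  local plist="$HOME/Library/LaunchAgents/${LABEL}.plist"
  mkdir -p "$HOME/.xen/state" "$HOME/Library/LaunchAgents"
  write_plist "$plist" "$HOME"
  plutil -lint "$plist" || return 1
  # Unload any previous copy, then load into the logged-in user's GUI domain.
  launchctl bootout "gui/$(id -u)/${LABEL}" 2>/dev/null || true
  launchctl bootstrap "gui/$(id -u)" "$plist"
}

case "${1:-}" in
  install) install_agent ;;
esac
```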

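When mac-xen lands the echo-clear (relocating GV-Safari to the VM and muting host output, or any equivalent path), they will set XEN_AUDIO_CLEAR=1 in the environment or write a flag file that you watch. An env change made in another shell does not reach the already-running LaunchAgent process, so a flag file checked on every reply is the path that takes effect without a restart. From that moment, xen-say-vm fires voice replies. Until then, the loop runs text-only: STT in, claude's reply text logged, no audio out.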
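A minimal sketch of xen-say-vm that honors both signals described above: XEN_AUDIO_CLEAR exactly `1`, or the existence of a flag file. The path ~/.xen/state/audio-clear and the XEN_SAY_LOG, XEN_AUDIO_CLEAR_FLAG, and XEN_SAY_VOICE overrides are hypothetical. Because the flag is checked on every call, creating the file takes effect without restarting the LaunchAgent. Blocked replies are logged and the call still returns 0, so the dictation loop keeps running text-only.

```shell
#!/bin/bash
# Sketch of ~/bin/xen-say-vm: log every reply, speak it only once audio is cleared.

LOG="${XEN_SAY_LOG:-$HOME/.xen/state/dictation.log}"
FLAG="${XEN_AUDIO_CLEAR_FLAG:-$HOME/.xen/state/audio-clear}"   # hypothetical flag file
VOICE="${XEN_SAY_VOICE:-Ava}"                                  # hypothetical override

log_line() {
  mkdir -p "$(dirname "$LOG")"
  printf '%s xen-say-vm %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$1" >> "$LOG"
}

# Cleared only if XEN_AUDIO_CLEAR is exactly "1" or the flag file exists.
audio_clear() {
  [ "${XEN_AUDIO_CLEAR:-}" = "1" ] || [ -f "$FLAG" ]
}

xen_say_vm() {
  local text="$1"
  [ -n "$text" ] || return 0
  if audio_clear; then
    log_line "SPEAK: $text"
    # Text goes in on stdin so a leading "-" is never parsed as an option;
    # retry with the default voice if the named voice fails.
    printf '%s\n' "$text" | say -v "$VOICE" || printf '%s\n' "$text" | say
  else
    log_line "BLOCKED (XEN_AUDIO_CLEAR!=1): $text"
  fi
}

# Speak only when invoked under the deliverable name, so the file can also be sourced.
case "$(basename -- "$0")" in
  xen-say-vm) xen_say_vm "$*" ;;
esac
```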

EXECUTE TOP TO BOTTOM. Fire in parallel where possible. No clarifying questions. No "want me to?" Show the verified end-to-end state in ~/.xen/state/sttvei-online.
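A sketch of producing that marker, assuming each verification step is a shell command whose exit status decides ok or FAIL; the step names and checks in the `write` branch are illustrative. The file gets a UTC timestamp, one line per step, and an overall verdict, and it is written to a temp file and renamed so Win-Ezekiel's poller never reads a partial marker.

```shell
#!/bin/bash
# Sketch: write ~/.xen/state/sttvei-online atomically with a UTC timestamp and
# one ok/FAIL line per verification step. Step names and checks are illustrative.

# Usage: write_marker OUT "name::shell-command" ...
# Returns 0 only if every check exits 0.
write_marker() {
  local out="$1"; shift
  local tmp="${out}.tmp.$$" spec name cmd status all_ok=1
  mkdir -p "$(dirname "$out")" || return 1
  {
    printf 'timestamp: %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
    printf 'steps:\n'
    for spec in "$@"; do
      name="${spec%%::*}"
      cmd="${spec#*::}"
      if bash -c "$cmd" >/dev/null 2>&1; then
        status=ok
      else
        status=FAIL
        all_ok=0
      fi
      printf '  - %s: %s\n' "$name" "$status"
    done
    if [ "$all_ok" -eq 1 ]; then echo 'overall: ok'; else echo 'overall: FAIL'; fi
  } > "$tmp"
  # Rename into place so a poller never sees a half-written file.
  mv -f "$tmp" "$out"
  [ "$all_ok" -eq 1 ]
}

case "${1:-}" in
  write)
    write_marker "$HOME/.xen/state/sttvei-online" \
      "launchagent-loaded::launchctl print gui/$(id -u)/com.xen.dictation" \
      "stt-binary::test -x $HOME/bin/xen-stt-live" \
      "dictation-loop::test -x $HOME/bin/xen-dictation-loop" \
      "say-wrapper::test -x $HOME/bin/xen-say-vm" \
      "whisper-model::ls $HOME/whisper.cpp/models/ggml-*.bin"
    ;;
esac
```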

Begin.