# NAN-472 TTS duration probe — 2026-05-21

## Q1 — duration_seconds in TTS response

**Short answer: the field exists but is named `estimated_ms` (milliseconds), not `duration_seconds`. There are no dedicated per-surface endpoints; both surfaces share the same `POST /api/tts/sign` endpoint.**

### Stock Detail Overview TTS
- Endpoint: `POST /api/tts/sign`
- Handler: `backend/services/main/app/routers/tts.py:124`
- Response model: `TTSStreamSignResponse` at `backend/services/main/app/models/tts.py:17-42`
- duration field: **present — `estimated_ms: int`** (milliseconds, not seconds)
- NAN-414 sidecar status: **in current contract** — router reads `"duration_ms"` key from a sidecar store at `tts.py:66-72`; fallback is `estimate_audio_ms(text)` heuristic when sidecar cache misses
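
The sidecar-then-heuristic fallback described above can be sketched as follows. This is an illustration only: the real router is Python and was not reproduced in this probe, so `SidecarEntry`, `resolveEstimatedMs`, the cache-key scheme, and the characters-per-second constant are all assumptions; only the `"duration_ms"` key and the existence of an `estimate_audio_ms(text)` fallback come from the code read.

```typescript
// Hypothetical sidecar entry; only the "duration_ms" key is named in this probe.
interface SidecarEntry {
  duration_ms?: number;
}

// Assumed speaking rate for the illustrative heuristic; not taken from the backend.
const ASSUMED_CHARS_PER_SECOND = 15;

// Stand-in for the backend's estimate_audio_ms(text); the real heuristic may differ.
function estimateAudioMs(text: string): number {
  return Math.round((text.length / ASSUMED_CHARS_PER_SECOND) * 1000);
}

// Prefer the measured sidecar duration; fall back to the text-based estimate on a miss.
function resolveEstimatedMs(
  sidecar: Map<string, SidecarEntry>,
  cacheKey: string,
  text: string,
): number {
  const measured = sidecar.get(cacheKey)?.duration_ms;
  if (typeof measured === "number" && Number.isFinite(measured) && measured > 0) {
    return Math.round(measured);
  }
  return estimateAudioMs(text);
}
```

The point for mobile is that `estimated_ms` may be either a measured value or a guess, and the response does not say which, so the client should treat it as approximate until the player reports its own duration.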

### News Article TTS
- Endpoint: **same** `POST /api/tts/sign` — no dedicated news-audio route exists
- Handler: same `backend/services/main/app/routers/tts.py:124`
- Response model: same `TTSStreamSignResponse` → `estimated_ms: int`
- duration field: **present — `estimated_ms: int`**
- NAN-414 sidecar status: same — shared router logic

> Note: No `/api/stock/{symbol}/summary-audio` or `/api/news/{id}/audio` routes were found. Both surfaces call `POST /api/tts/sign` client-side, passing the article/summary text as the body. TTS audio is not pre-generated per article; it is synthesized on demand.

---

## Q2 — Streaming event payload

- Protocol: **HTTP StreamingResponse (chunked `audio/mpeg`)**
- Source: `backend/services/main/app/routers/tts.py:73-77` (POST `/api/tts`) and `tts.py:198-202` (GET `/api/tts/stream`)
- Per-chunk metadata: **none** — raw MP3 bytes only, no framing
- Content-Length: **not set** — chunked transfer encoding, length unknown at stream start
- Duration metadata: **`X-Audio-Estimated-Ms` response header** — emitted once at stream start, not per-chunk
- Content-Type: `audio/mpeg`
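
If a client issues the stream request itself (rather than handing the URL straight to the audio player), the duration header could be read along these lines. A minimal sketch: `HeaderReader` and `readEstimatedMs` are hypothetical names, and the header is assumed to carry a non-negative integer as a string.

```typescript
// Minimal structural type so this works with fetch's Headers or any similar object.
interface HeaderReader {
  get(name: string): string | null;
}

// Returns the estimate in milliseconds, or null when the header is absent or malformed.
function readEstimatedMs(headers: HeaderReader): number | null {
  const raw = headers.get("X-Audio-Estimated-Ms");
  if (raw === null) return null;
  const trimmed = raw.trim();
  if (trimmed === "") return null;
  const value = Number(trimmed);
  return Number.isInteger(value) && value >= 0 ? value : null;
}
```

With a `fetch` response this would be called as `readEstimatedMs(response.headers)`; header lookup on a standard `Headers` object is case-insensitive.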

---

## Q3 — 3-layer progress bar feasibility

| Layer | Mobile needs | BE provides today | Gap |
|---|---|---|---|
| Base track length | `duration_seconds` (or ms) | `TTSStreamSignResponse.estimated_ms` | **None**: the field exists; mobile just needs to read it (it already does) |
| Buffered position | bytes received OR seconds buffered | No `Content-Length`; no per-chunk byte marker | **Gap**: BE would need to add a `Content-Length` header to the stream response; without it, mobile cannot compute download % |
| Played position | playback head from audio player | Client-side only (`expo-audio` status) | **None**: the RN audio API already provides `positionMs` / `durationMs` |

**Verdict: needs minor BE work for full fidelity.**

- Layers 1 (base) and 3 (played) can ship today — duration and playback position are both available.
- Layer 2 (buffered) requires BE to add `Content-Length` to the streaming response. The header itself is a one-line addition to the `StreamingResponse` call at `tts.py:75`, but it can only be set once the total byte size is known (from a pre-generated file or from the sidecar), and a response that sends `Content-Length` can no longer use chunked transfer encoding. Without it, the buffered segment must be approximated by animating to ~95% over `estimated_ms`, which is what the current implementation already does.
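
The layer 2 trade-off can be sketched as a single function: use the real byte ratio when `Content-Length` is available, otherwise fall back to the time-based approximation capped at 95%. `BufferInputs`, `bufferedFraction`, and the way bytes and elapsed time are collected are assumptions for illustration, not existing mobile code.

```typescript
interface BufferInputs {
  bytesReceived: number;
  contentLength: number | null; // null while the response carries no Content-Length
  elapsedMs: number; // time since the stream request started
  estimatedMs: number; // estimated_ms from the sign response
}

// Cap for the time-based approximation, matching the ~95% figure above.
const APPROXIMATION_CAP = 0.95;

function clamp01(x: number): number {
  return Math.min(1, Math.max(0, x));
}

// Fraction of the track to draw as buffered, in the range 0..1.
function bufferedFraction(i: BufferInputs): number {
  if (i.contentLength !== null && i.contentLength > 0) {
    // True download progress: only possible once BE sends Content-Length.
    return clamp01(i.bytesReceived / i.contentLength);
  }
  if (i.estimatedMs <= 0) return 0;
  // Approximation: advance with wall-clock time, never claiming a full buffer.
  return Math.min(APPROXIMATION_CAP, clamp01(i.elapsedMs / i.estimatedMs));
}
```

Keeping both branches in one place would let the UI pick up real buffering automatically if the header is added later, with no change to the component that draws the bar.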

---

## Q4 — Mobile TTS UI component

### Stock Detail player: `ListenBar`

- Component: `src/features/stock-detail/presentation/components/overview/ListenBar.tsx:111`
- Props accepted:
  - `isLoaded: boolean`
  - `isLoading: boolean`
  - `isPlaying: boolean`
  - `durationMs: number` — actual duration from player (0 until audio loaded)
  - `estimatedMs: number` — pre-load estimate from BE
  - `onTogglePlay: () => void`
- Backend fields consumed: `url` (stream URL), `estimated_ms` → mapped to `estimatedMs`
- Data chain:
  - Hook: `src/features/learn/presentation/hooks/useAudioPlayerViewModel.ts:93` → calls `ttsRepository.synthesize(plainText)`
  - Repository impl: `src/features/learn/data/repositories/TtsRepositoryImpl.ts:7-9` → delegates to `TtsApi`
  - Service adapter: `src/features/learn/service/api/NanoStreetTtsAdapter.ts:16-60` → `POST /api/tts/sign` → parses `{ url, estimated_ms }`
  - Service contract: `src/features/learn/service/contracts/TtsApi.ts:1-10` → `TtsSignResult { url: string, estimatedMs: number }`
  - Business contract: `src/features/learn/business/contracts/TtsRepository.ts:1-8` → `TtsSynthResult { url: string, estimatedMs: number }`
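
The adapter step in the chain above, from the wire shape `{ url, estimated_ms }` to the camelCase contract, could look roughly like this. The actual `NanoStreetTtsAdapter.ts` body is not reproduced in this probe, so `parseSignResponse` and its validation rules are assumptions; only the two field names and the `TtsSignResult` shape are taken from the code read.

```typescript
// Shape named in this probe: TtsSignResult { url: string, estimatedMs: number }.
interface TtsSignResult {
  url: string;
  estimatedMs: number;
}

// Validates an untyped JSON body and maps snake_case to the service contract.
function parseSignResponse(body: unknown): TtsSignResult {
  if (typeof body !== "object" || body === null) {
    throw new Error("TTS sign response is not an object");
  }
  const { url, estimated_ms } = body as { url?: unknown; estimated_ms?: unknown };
  if (typeof url !== "string" || url.length === 0) {
    throw new Error("TTS sign response is missing url");
  }
  if (typeof estimated_ms !== "number" || !Number.isFinite(estimated_ms) || estimated_ms < 0) {
    throw new Error("TTS sign response has an invalid estimated_ms");
  }
  return { url, estimatedMs: estimated_ms };
}
```

Any additional field for buffered progress would naturally be added at this boundary, so the layers above it keep receiving a typed, camelCase result.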

**Note:** `ListenBar` already implements a 2-layer approximation today: it animates a loading bar up to a 95% cap over `estimatedMs`, then switches to the real `positionMs / durationMs` ratio from the player once the audio has loaded (`ListenBar.tsx:52-78`).
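
The two-phase behaviour in that note reduces to a small pure function. This is a sketch of the idea, not the code at `ListenBar.tsx:52-78`: `listenProgress`, `ListenProgressInputs`, and `elapsedSinceRequestMs` are hypothetical names, while `isLoaded`, `estimatedMs`, `positionMs`, and `durationMs` mirror the props and player values listed above.

```typescript
interface ListenProgressInputs {
  isLoaded: boolean;
  elapsedSinceRequestMs: number; // hypothetical: time since playback was requested
  estimatedMs: number; // pre-load estimate from BE
  positionMs: number; // playback head from the player
  durationMs: number; // 0 until the audio is loaded
}

interface ListenProgress {
  phase: "loading" | "playing";
  fraction: number; // 0..1
}

const LOADING_CAP = 0.95;

function listenProgress(i: ListenProgressInputs): ListenProgress {
  if (i.isLoaded && i.durationMs > 0) {
    // Phase 2: the player is authoritative once it reports a real duration.
    const ratio = i.positionMs / i.durationMs;
    return { phase: "playing", fraction: Math.min(1, Math.max(0, ratio)) };
  }
  // Phase 1: animate toward the cap using the backend estimate.
  const raw = i.estimatedMs > 0 ? i.elapsedSinceRequestMs / i.estimatedMs : 0;
  return { phase: "loading", fraction: Math.min(LOADING_CAP, Math.max(0, raw)) };
}
```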

### News article player: `ArticleListenCard`

- Component: `src/features/news/presentation/components/ArticleListenCard.tsx:18`
- Props accepted: `{ ttsText: string | null }`
- ttsText source: `markdownBody ?? bodyText ?? article.description` from `NewsArticleScreen.tsx:167`
- Backend fields consumed: **identical to Stock Detail** — same hook, same repository chain
- Data chain: `ArticleListenCard.tsx:20` → `useAudioPlayerViewModel(ttsText)` → same path as above

---

## Recommendation

BE already exposes `estimated_ms` in `TTSStreamSignResponse`, and mobile consumes it correctly as `estimatedMs`. **No BE schema change is needed for layers 1 and 3** of the progress bar.

To unlock true buffering progress (layer 2), add a `Content-Length` header to the `StreamingResponse` at `backend/services/main/app/routers/tts.py:75`. This requires knowing the total byte size upfront: pre-generated files can provide it, but on-the-fly synthesis (the current behavior) may need to buffer the full audio first.

On the mobile side, both surfaces feed through the same `useAudioPlayerViewModel` hook into `ListenBar` and `ArticleListenCard`. Wiring buffered progress into the UI only requires updating the `ListenBar.tsx` props and the hook's return shape; `NanoStreetTtsAdapter.ts` already parses `estimated_ms`.