# Spec: Backend synthetic market-data source for local integration testing

**Author:** tele (mobile coordinator side)
**Date:** 2026-04-19
**Target repo:** `/Users/fathoni/Documents/Project/BlockDev/nano-street/backend`
**Branch (proposed):** `feat/synthetic-market-data-source`
**Target PR:** new draft on backend repo, **human-team review** (not CodeRabbit-driven)

---

## Why this exists

Mobile recently merged PR #70 (live-price WebSocket integration, wagmi-charts migration, TradingView-style price-blink). The implementation is sensitive to:

- WebSocket connect / handshake / auth behavior
- Tick cadence (300 ms batched flush in `LivePriceStore`)
- Reconnection + backoff under disconnect
- Market-status messages affecting auto-connect lifecycle
- Per-ticker subscription ref-counting
- Edge cases: stale tick, empty session, high-frequency burst

Currently we cannot exercise any of this on weekends (markets closed → Massive feed silent). Recording / replay isn't enough either — we need the ability to *trigger* specific scenarios on demand (force market-open, drop the connection, push a specific tick).

Mobile-side mocking was rejected because it requires duplicating the backend's WS protocol and message-shaping logic, creating a drift / false-alarm risk that would compound over time.

The backend already has a clean abstraction around the real upstream (`MassiveSubscriptionManager`). Adding a synthetic implementation of the same interface gives us a hermetic test environment with zero drift risk: the real backend code path runs end-to-end, only the lowest layer (data ingest) is mocked.

## Scope

### In scope
- Extract a Protocol / ABC from `MassiveSubscriptionManager` defining the public interface (start, stop, subscribe_client, unsubscribe_client)
- Implement `SyntheticSubscriptionManager` with the same interface
- Modify the singleton factory `get_massive_client()` to pick implementation via an env flag
- Add `USE_SYNTHETIC_DATA` (and any companion config) to `core/config.py`
- Synthetic features (Phase 1, ship first):
  - Configurable ticker universe via env var (e.g., `SYNTHETIC_TICKERS=AAPL,NVDA,MSFT,ADBE`)
  - Per-ticker random-walk price generation (configurable seed for determinism)
  - Configurable tick cadence (default ~1 Hz per ticker, env-tunable)
  - Always-on regardless of real market hours (the entire point — overrides `marketStatus`)
- Integration test demonstrating the synthetic path emits well-formed `stock_aggregate` messages compatible with the existing wire schema
- README section in backend repo: how to run the synthetic mode locally + which env vars to set

### Out of scope (Phase 2, separate PR)
- REST control plane to drive scenarios live (`POST /push`, `POST /disconnect`, etc.)
- Realistic OHLCV candle generator (current spec uses random walk on close, OHLV derived simply)
- Stale-ticker simulation (e.g., explicitly stop emitting for a ticker)
- High-frequency burst mode

The Phase 2 features are valuable for full scenario testing but not strictly needed for the weekend-test core use case. Ship Phase 1, gather feedback, then layer Phase 2 in a follow-up.

### Explicitly NOT changing
- The `MassiveSubscriptionManager` real implementation — only being abstracted behind an interface, no behavior change
- The WS router (`app/routers/ws.py`) — it consumes the factory's output, doesn't care about which impl
- Production code path — when `USE_SYNTHETIC_DATA` is unset / false, behavior is identical to today
- Auth / JWT validation — synthetic mode still requires valid JWT (we want to exercise the real auth path)

## Design

### Interface (new file `services/main/app/repositories/market_data_source/protocol.py`)

```python
from typing import Protocol, runtime_checkable
from collections.abc import Iterable

@runtime_checkable
class MarketDataSource(Protocol):
    """Abstraction over the live market-data subscription manager.
    Both Massive (real) and Synthetic (dev/test) impls satisfy this.
    """

    async def start(self) -> None: ...
    async def stop(self) -> None: ...
    async def subscribe_client(self, connection_id: str, tickers: Iterable[str]) -> None: ...
    async def unsubscribe_client(self, connection_id: str, tickers: Iterable[str]) -> None: ...
```

### Real impl (rename/move existing)

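`MassiveSubscriptionManager` keeps its current behavior. The only change is declaring that it satisfies `MarketDataSource`, for example by annotating the factory's return type so the type checker verifies conformance. An `isinstance` check against a `runtime_checkable` protocol confirms only that the four methods exist, not that their signatures match, so treat it as a smoke test rather than proof. No file moves are required if the import paths are stable; otherwise relocate to `app/repositories/market_data_source/massive.py`.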
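To illustrate what the runtime check does and does not cover, here is a minimal sketch. `FakeSource`, `IncompleteSource`, and `conforms` are hypothetical names used only for illustration; they are not part of the backend.

```python
from collections.abc import Iterable
from typing import Protocol, runtime_checkable


@runtime_checkable
class MarketDataSource(Protocol):
    async def start(self) -> None: ...
    async def stop(self) -> None: ...
    async def subscribe_client(self, connection_id: str, tickers: Iterable[str]) -> None: ...
    async def unsubscribe_client(self, connection_id: str, tickers: Iterable[str]) -> None: ...


class FakeSource:
    """Hypothetical stand-in exposing all four methods (illustration only)."""

    async def start(self) -> None: ...
    async def stop(self) -> None: ...
    async def subscribe_client(self, connection_id: str, tickers: Iterable[str]) -> None: ...
    async def unsubscribe_client(self, connection_id: str, tickers: Iterable[str]) -> None: ...


class IncompleteSource:
    """Hypothetical class missing unsubscribe_client, so the check fails."""

    async def start(self) -> None: ...
    async def stop(self) -> None: ...
    async def subscribe_client(self, connection_id: str, tickers: Iterable[str]) -> None: ...


def conforms(obj: object) -> bool:
    # Structural check: verifies that the methods are present, not their signatures.
    return isinstance(obj, MarketDataSource)
```

Because the check is name-based, a static type checker remains the stronger guarantee that both implementations stay interchangeable.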

### Synthetic impl (new file `services/main/app/repositories/market_data_source/synthetic.py`)

```python
from collections.abc import Iterable


class SyntheticSubscriptionManager:
    """Generates synthetic ticks for a configured universe of tickers.
    Same interface as MassiveSubscriptionManager; a drop-in replacement
    behind get_massive_client() when USE_SYNTHETIC_DATA=true.
    """

    def __init__(self, tickers: list[str], seed: int = 42, cadence_hz: float = 1.0):
        ...

    async def start(self) -> None:
        # spawn one background task per ticker, each emitting at the configured cadence
        ...

    async def stop(self) -> None:
        # cancel the emitter tasks and wait for them to finish
        ...

    async def subscribe_client(self, connection_id: str, tickers: Iterable[str]) -> None:
        # register the connection; the emitter loops will push to it
        ...

    async def unsubscribe_client(self, connection_id: str, tickers: Iterable[str]) -> None:
        # drop the connection from the given tickers; other clients are unaffected
        ...
```

Internals:
- One asyncio task per ticker, sleeping `1/cadence_hz` seconds between ticks
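- Per-ticker state holds the last close and walks it via `close += rng.gauss(0, sigma)`, where `rng` is a per-ticker `random.Random` seeded from `synthetic_seed` (the module-level `random` functions share global state, so the sequence would depend on how the ticker tasks interleave) and sigma is proportional to price, e.g. `0.001 * price` for a typical stock
- OHLV derived from the close: open = previous_close, high = max(open, close) * (1 + small_pct), low = min(open, close) * (1 - small_pct), volume = constant or random in a band. Taking the max/min over open and close keeps every candle well-formed (low ≤ open, close ≤ high) even when a single step is larger than small_pct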
- Every tick is broadcast to all subscribed connections via the same `_publish_to_connection()` mechanism the real impl uses (extract this helper to a shared module if needed)
- Wire format identical: `{"type": "stock_aggregate", "ticker": "...", "data": {open, high, low, close, volume, vwap, transactions, timestamp}}`
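The generation step above can be sketched as follows. This is a minimal illustration, not the backend's implementation: `TickerState`, `make_state`, `next_aggregate`, the starting price of 100.0, the price floor, the millisecond timestamp, and the `vwap` / `transactions` values are all assumptions, since the spec does not say how those fields are derived.

```python
import random
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class TickerState:
    ticker: str
    close: float
    rng: random.Random


def make_state(ticker: str, seed: int, start_price: float = 100.0) -> TickerState:
    # String seeds are hashed deterministically, so each ticker gets its own
    # reproducible stream regardless of how the per-ticker tasks interleave.
    return TickerState(ticker, start_price, random.Random(f"{seed}:{ticker}"))


def next_aggregate(
    state: TickerState,
    sigma_pct: float = 0.001,
    wick_pct: float = 0.0005,
    now_ms: Optional[int] = None,
) -> dict:
    """Advance the random walk one step and build a stock_aggregate message."""
    open_ = state.close
    # Gaussian step with sigma proportional to the current price;
    # the floor (an assumption) keeps the price positive.
    close = max(0.01, open_ + state.rng.gauss(0.0, sigma_pct * open_))
    # Max/min over open and close keeps low <= open, close <= high.
    high = max(open_, close) * (1 + wick_pct)
    low = min(open_, close) * (1 - wick_pct)
    state.close = close
    return {
        "type": "stock_aggregate",
        "ticker": state.ticker,
        "data": {
            "open": open_,
            "high": high,
            "low": low,
            "close": close,
            "volume": state.rng.randint(100, 1000),       # random in a band
            "vwap": (high + low + close) / 3,             # assumption: typical price
            "transactions": state.rng.randint(1, 50),     # assumption: arbitrary band
            "timestamp": int(time.time() * 1000) if now_ms is None else now_ms,
        },
    }
```

Passing `now_ms` explicitly makes the output fully reproducible for a given seed, which is convenient in tests.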

### Factory (modify `services/main/app/repositories/massive_client/__init__.py` or wherever `get_massive_client` lives)

```python
def get_massive_client() -> MarketDataSource:
    if settings.use_synthetic_data:
        return _get_synthetic_singleton()
    return _get_massive_singleton()
```

The Massive singleton keeps its current pattern and the synthetic singleton mirrors it. Tests can monkeypatch `get_massive_client` as before.
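A self-contained sketch of the flag-driven factory follows. It reads the environment directly instead of the backend's `settings` object, and `MassiveStub`, `SyntheticStub`, and `_flag_enabled` are hypothetical placeholders; the real singleton helpers may be built differently.

```python
import os
from functools import lru_cache


class MassiveStub:
    """Placeholder for the real MassiveSubscriptionManager."""


class SyntheticStub:
    """Placeholder for SyntheticSubscriptionManager."""


def _flag_enabled(name: str) -> bool:
    # Unset or empty means False, which keeps the production default unchanged.
    return os.environ.get(name, "").strip().lower() in {"1", "true", "yes", "on"}


@lru_cache(maxsize=1)
def _get_massive_singleton() -> MassiveStub:
    return MassiveStub()


@lru_cache(maxsize=1)
def _get_synthetic_singleton() -> SyntheticStub:
    return SyntheticStub()


def get_massive_client():
    # The flag is read on every call, so tests can toggle it without reloading.
    if _flag_enabled("USE_SYNTHETIC_DATA"):
        return _get_synthetic_singleton()
    return _get_massive_singleton()
```

Each implementation is constructed lazily and only once, so enabling the flag never instantiates the real client.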

### Config (modify `services/main/app/core/config.py`)

Add to `Settings`:
```python
use_synthetic_data: bool = False
synthetic_tickers: list[str] = ["AAPL", "NVDA", "MSFT", "GOOGL", "AMZN"]
synthetic_cadence_hz: float = 1.0
synthetic_seed: int = 42
```

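All env-driven via the existing pydantic-settings pattern (`USE_SYNTHETIC_DATA=true` etc.). One caveat: pydantic-settings decodes `list[str]` fields from env vars as JSON by default, so `SYNTHETIC_TICKERS` must either be written as a JSON array (`SYNTHETIC_TICKERS='["AAPL","NVDA"]'`) or the field needs explicit handling to accept the comma-separated form shown under Scope.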
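If the comma-separated form of `SYNTHETIC_TICKERS` is kept, the raw string has to be split somewhere. The helper below is a stdlib-only sketch of that step; `parse_ticker_list` is a hypothetical name, and how it would be wired into `Settings` (for instance from a validator) is left to the backend's conventions.

```python
def parse_ticker_list(raw: str) -> list[str]:
    """Split a comma-separated ticker string into a clean, ordered list.

    Whitespace is trimmed, symbols are upper-cased, and blanks and
    duplicates are dropped while preserving first-seen order.
    """
    seen: set[str] = set()
    tickers: list[str] = []
    for part in raw.split(","):
        symbol = part.strip().upper()
        if symbol and symbol not in seen:
            seen.add(symbol)
            tickers.append(symbol)
    return tickers
```

Normalising here means a stray space or trailing comma in a local `.env` file does not silently create an unusable ticker.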

### Tests

`services/main/tests/test_synthetic_subscription_manager.py`:
- `test_emits_well_formed_aggregate_messages`: start the manager, subscribe a fake connection, assert N ticks arrive within a timeout and each one matches the `WebSocketResponse[stock_aggregate]` schema
- `test_subscribe_unsubscribe_isolates_clients`: subscribe two fake connections, unsubscribe one, assert only the other still receives ticks
- `test_factory_returns_synthetic_when_flag_set`: set `USE_SYNTHETIC_DATA=true`, call `get_massive_client()`, assert the result is a `SyntheticSubscriptionManager` instance
- `test_factory_returns_massive_by_default`: leave the flag unset, assert the result is the real `MassiveSubscriptionManager`
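The isolation test could take roughly the following shape. This sketch runs against a cut-down stand-in rather than the real manager: `FakeConnection`, `MiniSyntheticManager`, its `register` method, and `run_isolation_check` are hypothetical, and the real implementation would publish through the backend's own connection mechanism.

```python
import asyncio
from collections import defaultdict
from collections.abc import Iterable


class FakeConnection:
    """Test double that records every message pushed to it."""

    def __init__(self) -> None:
        self.received: list[dict] = []

    async def send(self, message: dict) -> None:
        self.received.append(message)


class MiniSyntheticManager:
    """Cut-down stand-in for SyntheticSubscriptionManager (illustration only)."""

    def __init__(self, tickers: Iterable[str], cadence_hz: float = 100.0) -> None:
        self._tickers = list(tickers)
        self._interval = 1.0 / cadence_hz
        self._subs: dict[str, set[str]] = defaultdict(set)  # ticker -> connection ids
        self._conns: dict[str, FakeConnection] = {}
        self._tasks: list[asyncio.Task] = []

    def register(self, connection_id: str, conn: FakeConnection) -> None:
        self._conns[connection_id] = conn

    async def start(self) -> None:
        # One emitter task per ticker.
        self._tasks = [asyncio.create_task(self._emit(t)) for t in self._tickers]

    async def stop(self) -> None:
        for task in self._tasks:
            task.cancel()
        await asyncio.gather(*self._tasks, return_exceptions=True)
        self._tasks = []

    async def subscribe_client(self, connection_id: str, tickers: Iterable[str]) -> None:
        for ticker in tickers:
            if ticker in self._tickers:  # tickers outside the universe are ignored
                self._subs[ticker].add(connection_id)

    async def unsubscribe_client(self, connection_id: str, tickers: Iterable[str]) -> None:
        for ticker in tickers:
            self._subs[ticker].discard(connection_id)

    async def _emit(self, ticker: str) -> None:
        while True:
            for cid in list(self._subs[ticker]):
                await self._conns[cid].send({"type": "stock_aggregate", "ticker": ticker})
            await asyncio.sleep(self._interval)


async def run_isolation_check() -> tuple[int, int]:
    """Return how many ticks each connection got after 'a' unsubscribed."""
    a, b = FakeConnection(), FakeConnection()
    mgr = MiniSyntheticManager(["AAPL"])
    mgr.register("a", a)
    mgr.register("b", b)
    await mgr.start()
    await mgr.subscribe_client("a", ["AAPL"])
    await mgr.subscribe_client("b", ["AAPL"])
    await asyncio.sleep(0.05)
    await mgr.unsubscribe_client("a", ["AAPL"])
    before_a, before_b = len(a.received), len(b.received)
    await asyncio.sleep(0.05)
    await mgr.stop()
    return len(a.received) - before_a, len(b.received) - before_b
```

Running the stand-in at a high cadence keeps the test fast; the real test would likely set `synthetic_cadence_hz` similarly rather than waiting on the 1 Hz default.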

## Acceptance criteria

- All existing backend tests still pass (`uv run pytest services/main`)
- New tests added per above all pass
- Production behavior unchanged when `USE_SYNTHETIC_DATA` is unset/false
- Manual verification (the agent should perform this and include output in the PR description):
  1. Start backend with `USE_SYNTHETIC_DATA=true` + a known ticker universe
  2. Use `wscat` or similar to connect to the WS endpoint with a valid JWT
  3. Subscribe to AAPL, observe ticks streaming at configured cadence with well-formed payloads
  4. Subscribe to a ticker NOT in `SYNTHETIC_TICKERS` — assert behavior (either no ticks OR a graceful "ticker not in synthetic universe" log)
- README section added to `services/main/README.md` (or repo root if conventions differ): "Running with synthetic market data for local integration testing"

## PR requirements (CRITICAL — human review, NOT CodeRabbit)

The actual backend team will review this PR. The description must be **human-readable**, not a SHA dump. Required sections in the PR body:

1. **Why this PR exists** (1 paragraph) — the weekend-testing motivation, ties back to mobile's PR #70 that just merged
2. **What changes** (bulleted file-list with one-line "what + why" per file)
3. **What does NOT change** (explicit list — production path, real Massive client, auth, message wire format)
4. **How to use it locally** (3-5 line code block with env vars + run command)
5. **Sample output** (paste of actual `wscat` session showing well-formed ticks coming through synthetic mode)
6. **Test plan** (checklist: existing tests pass, new tests pass, manual verification done with output)
7. **Open questions for backend team** (if any design choice is ambiguous, ask explicitly — better than assuming)

PR is **DRAFT**. Do NOT mark ready-for-review. Backend team flips it ready when they're satisfied.

No CodeRabbit retrigger commands. No squash-self-merge. No auto-approve.

## Branching + push

- **Branch name:** `feat/synthetic-market-data-source` (or backend repo's convention if different — agent picks)
- **Worktree:** `<backend_repo>/.claude/worktrees/synthetic-market-data` (or backend convention)
- **Push authorized:** YES, force-with-lease if rebasing
- **Mark ready:** NO — draft only, backend team's call

## Out-of-scope guardrails for the agent

- Do NOT touch any router file beyond what's strictly required for the factory abstraction
- Do NOT add new third-party dependencies (synthetic generator should use only stdlib `random` + `asyncio`)
- Do NOT modify any config defaults that affect production
- Do NOT delete or rename `MassiveSubscriptionManager` — it stays, just gains a Protocol annotation
- If you discover the abstraction is messier than this spec assumes, STOP and ping coordinator with the actual structure rather than improvising

## Coordination

- Backend agent works in backend repo, separate worktree
- Mobile-side: this spec lives in mobile's `.agent-status/` for our trace, but the actual implementation lives in backend repo only
- After the draft PR opens, mobile can independently `pnpm start` against the local backend (`EXPO_PUBLIC_API_BASE_URL=http://<mac-ip>:8000`) for end-to-end integration testing — that's a separate mobile-side activity once backend draft PR is open

---

**Next step:** the coordinator dispatches a backend agent in a fresh worktree on the backend repo with this spec as the brief.
