# Backend WebSocket Scalability Architecture

**Author:** tele (mobile coordinator side, sourced from focused read of backend repo)
**Date:** 2026-04-19
**Backend repo:** `/Users/fathoni/Documents/Project/BlockDev/nano-street/backend` (Python/FastAPI on Cloud Run)
**Scope:** how the backend handles thousands of concurrent mobile users requesting live ticker data via WebSocket, given upstream Massive.com's connection + message-stream limits.

---

## 1. Upstream Massive connection topology

**Finding: SINGLE long-lived connection, process-wide.**

- **`websocket.py:439–449`** (`get_massive_client`) — global singleton pattern. `massive_client` is module-level (line 434), initialized once via async lock on first call.
- **`websocket.py:30–41`** (`MassiveSubscriptionManager.__init__`) — one `WebSocketClient` instance per manager (line 32: `self.ws_client`).
- **`websocket.py:42–89`** (`start` method) — single `ThreadPoolExecutor(max_workers=1)` (line 48) for blocking SDK lifecycle. Single background thread runs `self.ws_client.connect(callback)` (line 113).

**Result:** all 1000+ mobile clients are multiplexed onto ONE upstream Massive connection per backend instance.

## 2. Subscription refcounting / dedup

**Finding: YES — upstream sub opened only once per unique ticker, refcounted by connection ID.**

- **`websocket.py:292–345`** (`subscribe_client`):
  - Lines 310–318: per-ticker set of connection_ids in `self.subscriptions[ticker]` (line 33)
  - Line 312: `is_new = ticker not in self.subscriptions or not self.subscriptions[ticker]` — checks first subscriber
  - Lines 321–327: if `is_new`, only then call `_ws_subscribe(f"A.{ticker}")` upstream
- **`websocket.py:347–394`** (`unsubscribe_client`):
  - Line 367: `if not self.subscriptions[ticker]` after removal — only then call `_ws_unsubscribe`

**Example:** 1000 clients watch AAPL → ONE upstream `A.AAPL` subscription. 999 clients leave → still open. Last client leaves → upstream closes.

## 3. Per-tick fan-out concurrency model

**Finding: SEQUENTIAL (awaited) loop over subscriber set — NOT parallel. Backpressure risk.**

- **`websocket.py:146–175`** (`_handle_message`):
  - Line 156: `connection_ids = self.subscriptions[ticker].copy()` — snapshot subscriber set
  - Lines 173–174: sequential loop:
    ```python
    for connection_id in connection_ids:
        await self._publish_to_connection(connection_id, ticker, message_data)
    ```
- **`websocket.py:183–241`** (`_publish_to_connection`):
  - Line 219: `await websocket.send_json(...)` — single awaited send per connection
  - **No `asyncio.gather`** — each send blocks the event loop until complete

**Implication:** if one mobile client has a slow/buffered WS connection, it delays ticks to all OTHER subscribers of that ticker on the same backend instance. This is the binding constraint at scale.

## 4. Per-mobile-client limits

**Finding: NO enforced limits.**

- No validation in `subscribe_client` to cap tickers per connection
- No validation in `websocket_endpoint` to cap concurrent connections per JWT
- **`requests.py:6–11`** (`WebSocketMessage`) — no length constraints on tickers list
- **`ws.py:155–158`** (`websocket_endpoint`) — no rate-limit middleware or per-user quota

A single mobile client could subscribe to thousands of tickers; nothing stops them.

## 5. Massive rate-limit & disconnect resilience

**Finding: minimal. No retry, no backoff, no circuit breaker, no auto-reconnect loop.**

- **`websocket.py:243–290`** (`_ws_subscribe`, `_ws_unsubscribe`):
  - Lines 256–257: single `asyncio.wait_for(..., timeout=5.0)` — no retry on timeout
  - Line 259–264: debug log on error, then nothing
- **`websocket.py:90–125`** (`_run_in_thread`):
  - Line 113: `loop.run_until_complete(self.ws_client.connect(...))` — if connection drops, exception caught (line 114), logged, no auto-reconnect loop
  - Line 121: `is_connected = False` — terminal state, no recovery

If Massive itself rate-limits us or drops the connection, **the backend silently stays disconnected until the process restarts.**

## 6. Capacity ceiling & bottlenecks

| Dimension | Bottleneck | Est. scale | Notes |
|---|---|---|---|
| **CPU (broadcast fan-out)** | Sequential per-tick loop | 10k tickers × 100 subs ≈ 1M sends/sec | Awaited loop serializes — high-contention asyncio event loop |
| **Memory (subscriber sets)** | `Dict[str, set[str]]` | 10k tickers × 100 subs ≈ 1.2 MB | Negligible |
| **Network egress** | WS send buffer + TCP | ~1k mobile connections per instance | Each connection buffers outbound; slow client stalls event loop |
| **Massive upstream** | Single connection + message queue | Depends on Massive SDK limits | Backend can only emit one sub per ticker; no stream-shaping |
| **Cloud Run resource** | Memory/vCPU per instance | Instance default 512MB / 1 vCPU | Sequential broadcast saturates single vCPU before memory becomes issue |

**Hard stop:** sequential fan-out + slow mobile client will block all tickers for all users on that instance. Multi-instance deploy (`ARCHITECTURE.md` line 39 mentions "1-3 instances") does NOT solve cross-instance fan-out — each pod has separate `subscriptions` dict.

**Recommended threshold before refactor:** ~500 concurrent mobile clients OR ~1000 unique tickers per instance. Beyond that, sequential fan-out becomes the binding constraint.

## 7. Observability

**Finding: basic metrics present; broadcast latency NOT tracked.**

- **`main.py:474–489`** (`/metrics` endpoint):
  - Line 483: `active_ws_connections = await registry.count()` — connected client count
  - Line 488: `active_ticker_subscriptions = stats.get("total_tickers", 0)` — unique tickers subscribed upstream
  - No per-ticker subscriber-count histogram; no broadcast latency P95/P99
- **`websocket.py:396–405`** (`get_stats`) — returns `total_tickers` + `tickers: {ticker: len(conns)}` exportable but NOT auto-exported
- **`core/logging.py` / `core/metrics.py`** — structured JSON logging to Cloud Logging; no specific WebSocket/broadcast metrics instrumentation

## Summary + recommended next moves

The architecture **works for small scale** (< 100 concurrent clients, < 50 unique tickers) but exhibits critical scaling concerns:

1. **Sequential per-tick broadcast** — one slow mobile client blocks all others on that ticker on the same instance
2. **No backpressure handling**
3. **No circuit breaker** for Massive disconnects, no auto-reconnect on upstream drop
4. **No per-user limits** or rate-limiting on subscription actions
5. **Multi-instance scaling does NOT cross-pod-fan-out** — each pod has independent subscriptions

**Top three fixes (rough order of impact):**

1. **Parallel fan-out** — swap `for connection_id in connection_ids: await self._publish_to_connection(...)` (websocket.py:173-174) for `await asyncio.gather(*[... for cid in connection_ids], return_exceptions=True)`. Single biggest scalability win, ~30 minute change.
2. **Massive upstream auto-reconnect** — wrap `_run_in_thread` (websocket.py:113) in a retry loop with exponential backoff. Without this, a single Massive blip silently kills live prices for the entire backend instance until restart.
3. **Per-user limits** — cap tickers/connection (e.g., 50) and connections/JWT (e.g., 3) in `subscribe_client` and `websocket_endpoint`. Cheap insurance against runaway clients.

**Lower priority but worth mentioning later:**
- Broadcast latency metrics (P95/P99 per ticker)
- Per-ticker subscriber-count histogram
- Cross-instance pub/sub (Redis / NATS / GCP Pub/Sub) if we ever need >1 instance to share fan-out
