# Session Intelligence Server (helm.api)

**Date**: 2026-02-19
**Status**: Design approved
**Session**: Brainstorming session — unified hist/badge/session intelligence

## Problem

1. **`/hist` regenerates from scratch** every invocation — Claude scans the full conversation inline, no caching, no persistence
2. **Badge spawns 3 cold Python processes per prompt** (~200-500ms startup each for badge_worker, route_worker, principles_worker) — the LLM calls themselves are fast (haiku via hot server), but the interpreter cold-start dominates
3. **Badge and hist are independent systems** tracking the same thing — badge maintains a flat topic list, hist produces a rich tree, neither knows about the other
4. **No observable way to tune** the prompt or test variants in production

## Architecture

### New: helm.api.localhost (FastAPI, port 8130)

One persistent FastAPI server handles all session intelligence. Replaces the 3 ephemeral worker subprocesses.

```
Claude Code Session
  │
  │ UserPromptSubmit hook (curl one-liner, <5ms)
  ▼
┌────────────────────────────────────────────────┐
│  Helm API Server (FastAPI, port 8130)          │
│  helm.api.localhost                            │
│                                                │
│  Per-session state (in-memory + disk):         │
│  ┌─────────────────────────────────────────┐   │
│  │ session_id → {                          │   │
│  │   tree: [{topic, status,                │   │
│  │           children, started}],          │   │
│  │   badge: ["line1", "line2"],            │   │
│  │   theme: "...",                         │   │
│  │   hist_ascii: "pre-rendered",           │   │
│  │   last_update: timestamp                │   │
│  │ }                                       │   │
│  └─────────────────────────────────────────┘   │
│                                                │
│  On /hook/prompt:                              │
│  1. Read session JSONL tail (recent actions)   │
│  2. Single haiku LLM call via hot server       │
│     → updated tree + badge + theme             │
│  3. Pre-render hist ASCII with colors          │
│  4. Push badge + title to iTerm2               │
│  5. Persist state to disk + watch DB           │
│                                                │
│  Endpoints:                                    │
│  POST /hook/prompt     ← curl from hooks       │
│  GET  /hist/{sid}      ← /hist skill reads     │
│  GET  /state/{sid}     ← watch dashboard       │
│  GET  /health                                  │
└───────────────┬────────────────────────────────┘
                │ LLM calls
                ▼
        Hot Server (:8120)
        haiku via subscription
```
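
The per-session record in the diagram can be sketched as a pair of small dataclasses. Field names follow the diagram; the exact schema is an assumption about the eventual implementation:

```python
import time
from dataclasses import dataclass, field


@dataclass
class TreeNode:
    """One topic in the hierarchical session tree."""
    topic: str
    status: str = "active"                      # "active" | "done"
    children: list = field(default_factory=list)
    started: float = field(default_factory=time.time)


@dataclass
class SessionState:
    """Everything the server knows about one Claude Code session."""
    tree: list = field(default_factory=list)    # list[TreeNode]
    badge: list = field(default_factory=list)   # up to 4 short lines
    theme: str = ""
    hist_ascii: str = ""                        # pre-rendered /hist output
    last_update: float = 0.0


# in-memory map keyed by session id, mirrored to disk on every update
sessions: dict[str, SessionState] = {}
```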

### Hook: curl one-liner

Replace the current `python -m supervisor.sidekick.hooks.handler` with a `jq | curl` one-liner (`jq -n --arg` JSON-escapes the prompt; a naive `printf` template would produce an invalid body for any prompt containing a double quote):

```json
{
  "type": "command",
  "command": "jq -n --arg sid \"$CLAUDE_SESSION_ID\" --arg iterm \"$ITERM_SESSION_ID\" --arg prompt \"$(printf '%s' \"$CLAUDE_USER_PROMPT\" | head -c 500)\" '{sid: $sid, iterm: $iterm, prompt: $prompt}' | curl -sf https://helm.api.localhost/hook/prompt -H 'Content-Type: application/json' -d @- &>/dev/null &",
  "async": true
}
```

Sub-5ms hook latency. Fire and forget. Falls back silently if server is down.
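
Server-side, the endpoint only needs to parse the body and hand off; a framework-agnostic sketch (function names are illustrative, and the FastAPI wiring is one decorator away):

```python
import asyncio
import json


async def process_prompt(payload: dict) -> None:
    """Heavy path: read JSONL tail, single haiku call, render, push, persist."""
    ...


async def handle_hook_prompt(raw_body: bytes) -> dict:
    """Body of POST /hook/prompt."""
    payload = json.loads(raw_body)
    # Respond immediately; the JSONL read + LLM call happen off the request
    # path, so the hook's curl returns in single-digit milliseconds.
    asyncio.create_task(process_prompt(payload))
    return {"ok": True, "sid": payload["sid"]}
```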

### Single LLM call per prompt

One haiku call replaces the current 3 separate subprocess + LLM calls:

```
Input to LLM:
  current_tree: [structured JSON — hierarchical topics with children]
  latest_prompt: "now fix the badge system"
  recent_actions: ["edited auth.py", "ran tests"] (from JSONL tail, server-side)
  project: "rivus"

Output from LLM:
  {
    "tree": [
      {"topic": "proxy fix", "status": "done", "children": [
        {"action": "fix token expiry", "type": "edit"},
        {"action": "tests passing", "type": "verify"}
      ]},
      {"topic": "badge redesign", "status": "active", "children": [
        {"action": "research current arch", "type": "explore"}
      ]}
    ],
    "badge": ["badge redesign", "proxy fix"],
    "theme": "sidekick improvements",
    "title": "🌊 sidekick improvements"
  }
```
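
A thin validation layer keeps one malformed haiku reply from corrupting the tree. This sketch checks the shape shown above and falls back to the previous state on any parse failure (the 4-line cap mirrors the badge limit; helper name is an assumption):

```python
import json

REQUIRED_KEYS = {"tree", "badge", "theme", "title"}


def parse_llm_update(raw: str, fallback: dict) -> dict:
    """Parse one haiku reply; on any malformation keep the previous state."""
    try:
        update = json.loads(raw)
    except json.JSONDecodeError:
        return fallback
    if not isinstance(update, dict) or not REQUIRED_KEYS <= update.keys():
        return fallback
    # clamp badge to the 4 lines the iTerm2 badge can show
    update["badge"] = [str(line) for line in update["badge"]][:4]
    return update
```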

### Hierarchical tree as canonical data model

The tree is the source of truth. All display surfaces derive from it:

- **Badge** = top 4 active topics from tree → pushed to iTerm2
- **Title** = emoji + theme → pushed to iTerm2 pane title
- **Statusline** = brief status → written to `~/.coord/status/{sid}.txt` for statusline.sh
- **/hist** = full colored ASCII tree → pre-rendered by server, Claude just prints it
- **Watch dashboard** = structured tree + events → read from shared state/DB
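
Derivation from the canonical tree is pure data transformation. A sketch for badge and title, using field names from the LLM output example (ordering of active topics is an assumption to be settled in the gym):

```python
def derive_badge(tree: list[dict], limit: int = 4) -> list[str]:
    """Badge = most recent active topics from the tree, newest first."""
    active = [node["topic"] for node in tree if node.get("status") == "active"]
    return list(reversed(active))[:limit]


def derive_title(theme: str, emoji: str = "🌊") -> str:
    """Pane title = emoji + session theme."""
    return f"{emoji} {theme}"
```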

### /hist skill (simplified)

The skill becomes trivial — fetch pre-rendered tree from server, print it:

```
GET https://helm.api.localhost/hist/{session_id}
→ returns colored ASCII text with iTerm2 delimiters, ready to print
```

Zero Claude tokens spent on tree construction. Server owns the rendering logic (same coloring rules from current skill.md, implemented server-side).
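
The server-side renderer is a straightforward tree walk. A minimal sketch without the skill.md coloring rules (glyph choices here are illustrative):

```python
def render_tree(tree: list[dict], indent: str = "") -> str:
    """Plain box-drawing render; the real server layers on the color rules."""
    lines = []
    for i, node in enumerate(tree):
        last = i == len(tree) - 1
        branch = "└─ " if last else "├─ "
        mark = "✓" if node.get("status") == "done" else "●"
        lines.append(f"{indent}{branch}{mark} {node['topic']}")
        child_indent = indent + ("   " if last else "│  ")
        children = node.get("children", [])
        for j, child in enumerate(children):
            cbranch = "└─ " if j == len(children) - 1 else "├─ "
            lines.append(f"{child_indent}{cbranch}{child['action']}")
    return "\n".join(lines)
```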

### Context: send everything, ablate in gym

The server reads the session JSONL tail server-side (it knows the JSONL path from session registration). Initial implementation sends rich context (prompt + recent tool_use events). The gym tests ablation variants to find the sweet spot:

- Prompt only
- Prompt + last 10 tool_use events
- Prompt + last 20 JSONL lines (all types)
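
The three ablation levels can share one extraction helper. The JSONL field names (`type`, `tool_use`) are assumptions about the session log format:

```python
import json
from collections import deque
from pathlib import Path


def build_context(jsonl_path: Path, level: str, prompt: str) -> dict:
    """Assemble LLM context at one of the ablation levels above."""
    ctx = {"latest_prompt": prompt}
    if level == "prompt_only":
        return ctx
    with open(jsonl_path) as f:
        tail = deque(f, maxlen=200)  # keep only the last 200 lines
    events = [json.loads(line) for line in tail if line.strip()]
    if level == "tool_use_10":
        ctx["recent_actions"] = [e for e in events if e.get("type") == "tool_use"][-10:]
    elif level == "raw_20":
        ctx["recent_lines"] = events[-20:]
    return ctx
```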

### Debouncing & consolidation

Current 3 workers merge into one server-side handler:

- **Badge + hist + theme**: single LLM call
- **Route**: deterministic from tree metadata (project field) — no LLM needed
- **Principles**: separate concern, keep as own worker or reduce frequency (every 5th prompt)
- **Debounce**: 60s per session (same as current), server-side

## Gym: Extend Badge Gym

Extends `learning/gyms/badge/` for unified tree + badge tuning.

### What's tested

- **System prompt variants** — the tree + badge prompt (the main thing we tune)
- **Context levels** — prompt only vs. prompt + JSONL tail (ablation study)
- **Rendering variants** — how tree → colored ASCII

### Score dimensions

| Dimension | Weight | What it measures |
|-----------|--------|-----------------|
| tree_accuracy | 35% | Does the tree reflect what actually happened? |
| badge_quality | 25% | Are badge lines useful and concrete? |
| hierarchy_depth | 15% | Appropriate nesting — not flat, not over-nested |
| theme_stability | 15% | Theme changes only when session purpose genuinely shifts |
| transition_quality | 10% | Topic adds/removes happen at the right moments |
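
The composite score is a straight weighted sum of the five dimensions, each judged on a 0-1 scale (weights copied from the table; the aggregation function itself is an assumption):

```python
WEIGHTS = {
    "tree_accuracy": 0.35,
    "badge_quality": 0.25,
    "hierarchy_depth": 0.15,
    "theme_stability": 0.15,
    "transition_quality": 0.10,
}


def composite_score(dims: dict[str, float]) -> float:
    """Each dimension scored 0-1 by the LLM judge; returns weighted 0-1 total."""
    return sum(WEIGHTS[k] * dims[k] for k in WEIGHTS)
```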

### Same gen→eval→corpus loop

1. Load real sessions from JSONL files
2. Replay each through prompt variants (with different context levels)
3. LLM judge scores each (session, variant, context_level)
4. HTML report: best variant + per-session detail + ablation results
5. Append-only corpus (JSONL) for trend tracking
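
Step 5's append-only corpus is one JSONL line per (session, variant, context_level) run; record fields here are assumptions:

```python
import json
import time
from pathlib import Path


def append_result(corpus: Path, session: str, variant: str,
                  level: str, score: float) -> None:
    """Append one scored run; append-only so trends survive re-runs."""
    record = {"ts": time.time(), "session": session, "variant": variant,
              "context_level": level, "score": score}
    with open(corpus, "a") as f:
        f.write(json.dumps(record) + "\n")
```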

## State persistence

```
~/.coord/helm_state/{session_id}.json   ← full tree + badge + theme
~/.coord/helm_state/{session_id}.ascii  ← pre-rendered hist (fast read for /hist)
watch/data/watch.db                     ← events + session metadata (shared with dashboard)
```
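
The two per-session files can be written crash-safely with a write-then-rename (`os.replace` is atomic on POSIX); paths follow the layout above:

```python
import json
import os
from pathlib import Path

STATE_DIR = Path.home() / ".coord" / "helm_state"


def persist(sid: str, state: dict, ascii_render: str,
            state_dir: Path = STATE_DIR) -> None:
    """Write {sid}.json and {sid}.ascii via temp file + atomic rename."""
    state_dir.mkdir(parents=True, exist_ok=True)
    for suffix, data in ((".json", json.dumps(state)), (".ascii", ascii_render)):
        target = state_dir / f"{sid}{suffix}"
        tmp = target.with_name(target.name + ".tmp")
        tmp.write_text(data)
        os.replace(tmp, target)  # readers never see a half-written file
```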

## Infrastructure

### Caddyfile addition

```
helm.api.localhost {
    reverse_proxy localhost:8130
    tls internal
}
```

### Registry addition

```python
"helm-server": {
    "emoji": "👁️", "desc": "Session intelligence API",
    "cmd": "inv helm.server",
},
```

## Migration path

1. Build helm API server with `/hook/prompt` endpoint
2. Wire up curl hook alongside existing handler (both fire during transition)
3. Validate tree quality matches or beats current badge output
4. Switch /hist skill to read from server
5. Remove old badge_worker/route_worker subprocess spawning
6. Extend gym for tree + ablation testing
7. Iterate on prompt via gym results

## Key decisions

| Decision | Choice | Rationale |
|----------|--------|-----------|
| Server location | New `helm.api.localhost` (:8130) | Separates API from Gradio dashboard, follows conventions |
| Hook transport | curl one-liner | Sub-5ms, zero Python startup |
| Data model | Hierarchical tree | Rich enough for hist, badge derived from it |
| LLM calls | Single call per prompt | One haiku via hot server replaces 3 subprocess launches |
| /hist rendering | Server pre-renders ASCII | Zero Claude tokens, consistent with gym |
| Context depth | Send everything, ablate in gym | Empirical approach to finding what matters |
| Gym approach | Automated, extending badge gym | Same proven gen→eval→corpus loop |
