# Transcript Viewer Design

## Current State

**Earnings viewer** (`finance/earnings/backtest/app.py`):
- Dash framework (not Gradio)
- Price chart + transcript + YouTube player
- Click-to-seek sync between transcript and video
- Speaker diarization badges

**Jobs dashboard** (`jobs/dashboard.py`):
- Gradio framework
- Shows job items, stages, status

## Goal

1. Link from jobs dashboard row → transcript viewer for YouTube videos
2. Share code between earnings viewer and YouTube viewer
3. Configurable: with/without price charts

## Design: Standalone Viewer App + Shared Components

### Architecture

```
lib/transcript_viewer/
├── __init__.py
├── loader.py           # Load VTT/JSONL transcripts, normalize format
├── youtube.py          # YouTube player embed HTML/JS
└── dash_renderer.py    # Transcript row rendering for Dash

jobs/
├── transcript_viewer.py   # Standalone Dash app for YT videos
└── dashboard.py           # Links to transcript viewer

finance/earnings/backtest/
└── app.py                 # Uses shared components + price charts
```

### Shared Components

**1. loader.py** — Transcript loading
```python
from pathlib import Path


def load_vtt(path: Path) -> list[dict]:
    """Parse VTT → [{offset_s, text, speaker?}]"""

def load_jsonl(path: Path) -> list[dict]:
    """Load JSONL transcript (earnings format)"""

def normalize_transcript(lines: list[dict]) -> list[dict]:
    """Ensure consistent keys: offset_s, text, speaker, time_str"""
```
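As a rough illustration of what `load_vtt` and `normalize_transcript` might do, here is a minimal sketch using only the standard library. It assumes plain WebVTT cues; the helper `parse_vtt`, the `M:SS` / `H:MM:SS` shape of `time_str`, and the choice to strip inline tags are assumptions of this sketch, not decisions recorded in this design. It does not deduplicate the rolling cues that auto-generated captions often contain, and it does not extract speakers.

```python
import re
from pathlib import Path

# Matches a WebVTT cue timing line such as "00:01:02.500 --> 00:01:05.000".
# The hours field is optional in WebVTT.
_CUE_RE = re.compile(r"^\s*(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})\s+-->")
_TAG_RE = re.compile(r"<[^>]+>")  # inline tags like <c> or <00:00:01.000>


def parse_vtt(text: str) -> list[dict]:
    """Parse WebVTT text into [{offset_s, text}] (simplified; hypothetical helper)."""
    lines: list[dict] = []
    current = None
    for raw in text.splitlines():
        match = _CUE_RE.match(raw)
        if match:
            hours, minutes, seconds, millis = match.groups()
            offset = (int(hours or 0) * 3600 + int(minutes) * 60
                      + int(seconds) + int(millis) / 1000)
            current = {"offset_s": offset, "text": ""}
            lines.append(current)
        elif not raw.strip():
            current = None  # a blank line ends the current cue
        elif current is not None:
            cleaned = _TAG_RE.sub("", raw).strip()
            current["text"] = f"{current['text']} {cleaned}".strip()
        # Anything else (header, cue identifiers, NOTE blocks) is skipped.
    return [line for line in lines if line["text"]]


def load_vtt(path: Path) -> list[dict]:
    return parse_vtt(path.read_text(encoding="utf-8"))


def normalize_transcript(lines: list[dict]) -> list[dict]:
    """Return rows with exactly the keys offset_s, text, speaker, time_str."""
    out = []
    for line in lines:
        offset = float(line.get("offset_s", 0.0))
        hours, rem = divmod(int(offset), 3600)
        minutes, seconds = divmod(rem, 60)
        time_str = (f"{hours}:{minutes:02d}:{seconds:02d}" if hours
                    else f"{minutes}:{seconds:02d}")
        out.append({
            "offset_s": offset,
            "text": line.get("text", ""),
            "speaker": line.get("speaker"),  # None when no diarization exists
            "time_str": time_str,
        })
    return out
```

Keeping the parser separate from file I/O makes it easy to unit-test with inline strings, and having `normalize_transcript` fill in `speaker=None` means the renderer never has to check for a missing key.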

**2. youtube.py** — YouTube player
```python
def youtube_player_js(video_id: str) -> str:
    """Return JS for YouTube iframe API + seekTo function"""

def youtube_player_html(video_id: str, height: int = 200) -> str:
    """Return player div + initialization"""
```
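One possible shape for these two helpers is sketched below, built on the YouTube IFrame Player API (`YT.Player`, `onYouTubeIframeAPIReady`, `player.seekTo`). The `element_id` parameter, the extra `height` argument on `youtube_player_js`, the global JS names `ytPlayer` and `seekTo`, and the video-id character check are assumptions of this sketch. How the returned strings get into the page (and whether the host framework executes inline scripts) is left to the app.

```python
import html
import json
import re

# Assumption: video ids only contain URL-safe characters. Rejecting anything
# else keeps arbitrary text out of the generated <script> body.
_VIDEO_ID_RE = re.compile(r"[A-Za-z0-9_-]+")


def youtube_player_js(video_id: str, element_id: str = "yt-player",
                      height: int = 200) -> str:
    """Return JS that loads the IFrame API, creates a player, defines seekTo."""
    if not _VIDEO_ID_RE.fullmatch(video_id):
        raise ValueError(f"unexpected video id: {video_id!r}")
    options = {
        "videoId": video_id,
        "height": str(int(height)),
        "width": "100%",
        "playerVars": {"playsinline": 1},
    }
    return f"""
var ytPlayer = null;
// The IFrame API calls this global function once it has finished loading.
function onYouTubeIframeAPIReady() {{
  ytPlayer = new YT.Player({json.dumps(element_id)}, {json.dumps(options)});
}}
// Seek to an offset in seconds; does nothing until the player exists.
function seekTo(seconds) {{
  if (ytPlayer && typeof ytPlayer.seekTo === "function") {{
    ytPlayer.seekTo(seconds, true);
    ytPlayer.playVideo();
  }}
}}
var tag = document.createElement("script");
tag.src = "https://www.youtube.com/iframe_api";
document.head.appendChild(tag);
"""


def youtube_player_html(video_id: str, height: int = 200,
                        element_id: str = "yt-player") -> str:
    """Return the placeholder div plus the script that replaces it."""
    safe_id = html.escape(element_id, quote=True)
    script = youtube_player_js(video_id, element_id, height)
    return f'<div id="{safe_id}"></div>\n<script>{script}</script>'
```

Passing `height` through the player options rather than as CSS on the div matters because the API replaces the placeholder element with its own iframe.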

**3. dash_renderer.py** — Transcript rendering
```python
from typing import Optional

from dash import html

# DEFAULT_SPEAKER_COLORS: module-level constant in dash_renderer.py


def render_transcript_rows(
    lines: list[dict],
    speaker_colors: dict = DEFAULT_SPEAKER_COLORS,
    show_price: bool = False,
    metric_attrs: Optional[dict] = None,  # For earnings: data-rm-10 etc
) -> list[html.Div]:
    """Render clickable transcript rows with timestamps, speaker badges"""

def click_to_seek_js() -> str:
    """JS event listener for .seek-time clicks"""
```
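A sketch of `click_to_seek_js` follows. It assumes each `.seek-time` element carries its offset in a `data-offset-s` attribute and that the player script exposes a global `seekTo(seconds)` function; both names, and the two parameters added here, are illustrative rather than taken from the existing earnings app. The listener is attached once to `document` (event delegation), so it should keep working when transcript rows are re-rendered.

```python
import json


def click_to_seek_js(selector: str = ".seek-time",
                     offset_attr: str = "data-offset-s") -> str:
    """Return JS that forwards clicks on timestamp elements to seekTo()."""
    return f"""
document.addEventListener("click", function (event) {{
  if (!(event.target instanceof Element)) return;
  // Walk up from the clicked node to the nearest timestamp element, if any.
  var row = event.target.closest({json.dumps(selector)});
  if (!row) return;
  var seconds = parseFloat(row.getAttribute({json.dumps(offset_attr)}));
  if (!isNaN(seconds) && typeof window.seekTo === "function") {{
    window.seekTo(seconds);
  }}
}});
"""
```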

### YouTube Transcript Viewer (`jobs/transcript_viewer.py`)

Standalone Dash app on port 7930:

```
GET /view/{job_id}/{item_key}
  → Load VTT from job storage dir
  → Extract video_id from item data
  → Render: YouTube player + transcript panel
```
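The first and third steps of that flow can be sketched with the standard library alone. The shape of the item data is not specified here, so this sketch assumes the item carries a YouTube URL; `parse_view_path` and `extract_video_id` are hypothetical helper names. Locating the VTT file is omitted because the job storage layout is not described in this document.

```python
from typing import Optional
from urllib.parse import parse_qs, unquote, urlparse


def parse_view_path(pathname: str) -> Optional[tuple]:
    """Split "/view/{job_id}/{item_key}" into (job_id, item_key), else None."""
    # Split before decoding so an encoded "/" inside item_key survives.
    parts = [unquote(p) for p in pathname.strip("/").split("/")]
    if len(parts) != 3 or parts[0] != "view" or not all(parts[1:]):
        return None
    return parts[1], parts[2]


def extract_video_id(url: str) -> Optional[str]:
    """Pull the video id out of common YouTube URL forms, else None."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        return parsed.path.lstrip("/").split("/")[0] or None
    if host == "youtube.com" or host.endswith(".youtube.com"):
        if parsed.path == "/watch":
            values = parse_qs(parsed.query).get("v")
            return values[0] if values else None
        parts = parsed.path.strip("/").split("/")
        if len(parts) >= 2 and parts[0] in ("embed", "shorts", "live"):
            return parts[1]
    return None
```

Returning `None` instead of raising lets the page handler render a "not found" message for malformed links.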

Features:
- Video embed with seekTo
- Transcript with timestamps, click-to-seek
- Speaker diarization (if .json available)
- Metadata header (title, duration, upload_date)

### Dashboard Integration

Add "View" link in jobs dashboard item detail:
```python
# In dashboard.py item detail section
if job.tags and "yt-channel" in job.tags:
    transcript_url = f"http://localhost:7930/view/{job_id}/{item_key}"
    # Add link to detail panel
```
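If `job_id` or `item_key` can contain characters such as `/` or spaces, the link is safer when each path segment is percent-encoded. A small sketch, with the helper names and the `base_url` parameter being assumptions (the base defaults to the port mentioned above):

```python
from urllib.parse import quote


def transcript_url(job_id: str, item_key: str,
                   base_url: str = "http://localhost:7930") -> str:
    """Build the viewer URL, percent-encoding each path segment."""
    job = quote(job_id, safe="")
    item = quote(item_key, safe="")
    return f"{base_url.rstrip('/')}/view/{job}/{item}"


def transcript_link_markdown(job_id: str, item_key: str) -> str:
    """Markdown link text that a detail panel could display."""
    return f"[View transcript]({transcript_url(job_id, item_key)})"
```

Taking `base_url` as a parameter also leaves room for running the viewer on another host or port without touching the dashboard code.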

### Earnings App Refactor

Update `finance/earnings/backtest/app.py` to import from shared lib:
```python
# Assumes lib/transcript_viewer/__init__.py re-exports these names
from lib.transcript_viewer import load_jsonl, youtube_player_js, render_transcript_rows

# Keep earnings-specific: price charts, metrics, crosshair
# Use shared: transcript rows, YouTube player, click handling
```

## Implementation Order

1. **Create lib/transcript_viewer/** with loader.py, youtube.py
2. **Create jobs/transcript_viewer.py** — minimal viewer for YT videos
3. **Add dashboard link** — View button in item detail
4. **Refactor earnings app** — extract to shared components

## Open Questions

- Should the viewer run as a separate process or inside the dashboard process?
- Cache transcripts in memory for fast switching?
- Support local video files (not just YouTube)?
