# PLTR Interviews — Discovery & Scoring Design

## Current State (2026-01-31)

- **Discovery**: YouTube search via yt-dlp, `{leader} × {month}` queries
- **Leaders**: Alex Karp, Shyam Sankar (2 leaders × 12 months = 24 queries)
- **Results**: ~84 videos found, ~5 per query, deduplicated
- **Processing**: transcript (yt-dlp VTT) → metadata (cue count)
- **No scoring or filtering** — all discovered items treated equally

## Problem

Current discovery is narrow (only 2 leaders, YouTube only) and undiscriminating
(a stock-analysis clip ranks the same as a 1-hour candid interview). We want:

1. **Broader discovery** — more leaders, more query variations, more sources
2. **Quality scoring** — rank items by likely insightfulness before investing
   compute on transcription
3. **Learning** — scoring function improves over time from human feedback

## Design

### Phase 1: Broader Discovery

Expand `llm_search` params in `jobs.yaml`:

```yaml
params:
  query_template: "Palantir {leader} interview {month} {year}"
  leaders:
    - Alex Karp
    - Shyam Sankar
    - Ted Mabrey        # Head of commercial
    - Akshat Agrawal    # CRO
    - Morgan Sell        # CFO
    - Ryan Taylor        # Chief Revenue Officer
  extra_queries:
    - "Palantir {leader} keynote {year}"
    - "Palantir {leader} fireside chat {year}"
    - "Palantir {leader} conference {year}"
    - "Palantir AIPCon {year}"
    - "Palantir earnings call Q{quarter} {year}"
  months_back: 24
  max_results_per_query: 10
```

With 6 leaders × 24 months × up to 10 results per base query, plus the extra templates, this would yield ~2000+ raw candidates, with significant deduplication expected across overlapping queries.
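For a sense of the query volume, here is a minimal sketch of the `{leader} × {month}` expansion. `expand_queries` is a hypothetical helper for illustration, not the real job-runner API:

```python
from datetime import date

def expand_queries(template: str, leaders: list[str], months_back: int) -> list[str]:
    """One query per (leader, month) pair, walking backwards from today."""
    queries = []
    today = date.today()
    for leader in leaders:
        y, m = today.year, today.month
        for _ in range(months_back):
            month_name = date(y, m, 1).strftime("%B")  # e.g. "January"
            queries.append(template.format(leader=leader, month=month_name, year=y))
            m -= 1
            if m == 0:
                y, m = y - 1, 12
    return queries

queries = expand_queries("Palantir {leader} interview {month} {year}",
                         ["Alex Karp", "Shyam Sankar"], months_back=24)
# 2 leaders x 24 months = 48 base queries; 6 leaders -> 144, before extras
```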

### Phase 2: Metadata-Based Scoring

Add a `score` stage between discovery and transcript download.
The score uses **cheap signals** available from YouTube metadata (no transcript needed):

| Signal               | Weight | Rationale                                                                         |
|----------------------|--------|-----------------------------------------------------------------------------------|
| **Duration**         | High   | >20 min = likely substantive; <5 min = clip/promo                                  |
| **Title keywords**   | Medium | "interview", "fireside", "keynote" = good; "stock analysis", "price target" = bad  |
| **Channel quality**  | Medium | Known channels (Bloomberg, CNBC, Lex Fridman) vs. random stock-tip channels        |
| **View count**       | Low    | Some signal, but noisy                                                             |
| **Upload date**      | Low    | Recency bonus                                                                      |
| **Description text** | Medium | Mentions of "in-depth", "exclusive", named speakers                                |

#### Scoring Function (V1 — rule-based)

```python
def score_interview(data: dict) -> float:
    """Return a 0-100 insightfulness score. Higher = more worth transcribing."""
    score = 50.0  # baseline

    # Duration (strongest signal); yt-dlp can report duration as None
    dur = data.get("duration") or 0
    if dur > 3600:        # >1hr: very likely substantive
        score += 25
    elif dur > 1200:      # >20min: good
        score += 15
    elif dur <= 300:      # <5min: likely clip/promo
        score -= 20
    # 5-20min: neutral, no adjustment

    # Title signals
    title = data.get("title", "").lower()
    good_words = ["interview", "fireside", "keynote", "conversation",
                  "in-depth", "exclusive", "full"]
    bad_words = ["stock", "price target", "buy", "sell", "analysis",
                 "prediction", "shorts", "reaction"]
    score += 5 * sum(1 for w in good_words if w in title)
    score -= 10 * sum(1 for w in bad_words if w in title)

    # Known quality channels (curated list, grows over time):
    # jobs/data/pltr_interviews/channel_quality.yaml

    return max(0.0, min(100.0, score))
```

Items with score < 30 → `skipped` (stock analyses, reaction videos).
Items with score 30-60 → low priority.
Items with score > 60 → high priority (transcribe first).

The inverted score becomes the item's `priority` field (lower number =
processed sooner), so high-quality interviews are transcribed first:
`priority = 100 - score`
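The threshold-to-status mapping can be sketched as a small helper (`triage` is a hypothetical name; statuses match the label table at the end of this doc):

```python
def triage(score: float) -> tuple[str, int]:
    """Map a 0-100 insightfulness score to (status, priority).

    Lower priority numbers process first; scores below 30 are skipped
    outright, per the buckets above.
    """
    status = "skipped" if score < 30 else "pending"
    return status, int(round(100 - score))

# triage(75) -> ("pending", 25): transcribed early
# triage(20) -> ("skipped", 80): never transcribed
```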

### Phase 3: Learning

The scoring function is a starting point. It improves through:

#### 3a. Human Feedback Loop

After transcripts are ready, user reviews and rates:
- Dashboard shows transcript preview + current score
- User marks: "insightful" / "not insightful" / "skip"
- Feedback stored in `jobs/data/pltr_interviews/feedback.jsonl`:
  ```json
  {"video_id": "abc", "score": 75, "human_rating": "insightful", "reason": "candid about AI strategy"}
  {"video_id": "def", "score": 60, "human_rating": "not_insightful", "reason": "just stock price discussion"}
  ```
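A minimal sketch of the feedback store, assuming plain append-to-JSONL semantics (`record_feedback` and `load_feedback` are hypothetical helper names):

```python
import json
from pathlib import Path

FEEDBACK_PATH = Path("jobs/data/pltr_interviews/feedback.jsonl")

def record_feedback(video_id: str, score: float, rating: str, reason: str,
                    path: Path = FEEDBACK_PATH) -> None:
    """Append one human rating as a JSON line (schema matches the example above)."""
    entry = {"video_id": video_id, "score": score,
             "human_rating": rating, "reason": reason}
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a") as f:
        f.write(json.dumps(entry) + "\n")

def load_feedback(path: Path = FEEDBACK_PATH) -> list[dict]:
    """Read all ratings; tolerates a missing file and blank lines."""
    if not path.exists():
        return []
    return [json.loads(line) for line in path.open() if line.strip()]
```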

#### 3b. LLM-Assisted Scoring (after transcript exists)

Once transcript is downloaded, run a cheap LLM (haiku/flash-lite) to assess:
```
Given this transcript excerpt (first 500 words + last 500 words), rate 1-10:
- Candor: Does the speaker share genuine opinions vs. PR talking points?
- Depth: Are there specific examples, numbers, or novel insights?
- Relevance: Is this about Palantir's strategy, technology, or market position?
```

This becomes a `score_transcript` stage after `metadata`.
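Building the excerpt for that prompt is mechanical; a sketch (`excerpt_for_scoring` is a hypothetical helper):

```python
def excerpt_for_scoring(transcript: str, n_words: int = 500) -> str:
    """First + last n_words of the transcript, the excerpt sent to the cheap LLM.

    Short transcripts are passed through whole; long ones get a [...] marker
    between the head and tail.
    """
    words = transcript.split()
    if len(words) <= 2 * n_words:
        return transcript
    return " ".join(words[:n_words]) + " [...] " + " ".join(words[-n_words:])
```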

#### 3c. Feature Learning

With enough feedback (~50+ rated items), fit a simple model:
- Input: duration, title_keywords, channel, view_count, LLM_scores
- Output: human_rating prediction
- Model: logistic regression or small decision tree (interpretable)
- Retrain periodically, store weights in `jobs/data/pltr_interviews/scoring_model.pkl`
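A sketch of the feature extraction that would feed such a model. Field names assume the metadata and feedback schemas above; the exact feature set is a design choice, not settled:

```python
def feature_vector(item: dict) -> list[float]:
    """Numeric features for the learned scorer (illustrative, not final)."""
    title = item.get("title", "").lower()
    return [
        (item.get("duration") or 0) / 3600,   # length in hours
        float(any(w in title for w in ("interview", "fireside", "keynote"))),
        float(any(w in title for w in ("stock", "price target"))),
        (item.get("view_count") or 0) / 1e6,  # views, in millions
        (item.get("llm_candor") or 0) / 10,   # 3b LLM candor score, scaled 0-1
    ]

# With ~50 rated items: fit e.g. sklearn.linear_model.LogisticRegression on
# [feature_vector(i) for i in items] vs. binary human ratings; coefficients
# stay interpretable, and the fitted model can be pickled to scoring_model.pkl.
```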

### Implementation Order

1. **Now**: Expand leaders + query templates in jobs.yaml (config change only)
2. **Soon**: Add `score` stage with rule-based function, set priority
3. **Later**: Add feedback UI to dashboard, store ratings
4. **Eventually**: LLM transcript scoring + learned model

### Whisper Integration

For videos without YouTube subtitles (currently ~40% fail rate):
- **mlx-whisper medium** model available locally (~12x realtime)
- Add whisper fallback in transcript stage when yt-dlp returns no VTT
- Use `task="translate"` for non-English videos (e.g., Chinese Palantir interviews)
- Items currently marked `retry_later` will auto-process once fallback is wired
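The fallback decision itself is simple; a sketch (`transcript_action` is a hypothetical name, and the mlx-whisper call in the comment is an assumption about its Python API, to be verified when wiring it up):

```python
def transcript_action(has_vtt: bool, whisper_available: bool) -> str:
    """Pick the transcript source for one item, mirroring the fallback order."""
    if has_vtt:
        return "use_vtt"       # yt-dlp subtitles exist: cheapest path
    if whisper_available:
        # Assumed mlx-whisper Python API, roughly:
        #   mlx_whisper.transcribe(audio_path,
        #                          path_or_hf_repo="mlx-community/whisper-medium",
        #                          task="translate")  # for non-English audio
        return "whisper"
    return "retry_later"       # park until the fallback is wired up
```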

### Status Labels

| Status        | Meaning                                          |
|---------------|--------------------------------------------------|
| `pending`     | Ready to process                                 |
| `in_progress` | Currently being processed                        |
| `done`        | Stage completed successfully                     |
| `failed`      | Permanent error (private video, removed, etc.)   |
| `retry_later` | Waiting on capability (whisper, IB connection)   |
| `skipped`     | Deliberately skipped (low score, irrelevant)     |
