# PLTR Deep Research — Interviews & Expert Content

## Goal
Comprehensive backfill and ongoing monitoring of Palantir interviews and expert content, separating financial commentary from business and technical substance.

## Scope
- **Interviews with employees & management**: Alex Karp, Shyam Sankar, other execs, engineers
- **Expert/analyst deep-dives**: Not stock-tip YouTubers — substantive business analysis
- **Separate financial vs business content**: Tag each piece as `financial`, `business`, `technical`, or `mixed`
- **Time range**: All available, working backwards from now

## Discovery Strategy
Multi-source discovery:
1. **YouTube search**: `"Palantir" + {leader_name} + "interview"`, one query per leader per quarter
2. **YouTube channels**: Known podcasters (Lex Fridman, Bloomberg, CNBC long-form, All-In, etc.)
3. **Podcast RSS**: Search podcast indices for Palantir episodes
4. **Earnings calls**: Palantir's own quarterly calls (overlaps with the `earnings_backfill` job)
5. **Conference talks**: Web search for Palantir presentations at tech/defense conferences
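
The leader × quarter search grid from item 1 could be generated along these lines. This is a minimal sketch: the `LEADERS` list (only the two names from the Scope section), the helper names, and the exact query string format are illustrative assumptions. How the quarter is applied to the search (date filter, extra keywords) is left open, so it is carried as metadata only.

```python
from datetime import date
from itertools import product

# Assumption: only these two names appear in the Scope section; extend as needed.
LEADERS = ["Alex Karp", "Shyam Sankar"]


def quarters_back(today: date, count: int) -> list[tuple[int, int]]:
    """Return `count` (year, quarter) pairs, newest first, starting at today's quarter."""
    year, quarter = today.year, (today.month - 1) // 3 + 1
    out = []
    for _ in range(count):
        out.append((year, quarter))
        quarter -= 1
        if quarter == 0:
            year, quarter = year - 1, 4
    return out


def build_queries(leaders: list[str], quarters: list[tuple[int, int]]) -> list[dict]:
    """One search task per leader per quarter; the quarter is metadata for later filtering."""
    return [
        {"query": f'"Palantir" "{name}" "interview"', "year": y, "quarter": q}
        for name, (y, q) in product(leaders, quarters)
    ]


if __name__ == "__main__":
    for task in build_queries(LEADERS, quarters_back(date.today(), 4)):
        print(task)
```

Working newest-first matches the "working backwards from now" time range, so a partial run still covers the most recent quarters.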

## Stages
1. **discover** — Multi-source URL collection, dedup by content hash
2. **classify** — LLM classification: `{type: interview|panel|earnings|talk, focus: financial|business|technical|mixed, participants: [...], quality: 1-5}`
3. **transcript** — yt-dlp VTT download (or whisper if no captions)
4. **extract** — LLM extraction: key claims, product mentions, customer names, forward-looking statements, competitive positioning
5. **index** — Summary + tags for browsing
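
Because the classify stage depends on LLM output, it is probably worth validating the response against the schema in step 2 before writing `classification.json`. The sketch below assumes the model returns a single JSON object; the `Classification` dataclass and `parse_classification` helper are hypothetical names, not existing pipeline code.

```python
import json
from dataclasses import dataclass

# Allowed values, copied from the classify schema in step 2.
TYPES = {"interview", "panel", "earnings", "talk"}
FOCUSES = {"financial", "business", "technical", "mixed"}


@dataclass(frozen=True)
class Classification:
    type: str
    focus: str
    participants: tuple[str, ...]
    quality: int


def parse_classification(raw: str) -> Classification:
    """Parse and validate one LLM response; raise ValueError on any schema violation."""
    data = json.loads(raw)  # JSONDecodeError is a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if data.get("type") not in TYPES:
        raise ValueError(f"bad type: {data.get('type')!r}")
    if data.get("focus") not in FOCUSES:
        raise ValueError(f"bad focus: {data.get('focus')!r}")
    participants = data.get("participants")
    if not isinstance(participants, list) or not all(isinstance(p, str) for p in participants):
        raise ValueError("participants must be a list of strings")
    quality = data.get("quality")
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(quality, bool) or not isinstance(quality, int) or not 1 <= quality <= 5:
        raise ValueError("quality must be an integer from 1 to 5")
    return Classification(data["type"], data["focus"], tuple(participants), quality)
```

A failed validation can be retried or routed to manual review rather than stored, which keeps `index.jsonl` filterable by `focus` without defensive checks downstream.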

## Storage
```
data/companies/pltr/
├── interviews/
│   ├── {date}_{source_id}/
│   │   ├── metadata.json
│   │   ├── transcript.vtt
│   │   ├── classification.json
│   │   └── extraction.json
│   └── ...
├── index.jsonl          # All items, one per line, for quick browsing
└── sources.yaml         # Known channels/feeds to monitor
```
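
A small helper layer over this layout might look like the following. It is a sketch under stated assumptions: `{date}` is taken to be an ISO `YYYY-MM-DD` string, and the function names (`item_dir`, `append_index`, `read_index`) are hypothetical.

```python
import json
from pathlib import Path

ROOT = Path("data/companies/pltr")


def item_dir(root: Path, date_str: str, source_id: str) -> Path:
    """Directory for one item: interviews/{date}_{source_id}/ (date format is an assumption)."""
    return root / "interviews" / f"{date_str}_{source_id}"


def append_index(root: Path, record: dict) -> None:
    """Append one item to index.jsonl as a single JSON line."""
    root.mkdir(parents=True, exist_ok=True)
    with (root / "index.jsonl").open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_index(root: Path) -> list[dict]:
    """Load every index record; a missing file means an empty index."""
    path = root / "index.jsonl"
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

Appending one line per item keeps index writes cheap and lets the file be browsed or filtered with line-oriented tools.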

## Handler Design
- Reuse `dumb_money_live` transcript stage (yt-dlp download)
- New LLM classify + extract stages
- Dedup: same video URL or >90% transcript overlap → skip
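
The dedup rule could be sketched as below. The plan does not say how "transcript overlap" is measured, so this example assumes Jaccard similarity over word 5-gram shingles; the URL normalization covers only the two common YouTube URL shapes, and all function names are hypothetical.

```python
import re
from urllib.parse import parse_qs, urlparse


def video_key(url: str) -> str:
    """Normalize common YouTube URL shapes to one key; other URLs pass through unchanged."""
    parsed = urlparse(url.strip())
    host = parsed.netloc.lower().removeprefix("www.")  # Python 3.9+
    if host == "youtu.be":
        return "yt:" + parsed.path.strip("/")
    if host in {"youtube.com", "m.youtube.com"} and parsed.path == "/watch":
        vid = parse_qs(parsed.query).get("v", [""])[0]
        if vid:
            return "yt:" + vid
    return url.strip()


def shingles(text: str, k: int = 5) -> set[tuple[str, ...]]:
    """Set of k-word windows over the lowercased transcript text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def overlap(a: str, b: str, k: int = 5) -> float:
    """Jaccard similarity of the two shingle sets (an assumed overlap metric)."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def is_duplicate(url: str, transcript: str, seen: list[tuple[str, str]],
                 threshold: float = 0.9) -> bool:
    """True if the URL matches a seen item or transcript overlap exceeds the threshold."""
    key = video_key(url)
    return any(
        video_key(seen_url) == key or overlap(transcript, seen_text) > threshold
        for seen_url, seen_text in seen
    )
```

The pairwise comparison is quadratic in the number of stored transcripts; that is likely acceptable at backfill scale, but a larger corpus would call for something like MinHash.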

## Pacing
- Discovery: 1 search per 30 s (to stay within YouTube/Google rate limits)
- Transcripts: 40/hr (same as other yt-dlp jobs)
- LLM stages: 60/hr (cheap model for classify, better model for extract)
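
These rates translate to minimum intervals of 30 s (discovery), 3600 / 40 = 90 s (transcripts), and 3600 / 60 = 60 s (LLM stages). One simple way to enforce them is a per-stage minimum-interval pacer. The `Pacer` class and `PACERS` mapping below are a hypothetical sketch, not part of any existing handler; the clock and sleep functions are injectable so the logic can be tested without waiting.

```python
import time


class Pacer:
    """Enforce a minimum interval between consecutive calls to wait()."""

    def __init__(self, min_interval_s: float, clock=time.monotonic, sleep=time.sleep):
        self.min_interval_s = min_interval_s
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # no call made yet

    def wait(self) -> float:
        """Block until the next call is allowed; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._next_allowed is not None and now < self._next_allowed:
            delay = self._next_allowed - now
            self._sleep(delay)
            now = self._next_allowed
        self._next_allowed = now + self.min_interval_s
        return delay


# Intervals derived from the pacing list: per-hour rates become seconds per item.
PACERS = {
    "discover": Pacer(30.0),
    "transcript": Pacer(3600 / 40),
    "llm": Pacer(3600 / 60),
}
```

Each stage would call `PACERS[stage].wait()` before issuing a request, so the first item runs immediately and later ones are spaced evenly rather than sent in bursts.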
