📋 Todo Report

Generated: 2026-02-22 08:24:32

📚 Commands Cheatsheet ▼

Regenerate report:

python tools/todos/generate_report.py

Enrich todos with LLM metadata (priority/difficulty/utility):

python tools/todos/enrich.py          # Run enrichment
python tools/todos/enrich.py --dry-run # Preview only

Impact analysis (multi-model consensus on highest-impact items):

python tools/todos/analyze.py              # maxthink preset (Opus, GPT-Pro, Grok, Gemini)
python tools/todos/analyze.py -c fast      # Fast/cheap models
python tools/todos/analyze.py --top 5      # Top 5 instead of 3

After enrichment, the report shows colored priority badges (P1-P5), difficulty levels, and utility descriptions.

Open a file in your editor: click the file location to select it, copy (Cmd+C), then press Cmd+P in VS Code and paste.

637 Total Items · 582 Open · 55 Completed · 3 Projects

💰 investor

Investor Replication (2 open)

☐ Replicate top investor frameworks
replication/ — Acquire content, extract structured thesis elements, build company timelines, operationalize into scoring. Three targets: Reeves/Infuse (Substack + letters), Druckenmiller (interviews + 13F), Tepper (interviews + 13F). Start with Reeves — most written content, existing principles in research/infuse_principles.md.
☐ Bond covenant analysis
covenants/ — Extract structured covenants from EDGAR indentures, compute headroom, track amendments. EBITDA definition resolution is the hard part.

Phase 0: Brain Demo (Now) (14 open)

▶

Phase 1: Foundation (3 open)

☐ Price ingestion pipeline
  • Historical data backfill (daily OHLCV)
  • On-demand fetch for analysis
  • Store in SQLite or Redis timeseries
☐ Source Hub integration
  • Connect to existing source MCP server
  • Define source types: filings, transcripts, news, social
  • Basic ingestion → extraction → storage flow
☐ Monitoring scaffold
  • Define watchlist table (names, theses, broad themes)
  • Cron or daemon skeleton for periodic checks
  • Simple "new content detected" alerts

Phase 2: Assimilation Engine (3 open)

☐ Relevance filtering
  • Given new content, classify by watchlist item
  • LLM-based relevance scoring
  • Route to appropriate name/thesis
☐ Extraction pipeline
  • Facts, claims, variable updates from content
  • Structured JSON output (per vision doc)
  • Append to evidence ledger
☐ Executive summary generation
  • Per-name and per-thesis summaries
  • "What's changed since last update?"
  • Highlight thesis-altering signals

Phase 3: Sentiment (4 open)

☐ Research existing tools
  • What APIs exist? (Twitter, Reddit, StockTwits, YouTube)
  • What sentiment libraries work well?
  • Academic papers on sentiment-price relationships
☐ Build sentiment tracker
  • Ingest social mentions per symbol
  • Compute sentiment score (simple first: positive/negative/neutral)
  • Track volume and sentiment over time
☐ Divergence detection
  • Compare sentiment trend vs price trend
  • Flag: "price up, sentiment flat/down" and vice versa
  • Backtest: do divergences predict continuation?
☐ Dashboard widget
  • Social din chart per symbol
  • Highlight divergence periods
  • Quick sentiment snapshot

Phase 4: Research Mode (3 open)

☐ Historical analysis toolkit
  • Given an event, find earliest mentions
  • Timeline reconstruction
  • "Who called it?" search
☐ Present analysis framework
  • Structured prompts for thinking through news
  • Second-order effects template
  • Confirm/refute checklist
☐ Case study format
  • Narrative + structured data output
  • Lessons learned extraction
  • Feed back into monitoring rules

Phase 5: Causal Learning (3 open)

☐ Forecast grading system
  • Track predictions with timestamps
  • Auto-grade when horizon passes
  • Aggregate accuracy metrics
☐ Causal graph experiments
  • Prototype: how to represent causal chains?
  • Options: neo4j, embeddings, rules engine
  • Start with manual curation, then automate
☐ Feedback loops
  • Graded forecasts → update priors
  • Successful patterns → monitoring rules
  • Failed predictions → post-mortems

Ideas / Backlog (6 open)

▶

🌊 rivus

People β€” This Week (2 open)

☐ Connect with SMAI
Figure out dates, schedule for this coming week (week of Feb 10)

Priority (40 open)

▶

Review with User (6 open)

▶

Investor Replication & Covenant Analysis (4 open)

☐ Investor replication system
~/all-code/investor/replication/ — Extract analytical frameworks from top investors (Reeves/Infuse, Druckenmiller, Tepper) by acquiring their content (Substack, interviews, letters, 13F), extracting structured thesis elements per document, building company timelines, and operationalizing into scoring/screening. Design task: tasks/design/investment_philosophy_extraction.md
☐ Bond covenant analysis
~/all-code/investor/covenants/ — Extract structured covenants from EDGAR indentures/credit agreements, compute headroom vs current financials, track amendments over time. Key challenge: resolving nested EBITDA definitions and cross-references.
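The headroom computation itself is simple once the covenanted EBITDA is resolved; a sketch for the most common case (a max net-leverage covenant), which assumes the definition-resolution step the item flags as the hard part has already happened:

```python
def leverage_headroom(net_debt, covenant_ebitda, max_leverage):
    # Turns of EBITDA between current leverage and the covenant ceiling.
    # covenant_ebitda must be the indenture's *defined* EBITDA (with
    # add-backs), not the reported figure -- resolving that definition
    # is the hard part noted above.
    current = net_debt / covenant_ebitda
    return max_leverage - current

# 400 net debt on 100 covenant EBITDA vs a 6.0x ceiling -> 2.0 turns of headroom
headroom = leverage_headroom(400.0, 100.0, 6.0)
```

A negative result means the covenant is already breached; tracking this number over amendments is what surfaces covenant erosion.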

Learning (8 open)

▶

Newsflow: CEO Interviews & Podcasts (10 open)

▶

LLM Tools (2 open)

☐ fetch tool for lib/llm tool registry
lib/llm/tools.py — LLM can fetch URLs from search results. Needs: brain's fetch_escalate with smart proxy escalation, BrightData unlocker/JS rendering for paywalled/dynamic content. High-volume JS fetching may need the existing browser service or a BrightData Browser CDP endpoint. Design considerations: mode param (auto/js/unlocker), rate limiting, content truncation for token efficiency.

Transcription (6 open)

▶

KB & Self-Learning (6 open)

▶

Visual TODO (2 open)

☐ Explore Gradio themes - https://www.gradio.app/guides/theming-guide
  • Built-in: gr.themes.Glass(), gr.themes.Ocean(), gr.themes.Citrus()
  • Pick one consistent theme for all rivus Gradio apps

System (4 open)

☐ Background hook - Meta-hook that checks whether the other hooks need updating
☐ 🔐 admin.jott.ninja + admin-only Cloudflare Access policy
Set up a separate Cloudflare Access policy ("admin-only") restricted to tchklovski@gmail.com only (Google OAuth). Apply to admin.jott.ninja and any future admin-only subdomains. Ensure the existing "allow-friends" policy does NOT grant access to admin-only paths. Move sensitive content (billing, API keys, cost tracking) from the watch dashboard to a dedicated admin page. Eventually: live cost polling from the Anthropic/OpenAI/BD APIs.
☐ 🎙️ Voice UI to Claude Code session
Cloudflare-protected endpoint (voice.jott.ninja or under admin.jott.ninja) that lets you talk to a running Claude Code session on the laptop via voice. Admin-only access (tchklovski@gmail.com).
  • Existing prototype: finance/earnings/live/transcribe_gpt_realtime.py — OpenAI Realtime API via WebSocket (GPT-4o-transcribe, VAD, 24kHz PCM, sub-second latency). Reuse the WebSocket/audio-capture patterns.
  • Architecture: Browser (mic) → WebRTC/WebSocket → Cloudflare tunnel → laptop endpoint → STT (OpenAI Realtime or Deepgram) → text → Claude Code session (via it2api send-text or supervisor) → response → TTS → audio back to browser
  • Key pieces: (1) Web UI with mic capture + audio playback, (2) backend that bridges the audio stream to STT, (3) Claude Code session targeting (pick which session to talk to), (4) TTS for responses (OpenAI TTS or browser SpeechSynthesis), (5) Cloudflare Access admin-only policy
  • Why admin-only: this is a remote shell into your machine — it must be locked to your email only, never friends
☐ ☁️ Always-on serving from GCP box under jott.ninja
Run cloudflared on mcp-server-box (GCP) so it can serve *.jott.ninja subdomains alongside the laptop tunnel. The same Cloudflare Access (Google OAuth) protects everything. Two use cases:
  • Steps: (1) install cloudflared on mcp-server-box, (2) authenticate to the existing rivus tunnel, (3) add ingress rules for GCP-served hostnames in the tunnel config, (4) deploy the Gradio app container, (5) add the DNS route: cloudflared tunnel route dns rivus .jott.ninja
  • Key insight: one tunnel can have connectors on multiple machines — Cloudflare routes by hostname to the right origin. The laptop handles local dev services; GCP handles always-on apps.
  • Existing infra: the GCP box already has Docker, nginx, 4 vCPU / 16 GB RAM. See ai/docs/mcp-box-setup.md for box details and infra/cloudflared.yml for the current tunnel config.

Writing / Substack (4 open)

☐ Parallel & Speculative Development - Write up patterns for developing with cheap parallel workers
  • Speculative execution, fork-and-verify, test assumptions in background, design for parallel dev
  • Real examples from rivus: vario pipeline, background agents, fork-to-check-history
  • Key insight: copies of workers are cheap, waiting is expensive
  • This is a genuine contribution — most dev practices assume serial work
☐ Decide where writeups live - writing/ or design/writing/ in rivus?
  • Substack drafts, learnings, patterns worth sharing
  • Separate from design/drafts (which are LLM review outputs)
  • Should be git-tracked, easy to preview as markdown

Trading / Investor (2 open)

☐ Portfolio news monitoring - Monitor news about portfolio companies, assess market reaction and implications
  • Track news events (earnings, product launches, regulatory, macro) for held positions
  • Assess: how is the market reacting? how should we be reacting?
  • Compare market reaction vs our fundamental view — find mismatches (overreaction, underreaction)
  • Feed into position sizing / exit decisions in moneygun

Self-Learning & Iteration (vario/geneval direction) (10 open)

▶

Refactoring (2 open)

☐ Move smart-fetch logic to browser project - brain/fetcher.py + refusal.py (~400 lines) should move to browser
  • browser exposes a /smart-fetch endpoint with JS retry and refusal detection
  • brain just calls browser, handles caching + LLM analysis

Long-term (4 open)

☐ 🔴 Rapid takeoff company sketch 🔴
What would a rapid-takeoff AI-native company look like? Sketch out:
  • Mission & focus: what problem, what wedge, what makes it defensible
  • Funding: how much, what stages, what milestones unlock each round
  • Team & roles: who to hire first (and last), what each role's mission/focus looks like individually — not just titles but what each person should be obsessing over in months 1-6 vs 6-18
  • Velocity model: what enables rapid iteration — small team, AI leverage, tight feedback loops, what's automated vs human judgment
  • Anti-patterns: what slows down takeoff (premature scaling, wrong hires, too much process, consensus culture)
  • Calibration: study real rapid-takeoff examples (Midjourney: 11 people → $200M ARR, Cursor, Perplexity early days, Instagram pre-acquisition) — what did the org chart actually look like?
☐ Repro my PhD
Reproduce PhD research/results

Phase 1: Search fallback (4 open)

☐ If input isn't URL/event/question → browser search → fetch top result
☐ Add brain search "query" CLI command

Phase 2: Multi-result analysis (vario integration) (8 open)

▶

Unified NL input (CLI + UI) (10 open)

▶

Active Development (6 open)

▶

Research Queries (2 open)

☐ Develop research-oriented precursors (research_* in query_precursors.yaml)
  • These may be better as reusable analysis patterns than one-off prompts
  • Consider: composable prompt fragments vs monolithic prompts

Infrastructure (4 open)

☐ Add CLI command to list precursors by status
☐ Add test harness: run prompt against sample docs, compare outputs

Automation / Integration (8 open)

▶

Top Level (4 open)

☐ Streaming coalesce / incremental synthesis - Coalesce information as it arrives (fetches, LLM streams, chunks):
  • Real-time doc updates as data comes in (e.g., person search → update profile as each source is fetched)
  • Line numbers + content hashes for addressing ranges, detecting overlap
  • N LLMs propose content → shuffle lines into place → edit/unify in real time
  • Use case: parallel research streams merge into a single evolving document
  • Think: a collaborative doc where each source/model contributes lines, and the system detects redundancy and merges
☐ Live audio analysis for Tesla call - Real-time audio stream analysis for today's Tesla earnings call

Next (2 open)

☐ Try judging pipeline - Test the new --each flag end-to-end:

Features (12 open)

▶

Polish (2 open)

☐ Syntax highlighting theme for YAML (CodeMirror CSS overrides)

Explore (2 open)

☐ Collapsible messages in chat - Use Gradio's reasoning_tags or similar for collapsible system prompt display. See https://www.gradio.app/docs/gradio/chatbot#param-chatbot-reasoning-tags and https://www.gradio.app/docs/gradio/chatbot#examples

Completed (0 open)

▶

Ready to Test (8 open)

▶

Auto-Create Ingestion Wisdom (6 open)

▶

Verification Execution Engine (24 open)

▶

Questions to Resolve (6 open)

▶

Implementation (10 open)

▶

Free-Signup Paywall Sites (2 open)

☐ endpoints.news
Biotech/pharma news. Free signup gets limited articles. Test URL: https://endpoints.news/roivants-dealmaker-lands-81m-cash-bonus-following-drug-sale-to-roche/
  • Signup flow: email + password → limited free articles
  • Strategy: create the account once, persist session cookies, reuse them across fetches
  • Ties into Session & Login Management below

Tasks (10 open)

▶

Automation Mode Enhancements (6 open)

▶

Agent Quality (6 open)

▶

Testing (6 open)

▶

Implemented (0 open)

▶

Visual Verification (Priority) (34 open)

▶

Reference Appearance Screenshots (8 open)

▶

Top Priority (3 open)

☐ Newsflow Buildout: Topic-Driven News Intelligence
Transform newsflow from a monitoring pipeline into a browsable news product. Currently: manual search queries, results buried in job tables, no synthesis. Target: define a topic with a description → auto-generate queries → relevance-score articles → browse in a reader UI → get daily digests. Four phases: (1) topic→queries + LLM scoring, (2) browsable feed UI at newsflow.localhost, (3) daily/weekly digests with notifications, (4) semantic dedup + cross-topic connections. Full plan: docs/plans/2026-02-21-newsflow-buildout.md
☐ Section 351 ETFs
Scrape and collect all Section 351 ETFs. Research what's involved: tax-free exchange mechanism, which ETFs use it, fund structures, eligible securities, investor requirements. Build a comprehensive dataset of 351 ETFs with their holdings, launch dates, and conversion details.

VIC Cached Content Improvements (2 open)

☐ VIC styling in cached viewer
Static server serves cached VIC HTML but CSS/JS assets don't load (require VIC authentication). Options: (1) Extract description content only, serve in clean wrapper with basic styling, (2) Use VIC cookies to fetch/cache CSS/JS assets, (3) Inline critical styles directly in cached HTML. Current state: content is readable but unstyled. Related: static/server.py asset caching, jobs/data/vic_ideas/.share base_path config.

Dashboard Improvements (2 open)

☐ Paginate items in large jobs
Jobs with 500+ items are slow to load and unwieldy. Add pagination (page size ~50) to Pending/Done/Failed tabs in the detail view, with next/prev controls and item count display.
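The pagination math is worth pinning down before touching the UI; a sketch mapping a 1-indexed page to LIMIT/OFFSET bounds, using the ~50 page size suggested above (function name is an assumption, not an existing dashboard helper):

```python
def page_bounds(total, page, size=50):
    # 1-indexed page -> (offset, limit, n_pages) for a LIMIT/OFFSET query.
    n_pages = max(1, -(-total // size))   # ceiling division; at least one page
    page = min(max(page, 1), n_pages)     # clamp out-of-range next/prev clicks
    return (page - 1) * size, size, n_pages

offset, limit, n_pages = page_bounds(total=512, page=3)
# -> offset=100, limit=50, n_pages=11
```

Clamping in the helper keeps next/prev controls stateless: the view just asks for page+1 or page-1 and always gets a valid range for the item-count display.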

Runner Improvements (10 open)

▶

Job Event Log (Changelog) (8 open)

▶

Validator Stage Role (10 open)

▶

Success-Rate Circuit Breaker (1 open)

☐ Low success rate CB
Track success/fail ratio over a sliding window (last N items, default 20). Auto-pause when success rate drops below threshold (e.g., success_rate_min: 0.50). Catches intermittent failures that never cluster enough to trip the consecutive CB. Config per-stage in YAML:
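A sketch of the sliding-window breaker described above. The names `window` and `success_rate_min` mirror the YAML keys suggested in the item but are assumptions, not the runner's actual config schema:

```python
from collections import deque

class SuccessRateBreaker:
    # Sliding window over the last N item outcomes; trips (signals
    # auto-pause) when the success rate falls below the floor.
    def __init__(self, window=20, success_rate_min=0.50):
        self.outcomes = deque(maxlen=window)
        self.min_rate = success_rate_min

    def record(self, ok: bool) -> bool:
        """Record one outcome; return True if the breaker should trip."""
        self.outcomes.append(ok)
        # Don't trip before the window fills: early samples are too noisy.
        if len(self.outcomes) < self.outcomes.maxlen:
            return False
        rate = sum(self.outcomes) / len(self.outcomes)
        return rate < self.min_rate

cb = SuccessRateBreaker(window=10, success_rate_min=0.50)
tripped = [cb.record(i % 3 == 0) for i in range(12)]  # ~33% success
```

Because the window slides, three failures spread among successes can trip this breaker even though they would never cluster enough for a consecutive-failure breaker.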

Validation Circuit Breaker (6 open)

▶

Repair Workflow (6 open)

▶

New Job Ideas (6 open)

▶

Investment Research (2 open)

☐ Cheap power / US solar production
Research investment opportunities in cheap electricity and US-based solar manufacturing. Source: https://youtu.be/BYXbuik3dgA?si=6KqftryUmoChEqQa

New Sources (2 open)

☐ Local & Municipal Data: LLM-based Scrape
  • Goal: extract structured data from local/municipal government sites (permits, zoning, property records, council minutes, budgets, public notices).
  • Why: municipal data is high-value but poorly structured — PDFs, inconsistent HTML, no APIs. LLM extraction can normalize it into queryable knowledge.
  • Approach: browser automation (rivus/browser) + LLM extraction (brain/extract). Same pipeline as VIC/supplychain but pointed at gov sites.
  • Examples: building permits, zoning changes, city council agendas, public budget documents, property assessment records.

Cost Control (2 open)

☐ Multiple Max accounts in envs
Rotate/split API usage across accounts

Measure & Validate (2 open)

☐ Measure initial-only variant value: Does "T. Lastname" find any unique URLs that "Timothy Lastname" and "Tim Lastname" don't? Run 5-10 names, compare candidate URLs per variant. If initial-only never adds unique results, drop it to save Serper credits.
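The comparison is set arithmetic over per-variant result URLs; a sketch with fabricated data standing in for real Serper responses:

```python
# Hypothetical per-variant result sets for one name; real data would come
# from Serper responses for the three name variants.
results = {
    "Timothy Lastname": {"a.com/p1", "b.com/bio", "c.com/news"},
    "Tim Lastname":     {"a.com/p1", "d.com/interview"},
    "T. Lastname":      {"a.com/p1", "b.com/bio"},
}

def unique_contribution(variant, results):
    # URLs only this variant surfaced; an empty set over 5-10 names
    # makes the variant a candidate to drop (saving Serper credits).
    others = set().union(*(v for k, v in results.items() if k != variant))
    return results[variant] - others

uniq = unique_contribution("T. Lastname", results)  # empty here: drop candidate
```

Aggregating `unique_contribution` counts per variant across the 5-10 test names gives the drop/keep answer directly.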

Future Phases (16 open)

▶

Data Sources (4 open)

☐ Industry publications list
Thorough list of semiconductor/supply-chain publications
  • Rank by quality
  • Note cost vs free access
  • Categories: news, research, analyst reports, trade journals
  • Examples to evaluate: SemiEngineering, EETimes, DigiTimes, Semiconductor Digest, SEMI reports, TrendForce, IC Insights, Yole, etc.
☐ Paid data sources research
What's available only via subscription/enterprise
  • Capital IQ (S&P) — supply chain relationships, financials, private company data
  • Refinitiv/LSEG — supply chain data, ownership, estimates
  • Bloomberg Terminal — supply chain module (SPLC)
  • FactSet — supply chain relationships
  • Pitchbook — private company valuations
  • Gartner/IDC — market share reports
  • SEMI — industry reports, fab capacity data
  • Evaluate: coverage, cost tiers, API access, data freshness

Data Quality (6 open)

▶

Viewer Improvements (2 open)

☐ Add market cap data for seed companies (via finnhub or discover.py)

Transcript Analysis (6 open)

▶

Dash Explorer (6 open)

▶

Data Pipeline (8 open)

▶

High Priority (8 open)

▶

Medium Priority (6 open)

▶

Low Priority (6 open)

▶

Translation (4 open)

☐ Real-time WebSocket translation
Use OpenAI Realtime API for streaming transcription/translation during video playback. Would show live subtitles as video plays.
☐ Screen text translation
OCR on-screen text (signs, subtitles burned into video) and translate. Could use Tesseract or cloud vision APIs.

Autonomy: Pick TODOs from Sessions (1 open)

☐ Session-driven TODO discovery
Review today's sessions, extract mentioned items that aren't captured or prioritized in TODO.md. Could be bugs spotted, features discussed, ideas floated. Run as a periodic sweep or on-demand.

Jobs: Per-Resource Pacing (Option 2) (1 open)

☐ Pacing per-resource, not per-job
Only fetch stages need rate limiting (residential proxy). Extract/check_enrich read cached data + call Gemini Flash. Runner should skip pacing when reprocessing stages that don't use a rate-limited resource. See jobs/CLAUDE.md "Resource contention" section.

VIC: Paywall Items Need Re-fetch (1 open)

☐ Mark paywalled items for re-fetch
~1,172 items were fetched when account couldn't see content (cached HTML is the paywall page). Mark these with needs_refetch=true in DB, reset fetch stage to pending so they get re-fetched with current credentials. Extract should distinguish "parser failed on good HTML" from "HTML is the paywall page."

Review & Reduce AWS Spending (1 open)

☐ Audit Amazon AWS infra
Review all running services, identify what can be deleted or backed up to reduce costs. Check EC2, S3, Lambda, RDS, etc.

Reduce: Image Iteration with Auto-Refinement (1 open)

☐ Implement reduce image iteration
Extend brain/reduce/ with image modality support, an auto-refinement loop, and cost tracking. Design: docs/plans/2026-02-17-reduce-image-iteration-design.md. Key pieces:
  • [ ] refine.py — Prompt expansion (vague→concrete), auto-refine (critique→improved prompt), steer (user feedback→rewrite)
  • [ ] task.py — Add modality, prompt_history, costs, artifact_path, prompt_version, gen_cost fields
  • [ ] gen.py — Image generation path via lib/llm/image_gen when modality: image
  • [ ] app.py — Image gallery UI, cost display (per-candidate / session / daily), steer input, auto-loop controls
  • [ ] score.py — Read images from artifact_path for scoring (a base64 path already exists)

Learning System (2 open)

☐ Wire up principle application tracking
Effectiveness tab is inert (1 manual entry). Session review should detect when a principle was relevant to a session outcome and auto-call record_application(). Two paths: (1) post-session analysis matches applied patterns to principles, (2) real-time detection during sessions when a principle-aligned action succeeds/fails. Without this, the effectiveness feedback loop never closes.
☐ learn find optimization
Semantic search loads ALL embeddings into memory for cosine similarity. Fine for ~1K items, needs optimization (ANN index, or SQLite vector extension) when approaching 5K+. Also: evaluate whether Gemini title param adds value over structured prefix approach (A/B test with retrieval benchmark).
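For scale intuition, the current approach is an O(N) scan per query; a pure-Python sketch (toy 2-d vectors standing in for real embeddings) of exactly the part an ANN index or SQLite vector extension would replace:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def find(query_vec, store, top_k=3):
    # Brute force: score every stored embedding, then sort.
    # Fine at ~1K items; this full scan is what needs an ANN index past ~5K.
    scored = sorted(((cosine(query_vec, v), k) for k, v in store.items()),
                    reverse=True)
    return [k for _, k in scored[:top_k]]

store = {"a": [1.0, 0.0], "b": [0.7, 0.7], "c": [0.0, 1.0]}
hits = find([1.0, 0.1], store, top_k=2)
```

With normalized embeddings, cosine reduces to a dot product, so the scan is one matrix-vector multiply in NumPy before an ANN index is even needed.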

Watch / Read (1 open)

☐ David George (a16z): State of Markets
AI markets deep dive, Jan 2026. https://www.youtube.com/watch?v=rSohMpT24SI / https://a16z.com/state-of-markets/

Session Intelligence Server (watch.api) (0 open)

☑ Implement watch.api.localhost
Unified session intelligence server (FastAPI, port 8130). Replaces 3 cold-start subprocess workers with one persistent server. Design: docs/plans/2026-02-19-session-intelligence-server-design.md. Key pieces:
  • [x] FastAPI server with /hook/prompt, /hist/{sid}, /state/{sid}, /health endpoints
  • [x] Hierarchical session tree as the canonical data model (badge + hist derived from it)
  • [x] Single haiku LLM call per prompt (tree + badge + theme in one shot)
  • [x] Server-side JSONL tail reading for rich context
  • [x] Pre-rendered /hist ASCII output (zero Claude tokens)
  • [x] curl one-liner hook (~/.claude/hooks/watch-api-hook.sh) — running alongside the existing handler
  • [x] Caddyfile + registry entry for watch.api.localhost
  • [ ] Extend learning/gyms/badge/ for tree + ablation testing
  • [ ] Cutover: remove the old subprocess workers after a validation period

Servers To Add (3 open)

☐ present server
Gradio UI for present/ (papers, diagrams, demos, blog posts, tweets, slides). Port TBD, present.localhost.
☐ benchmark server
Gradio UI for benchmarks/ (run configs, view results, compare models). Port TBD, bench.localhost.
☐ benchmark LLM backend
Use our hot runner (lib/llm) or litellm --proxy? Hot runner has subscription routing, caching, model aliases; litellm proxy gives OpenAI-compatible endpoint, load balancing, spend tracking. Decide and unify.

#3: Founder Evaluator (`TODO.md:68`) + Bright Data integration (`TODO.md:62`) — Impact 9/10, Effort L+M (1 open)

☐ Add MiniMax 2.5 model
Check litellm support for MiniMax-Text-01 / MiniMax 2.5 (456B MoE). Add to lib/llm model aliases and brain/vario config if available.

Ideas to Present 🎤 (2 open)

☐ Shadow model testing
Run candidate models in parallel on live LLM traffic, with async eval for improvement opportunities. The primary model serves the result; shadow models are logged and compared. Builds a data-driven case for model swaps ("grok-fast matched haiku 94% of the time at 1/5 the cost"). The gym tests prompt variants on replayed sessions × shadow testing evaluates model variants on live traffic → together they optimize both axes (prompt × model). See lib/llm/TODO.md for an implementation sketch.
☐ Session intelligence server
Unified hist/badge/title from one persistent server + a hierarchical session tree. Replaces 3 cold-start Python processes with a single curl hook. Observable tuning via gym (session replay + ablation). See docs/plans/2026-02-19-session-intelligence-server-design.md.

Private Data Access Tools β€” `lib/private_data/` (5 open)

β–Ά

Parallelization (critical β€” 10x speedup) (3 open)

☐ Fix experiment.py outer loops
strategies Γ— problems both sequential; should asyncio.gather across problems (with optional semaphore for rate limits)
☐ Fix fn_temperature_sweep
3 temperatures run sequentially in for-loop; use asyncio.gather
☐ Fix fn_lens_ensemble
5 lenses run sequentially in for-loop; use asyncio.gather
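The three fixes above share one pattern: replace the sequential for-loop with asyncio.gather, fanning out across strategies Γ— problems (or temperatures, or lenses), with an optional semaphore capping in-flight calls for rate limits. A sketch with a stand-in for the real LLM call:

```python
import asyncio

async def run_one(strategy: str, problem: str) -> str:
    """Stand-in for the real per-(strategy, problem) LLM call."""
    await asyncio.sleep(0)
    return f"{strategy}/{problem}"

async def run_all(strategies, problems, max_inflight: int = 8):
    sem = asyncio.Semaphore(max_inflight)  # rate-limit guard

    async def guarded(s, p):
        async with sem:
            return await run_one(s, p)

    # Fan out the full cross product; gather preserves argument order.
    return await asyncio.gather(*(guarded(s, p)
                                  for s in strategies for p in problems))

results = asyncio.run(run_all(["cot", "lens"], ["p1", "p2", "p3"]))
```

The same `guarded` wrapper works unchanged for fn_temperature_sweep and fn_lens_ensemble: swap the cross product for a list of temperatures or lenses.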

Model experiments (3 open)

☐ Run on gemini-3-pro
$2/$12 per 1M tokens, full experiment ~$20. Key question: do strategies still help when baseline is already strong?
☐ Rerun flash with parallelization
previous run killed due to sequential bottleneck + rate limits
☐ Compare across models
same strategies on flash vs pro vs haiku, combined report

Documentation (2 open)

☐ Add brain/strategies to CLAUDE.md
major subsystem (19 strategies, 10 stages, 9 lenses, 16 tags, move library) with zero documentation in project README
☐ Write brain/strategies/README.md
architecture, usage, experiment workflow

Benchmarks β€” Test Vario Strategies (8 open)

β–Ά

Eval (2 open)

☐ Vario A/B eval: benchmark configs against each other
Send identical requests to different vario configs (e.g., fast vs maxthink vs allthink, or custom strategy combos) and evaluate output quality. Sources: sample requests from real Claude Code sessions, ChatGPT/Gemini chat exports, brain app usage, and browser history (for URL-based queries). Pipeline: (1) curate a request corpus via lib/private_data/ importers (see main TODO.md β€” private data access tools), (2) run each request through N vario configs, (3) LLM-as-judge scores each output against a rubric (clarity, correctness, depth, actionability β€” rubric is per-domain, set up in advance), (4) aggregate scores per config, identify which configs win on which request types. Store results in experiments.db alongside regular runs. Goal: data-driven answer to "does vario complexity pay off?" and "which config for which task type?" Related: TODO.md:175 (brain benefit vs vanilla).
☐ Human eval UI for vario outputs
Gradio interface for human review of vario results. Show outputs side-by-side (blinded or labeled), let reviewer score on rubric dimensions, add free-text notes, flag interesting outputs. Features: (1) pull runs from experiments.db, (2) present as review queue (unreviewed first), (3) human scores stored alongside LLM judge scores for calibration, (4) agreement metrics between human and LLM judge (track where they diverge β€” that's where the rubric needs work), (5) export reviewed examples as few-shot calibration data for the LLM judge. This closes the loop: LLM judge does volume, human eval keeps it honest.
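Step (4) of the A/B pipeline, aggregation, is simple enough to pin down now. A sketch under the assumption that the judge emits per-dimension scores per (config, request); the judge itself (an LLM call) is replaced here by precomputed rows, and the field names are illustrative, not the experiments.db schema:

```python
from collections import defaultdict
from statistics import mean

# (config, request_id, {rubric dimension: judge score})
rows = [
    ("fast",     "r1", {"clarity": 4, "depth": 2}),
    ("maxthink", "r1", {"clarity": 4, "depth": 5}),
    ("fast",     "r2", {"clarity": 5, "depth": 3}),
    ("maxthink", "r2", {"clarity": 3, "depth": 4}),
]

def aggregate(rows):
    """Mean judge score per config per rubric dimension."""
    by_config = defaultdict(lambda: defaultdict(list))
    for config, _rid, dims in rows:
        for dim, score in dims.items():
            by_config[config][dim].append(score)
    return {c: {d: mean(v) for d, v in dims.items()}
            for c, dims in by_config.items()}

summary = aggregate(rows)
```

Slicing the same rows by request type instead of config answers the second question ("which config for which task type?").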

TODO: Fetchability Matrix Validation (LLM URL Tool Input) (3 open)

☐ Maintain fetchability contract for mode=auto|js|unlocker and escalation evidence
  • Gym spec: learning/gyms/fetchability/docs/FETCHABILITY_MATRIX_SPEC.md
  • Machine-readable matrix: learning/gyms/fetchability/tests/fixtures/fetchability_matrix.yaml
  • Parameterized tests: learning/gyms/fetchability/tests/test_fetchability_matrix.py
☐ Run live matrix probes with real paid URLs (Substack + Patreon) and record required means
  • Required env: BROWSER_TEST_SUBSTACK_PAID_URL, BROWSER_TEST_PATREON_PAID_URL
  • Optional auth flags: BROWSER_TEST_SUBSTACK_PAID_AUTH=1, BROWSER_TEST_PATREON_PAID_AUTH=1
☐ Capture baseline latency/cost for each first-success mode before wiring into lib/llm/tools.py

The Plan: Make It Go Up (6 open)

β–Ά

Dev / Debug UX (1 open)

☐ Stage timing in dossier view
Show how long each assessment stage took to run (wall-clock time). Useful in a debug/dev view to spot slow stages, compare across entities, and identify optimization targets. Record timestamps in stage result JSON, surface in data viewer.
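The recording half of this is a small wrapper: time the stage with a monotonic clock and write the duration into the stage result JSON. Field names below are assumptions, not the dossier schema:

```python
import json
import time

def timed_stage(name, fn, *args):
    """Run a stage fn and attach wall-clock duration to its result record."""
    t0 = time.monotonic()
    result = fn(*args)
    return {"stage": name,
            "result": result,
            "duration_s": round(time.monotonic() - t0, 3)}

record = timed_stage("extract", lambda text: text.upper(), "hi")
print(json.dumps(record))
```

The data viewer then just sorts stages by `duration_s` per entity to surface the slow ones.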

Assessment Stages (Person Dossier) (4 open)

☐ academic_prowess
Publication record, h-index, citation velocity (rising/declining), star co-authors, patents filed. Sources: OpenAlex API (api.openalex.org/authors?search=X, 90M authors, free), Semantic Scholar API (api.semanticscholar.org, 220M papers, free), ORCID, arXiv, DBLP, PatentsView for inventor lookup. Key: h-index alone is misleading β€” compare to field median; citation velocity and co-author network quality matter more.
☐ engineering_prowess
GitHub stats (stars, repos, contribution consistency, PR reviews given), package maintainership (npm/PyPI downstream dependents), language breadth, Stack Overflow reputation. Sources: GitHub REST/GraphQL API (5K req/hr with token), GH Archive (BigQuery, every public event since 2011), npm/PyPI registries, Stack Overflow API. Key: stars are noisy β€” look at contribution consistency, PR review activity, and maintained packages with real downstream users.
☐ publication_footprint
Books authored, articles/op-eds in major publications, newsletters (Substack/Medium), white papers. Sources: Google Books API (googleapis.com/books/v1/volumes?q=inauthor:X, 40M books, free), Open Library API, Serper site: searches for WSJ/HBR/Forbes/etc. Key: distinguish publication tier β€” op-ed in WSJ vs self-published blog = very different signal.
☐ media_footprint
Podcast appearances, conference keynotes, news mentions, Twitter/X following, YouTube presence. Sources: Listen Notes API (listennotes.com/api/v2/search?q=X&type=episode, 5M+ episodes, free tier 300/mo), Serper news, YouTube search via yt-dlp. Key: measures narrative control β€” does the person own their story (own blog/newsletter/podcast) or only appear when others write about them?
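For the academic_prowess stage, the OpenAlex author search mentioned above is a one-URL call. The sketch below builds that URL and summarizes a trimmed fixture instead of making a live request; the fields used (display_name, summary_stats.h_index, cited_by_count) follow the public OpenAlex author schema:

```python
from urllib.parse import urlencode

def author_search_url(name: str) -> str:
    """Documented OpenAlex author search endpoint."""
    return "https://api.openalex.org/authors?" + urlencode({"search": name})

def summarize(author: dict) -> dict:
    """Pull the fields the stage cares about from one author record."""
    stats = author.get("summary_stats", {})
    return {"name": author["display_name"],
            "h_index": stats.get("h_index"),
            "citations": author.get("cited_by_count")}

# Trimmed fixture standing in for one element of the live API's results list.
fixture = {"display_name": "Jane Doe",
           "summary_stats": {"h_index": 31},
           "cited_by_count": 4200}
print(author_search_url("Jane Doe"))
print(summarize(fixture))
```

Per the note on the item itself, the raw h_index should then be compared to a field median rather than reported alone.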

Phase 1: Error Classifier + Circuit Breaker βœ… DONE (0 open)

β–Ά

Phase 2: Per-Stage Pause (2 open)

☐ Per-stage pause instead of whole-job pause
When doctor pauses for a stage error (e.g. extract has a code bug), only that stage should stop. Items already past the broken stage (in score, check_enrich) should continue processing. Currently set_job_paused() is job-level β€” all stage workers stop.
  • Normal stage errors: pause only the broken stage. Items in downstream stages keep going.
  • Validator stage errors (e.g. check_enrich finds upstream extract is broken): pause the whole job β€” the problem is upstream, not just this stage.
  • Stage role: validator in YAML should signal this behavior.
  • Implementation: stage_paused dict in runner memory (lost on restart, but restart re-evaluates anyway). Or job_stage_state table in DB for persistence.
  • Dashboard: show per-stage pause status, not just job-level.
☐ Tiered auto-repair timing
Risk tiers exist but timing isn't enforced yet:
  • low β†’ act immediately (done)
  • medium β†’ Pushover notify β†’ wait 10 min β†’ act (TODO: delay mechanism)
  • high β†’ Pushover notify β†’ wait for business hours 9am-6pm PT (TODO: schedule check)

Phase 3: Deeper Intelligence (3 open)

☐ Version-aware error tracking
Record handler_version alongside each error. Enables: "all failures from handler v.abc123 deployed 2h ago" β†’ code bug vs "same version worked yesterday, errors at 3am" β†’ external issue.
☐ Success-rate circuit breaker
Track success/fail ratio over sliding window (last N items). Auto-pause below threshold.
☐ Batch diagnosis
doctor.diagnose_group(errors) β€” when multiple errors accumulate, look at them together: "5 errors: 4 are VIC cookie (same fingerprint), 1 timeout β†’ root cause is auth."
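The success-rate breaker above is a fixed-size window of recent outcomes plus a threshold check. A minimal sketch (window and threshold values are placeholders):

```python
from collections import deque

class CircuitBreaker:
    """Auto-pause when success rate over the last N items drops too low."""

    def __init__(self, window: int = 20, threshold: float = 0.5):
        self.outcomes: deque[bool] = deque(maxlen=window)
        self.threshold = threshold

    def record(self, ok: bool) -> None:
        self.outcomes.append(ok)   # deque drops the oldest automatically

    @property
    def tripped(self) -> bool:
        if len(self.outcomes) < self.outcomes.maxlen:
            return False           # not enough data to judge yet
        rate = sum(self.outcomes) / len(self.outcomes)
        return rate < self.threshold

cb = CircuitBreaker(window=4, threshold=0.5)
for ok in [True, False, False, False]:   # 25% success over a full window
    cb.record(ok)
assert cb.tripped
```

When `tripped` flips true, the runner would call the same pause path the doctor uses for systemic errors.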

Phase 4: Integration & Dashboard (4 open)

☐ doctor.watch surfaces job errors
Job runner errors flow through doctor.watch pipeline: fingerprinted, deduplicated, LLM-analyzed, with f to fix and s to silence.
☐ jobs/diagnose.py becomes thin wrapper
Delegates to doctor.classify_error(). Regex stays as offline-only fallback.
☐ Dashboard: doctor actions timeline
Show what doctor did, when, why, outcome. Filter by job, risk tier, action type.
☐ Dashboard: error classification health
Per-job error breakdown by classification. "vic_ideas: 12 systemic, 0 transient, 3 item_specific" at a glance.

Learning Core ↔ Skillz ↔ CC Skills Unification (6 open)

β–Ά

Hybrid Retrieval for `learn find` (5 open)

β–Ά

Skill Consolidation: `/learn`, `/reflect`, `/recall` (4 open)

☐ Rename skill files: learning β†’ learn, apply-learnings β†’ recall
☐ Create /reflect skill
☐ Update howto/skill-triggers.md
☐ Keep old names as aliases during transition

Principle Category Cleanup (3 open)

☐ Add learn rename CLI command (update slug + file in DB, regenerate materialized .md)
☐ Migrate development/internalize-the-verification-loop β†’ dev/
☐ Migrate development/commit-by-logical-intent β†’ dev/

Layout Review Skill (3 open)

☐ Create ~/.claude/skills/layout-review/
workflow: screenshot β†’ identify issues β†’ propose fixes β†’ implement β†’ verify with before/after screenshots
☐ Should reference gradio-layout skill for CSS gotchas
☐ Interface with explorations/gradio_layout_gym/compliance.py for pre-checks

Gemini 3.1 Pro (`gemini-3.1-pro-preview`) β€” 2026-02-19 (3 open)

☐ Run /model-update to swap gemini alias to 3.1 Pro
☐ Re-test subscription route periodically (for flat-rate billing)
☐ Check if Gemini 3.1 Flash is announced

GPT-5.3-Codex (`gpt-5.3-codex`) β€” subscription works, standard API pending (3 open)

☐ Check OpenAI API changelog for gpt-5.3-codex standard API availability
☐ Consider adding codex-spark alias when Spark gets API access
☐ Evaluate upgrading to ChatGPT Pro subscription
needed for gpt-5.3-codex-spark (real-time coding model, currently Pro-only)

Grok Code v2 (multimodal + parallel tools) β€” in training (1 open)

☐ Watch xAI release notes for grok-code-fast-2 or similar

Grok 4.1 Fast (non-reasoning) β€” Use More Aggressively (5 open)

β–Ά

Shadow Model Testing & Async Eval 🎀 (present this!) (6 open)

β–Ά

Autonomous Work Protocol (5 open)

β–Ά

πŸ“Š timdata

To Do (4 open)

☐ Become AI advisor to 10110: Explore advisory/consulting relationship with 10110 on AI strategy.
☐ Create a Family Data MCP Server:
  • Goal: Build a Model Context Protocol (MCP) server to act as a relay for family information.
  • Features:
  • Authentication: Implement secure authentication (likely OAuth 2.0 for Google services).
  • Relay: Provide requested information to the LLM.
  • Data Filling: If information is missing, make a note for the user to provide it later to fill in the blanks and store it in this repository.
  • Integrations: Hook up to Google Calendar (investigate existing MCP servers or build custom using Google Workspace APIs).
☐ Food Automation:
  • Goal: Check whether food ordering can be automated through an API with an LLM assistant.
  • APIs to Check: Uber Eats API, DoorDash Drive API.
☐ Home Monitoring Automation:
  • Goal: Investigate if the Ring camera can be accessed via API (likely unofficial/community-maintained like ring-mqtt or Node-based wrappers).
  • Use Case: Detect and track Tara's morning walks for better routine management.