Generated: 2026-02-22 08:24:32
Regenerate report:
python tools/todos/generate_report.py
Enrich todos with LLM metadata (priority/difficulty/utility):
python tools/todos/enrich.py            # Run enrichment
python tools/todos/enrich.py --dry-run  # Preview only
Impact analysis (multi-model consensus on highest-impact items):
python tools/todos/analyze.py           # maxthink preset (Opus, GPT-Pro, Grok, Gemini)
python tools/todos/analyze.py -c fast   # Fast/cheap models
python tools/todos/analyze.py --top 5   # Top 5 instead of 3
After enrichment: Report will show colored priority badges (P1-P5), difficulty levels, and utility descriptions.
Open file in editor: Click file location to select, copy (Cmd+C), then Cmd+P in VS Code and paste.
replication/ – Acquire content, extract structured thesis elements, build company timelines, operationalize into scoring. Three targets: Reeves/Infuse (Substack + letters), Druckenmiller (interviews + 13F), Tepper (interviews + 13F). Start with Reeves – most written content, existing principles in research/infuse_principles.md.
covenants/ – Extract structured covenants from EDGAR indentures, compute headroom, track amendments. EBITDA definition resolution is the hard part.
watchlist table (names, theses, broad themes)
brightdata SSE in rivus project config). Also integrate with browser service.
search_engine, scrape_as_markdown (free tier, 5K req/mo), LinkedIn profiles/companies/jobs/posts, Crunchbase, browser automation (Pro mode)
BRIGHTDATA_PRIMARY_PROXY, BRIGHTDATA_WEBUNLOCKER_PROXY, BRIGHTDATA_BROWSER_PROXY env vars.
projects/people/ + projects/vc_intel/
projects/people/README.md
projects/skillz/domains/finance/ – Find great public examples of AI doing stock analysis, startup evaluation, founder/CEO assessment. Stock analyst is the juiciest domain. Survey GitHub repos, prompt chains, VC tooling landscape. These become training data / inspiration for skill acquisition. See domains/finance/README.md and domains/finance/benchmarks.md.
brain/ extraction and kb/ knowledge accumulation.
intel/ (was projects/) – Founder evaluator, VC network mapping, company analysis. People profile UI is live, pipeline works end-to-end. Next: demo-ready founder eval with scoring, company enrichment pipeline, Gil/a16z demo prep. Direct revenue potential.
kb/ – Knowledge base extraction gym. Web → structured knowledge with scoring. The bridge between brain/extract (fetch + analyze) and actionable intelligence. Needs: extraction quality scoring, knowledge graph accumulation, dedup/merge across sources.
projects/skillz/ – Skill acquisition for TFTF companies, finance, people domains. Building domain expertise systematically – what do we need to know, what do we know, what's the gap.
Feeds into intel (better founder eval), trading (better analysis), and autonomy (better self-direction).
autonomy/ – The system works on goals proactively, not just when asked. Current state: the observe→learn→improve→verify loop is sketched (autonomy/CLAUDE.md) with doctor, learning, principles, and sandbox pieces active but nearly isolated – the feedback loops between them aren't connected. Missing pieces: monitor (runtime drift detection), planner (proactive task selection + execution), and the wiring between systems. Progression: (1) Report – "here are top TODOs, here's my plan" (current stage), (2) Propose – "I'd like to work on X" with a plan, (3) Execute – fork session, do work, present results, (4) Chain – complete one TODO, pick the next, keep going. Key enablers: Situations DB (broader pattern capture), sidekick auto-learnings (autonomous observation), session analysis (understanding what happened). See autonomy/CLAUDE.md for full vision.
brain/review/ – 5 models (grok-code, opus, gemini-pro, codex, minimax) × 3 dimensions (correctness, style, performance) in parallel. Phases: plan (scan files, query learning.db for relevant principles, estimate tokens) → execute (vario parallel calls) → synthesize (cross-model confidence, severity categorization, learning feedback loop). New @file/@dir data source type in brain/engine for general use. Subscription-first billing with API fallback. --dry-run shows the plan without executing. Learnings enrich all review prompts, and findings feed back into learning.db. Design doc: docs/plans/2026-02-13-code-review-design.md
brain/vario/ – Make vario a more powerful thinking tool, not just "run the same prompt N ways." Current vario runs parallel models on the same prompt – useful for comparison, but it doesn't leverage the breadth for deeper analysis. Generation + critique pattern: (1) generate diverse approaches/solutions/framings across models, (2) each model critiques the other models' outputs (cross-evaluation), (3) synthesize – identify convergence points, unique insights, and genuine disagreements. "Help me think through" mode (from vario TODO): causal reasoning, second-order effects, competing-forces analysis. "Help me prioritize" mode: evaluate options against criteria, score and rank. This transforms vario from "compare outputs" into "collaborative multi-model reasoning" – the kind of thinking that is genuinely better with multiple perspectives. Related: vario/TODO.md streaming coalesce, judging pipeline.
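The generation + critique pairing above can be sketched as a pure function that, given each model's first-round output, builds one critique task per model over the other models' answers. This is an illustrative sketch only: `build_critique_round` and the task-dict shape are assumptions, not the actual vario API.

```python
# Hypothetical sketch of the generate -> critique pairing for vario.
# build_critique_round and the {"model", "prompt"} task shape are invented here.
def build_critique_round(outputs: dict[str, str]) -> list[dict]:
    """For each model, build one critique task over all *other* models' outputs."""
    tasks = []
    for critic in outputs:
        # Exclude the critic's own answer so it only cross-evaluates peers.
        peers = {m: o for m, o in outputs.items() if m != critic}
        body = "\n\n".join(f"[{m}]\n{o}" for m, o in peers.items())
        tasks.append({
            "model": critic,
            "prompt": (
                "Critique each answer below: note convergence points, "
                "unique insights, and genuine disagreements.\n\n" + body
            ),
        })
    return tasks

tasks = build_critique_round({"opus": "A", "gemini": "B", "grok": "C"})
```

The synthesize step would then run once over all critiques to surface convergence and disagreement.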
intel/people/ + projects/vc_intel/
intel/people/README.md
session_index.db
rg), chronicle feed, learning extraction
supervisor/sidekick/hooks/handler.py (SessionEnd event)
projects/skillz/domains/companies/ – Separate project from the VC/founder tool. Identify "Too Fast To Follow" public companies via SEC filings, earnings calls, patent velocity, product launch cadence. Score on: velocity, compounding, moat depth, talent magnetism, capital efficiency, founder intensity. All free public data sources (10K/10Q, USPTO, GitHub, press releases). See domains/companies/README.md.
brain/GOALS.md for UI mockup, tasks/design/newsflow_macro.md for full spec.
tasks/design/supervisor_design_task.md). Heartbeat, watchdog, cron, session supervision, sidekick.
*.jott.ninja. Smoother login (no email wait), supports different session durations per user (owner = 30 days, others = 24h). Configure in Cloudflare Zero Trust dashboard → Settings → Authentication → Add Google as identity provider. Requires a Google Cloud OAuth client ID + secret.
kb/wisdom/ – Extract + analyze investment thesis logic from Value Investors Club. See kb/wisdom/README.md.
journal-summary.py hook runs on SessionEnd but only records stats (duration, tool counts). Stats are all zeros currently – journal.db registration has a regression. The session_analysis table exists but is never populated.
supervisor/sidekick/hooks/handler.py or extend journal-summary.py
admin.jott.ninja to Caddyfile. Move sensitive content (billing links, API key references, cost tracking) from the watch dashboard to a dedicated admin page. Eventually: live cost polling from the Anthropic/OpenAI/BD APIs.
cloudflare provider to lib/tempmail/. Unblocks VIC multi-account signup automation – all free temp email domains (virgilian.com, guerrillamail.com, etc.) are on disposable blocklists. VIC silently accepts the signup but never sends the welcome email. A custom domain is the only bulletproof fix. projects/vic/signup.py (BD Scraping Browser + Turnstile), lib/tempmail/ (3 providers + domain reputation checker).
jobs/handlers/nonprofit_990s.py – OCR complete. gemini/gemini-3-flash-preview at 150 DPI, 1 page/batch (streaming, repetition-safe)
supervisor/ – Run PYTHONPATH=. python -m supervisor.cli run --dry-run -v to watch it detect waiting sessions, then test live on a forked session with plan approval or AskUserQuestion. Session ID ed537830-011b-4aff-9502-90570e9a83b3 (this build session).
learning/session_review/ – Multi-candidate approach implemented (Feb 2026):
pair_judge.py evaluates all candidates and picks the actual repair
principle_propose.py filters to pair_verdict='repair' automatically
METHODOLOGY.md for full documentation
learning/session_review/ – Full pipeline: failure_mining.py, failure_browser.py, principle_propose.py, judge caching, METHODOLOGY.md. See METHODOLOGY.md for the approach.
doctor/critique.py – Run inv doctor.critique -p watch (or -p brain). Verify: the agent navigates, takes screenshots at multiple viewports, reports findings, and generates an HTML report with embedded screenshots. Check report quality and the actionability of findings. Try --model opus for deeper critique. Output lands in {project}/.doctor/critiques/{timestamp}/.
learning/ – Run the session review pipeline end-to-end: python -m learning.session_review.failure_mining, check the failure_browser.py UI, verify principle_propose.py extracts actionable principles from failure→repair pairs. Check that learning.db has recent entries and python -m learning.cli list shows learnings. Verify materialization to learnings.md works: python learning/schema/materialize.py.
~/all-code/investor/replication/ – Extract analytical frameworks from top investors (Reeves/Infuse, Druckenmiller, Tepper) by acquiring their content (Substack, interviews, letters, 13F), extracting structured thesis elements per document, building company timelines, and operationalizing into scoring/screening. Design task: tasks/design/investment_philosophy_extraction.md
~/all-code/investor/covenants/ – Extract structured covenants from EDGAR indentures/credit agreements, compute headroom vs current financials, track amendments over time. Key challenge: resolving nested EBITDA definitions and cross-references.
supervisor/sidekick/ + learning/ – Sidekick should observe session activity and auto-generate learnings asynchronously. When it detects patterns (repeated failures, successful fixes, new conventions established), create learning entries in learning.db without blocking the session. Hook into UserPromptSubmit or PostToolUse events. A lightweight LLM call (flash/haiku) classifies whether the current turn contains a learning-worthy observation.
learning/ – Build the dataset first. Mine sessions for situation records: context + problem + resolution + outcome. Broader than tool errors – capture slow paths, workarounds, design decisions, architectural choices. This DB is the foundation for everything else (principle extraction, wisdom, pattern matching to new situations). Start simple: SQLite, one table, mine from JSONL transcripts.
learning/ – Each learning should have an importance/weight field (1-5 or similar). High-importance learnings get prioritized in materialization to learnings.md and principles extraction. Low-importance ones stay in the DB but don't consume context-window real estate. Could be auto-rated by the LLM during learn classification, or set manually.
~/.claude/skills/principles/SKILL.md. Feeds into ~/.claude/principles/, project conventions, ~/.claude/howto/.
principle_propose.py (existing but limited to tool errors)
doctor/chronicle/. Currently only tracks code commits and session topics.
projects/people/ or projects/companies/.
kb/ or investor/. NLP feature extraction + backtesting against price data.
intel/people/ or intel/companies/.
lib/llm/tools.py – The LLM can fetch URLs from search results. Needs: brain's fetch_escalate with smart proxy escalation, BrightData unlocker/JS rendering for paywalled/dynamic content. High-volume JS fetching may need the existing browser service or the BrightData Browser CDP endpoint. Design considerations: mode param (auto/js/unlocker), rate limiting, content truncation for token efficiency.
kb/, formerly knowledge_gym)
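The "SQLite, one table" situations DB described above could start as small as this. A minimal sketch under assumptions: the table name, column set, and helper functions are invented here, not an existing module.

```python
# Minimal sketch of the situations DB: one SQLite table holding
# context + problem + resolution + outcome records mined from session
# JSONL transcripts. Schema and function names are illustrative.
import sqlite3

def init_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS situations (
            id INTEGER PRIMARY KEY,
            session_id TEXT,
            context TEXT,      -- what was going on
            problem TEXT,      -- what went wrong or was slow
            resolution TEXT,   -- fix, workaround, or decision taken
            outcome TEXT       -- success / partial / abandoned
        )""")
    return conn

def record(conn, session_id, context, problem, resolution, outcome):
    conn.execute(
        "INSERT INTO situations VALUES (NULL, ?, ?, ?, ?, ?)",
        (session_id, context, problem, resolution, outcome))
    conn.commit()

conn = init_db()
record(conn, "abc123", "migrating jobs DB", "locked db", "copy-first read", "success")
rows = conn.execute("SELECT problem, outcome FROM situations").fetchall()
```

A JSONL-mining pass would call `record` once per extracted situation; principle extraction and pattern matching then query this table.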
python -m kb.scenario -s URL -n 3
kb/corpus/knowledge.jsonl
explorations/problem_solving_gym/)
python brain/validate_extraction.py to verify HTML/YouTube/PDF extraction
gr.themes.Glass(), gr.themes.Ocean(), gr.themes.Citrus()
admin.jott.ninja and any future admin-only subdomains. Ensure the existing "allow-friends" policy does NOT grant access to admin-only paths. Move sensitive content (billing, API keys, cost tracking) from the watch dashboard to a dedicated admin page. Eventually: live cost polling from the Anthropic/OpenAI/BD APIs.
voice.jott.ninja or under admin.jott.ninja) that lets you talk to a running Claude Code session on the laptop via voice. Admin-only access (tchklovski@gmail.com).
finance/earnings/live/transcribe_gpt_realtime.py – OpenAI Realtime API via WebSocket (GPT-4o-transcribe, VAD, 24kHz PCM, sub-second latency). Reuse the WebSocket/audio capture patterns.
it2api send-text or supervisor) → response → TTS → audio back to browser
cloudflared on mcp-server-box (GCP) so it can serve *.jott.ninja subdomains alongside the laptop tunnel. Same Cloudflare Access (Google OAuth) protects everything. Two use cases:
cloudflared on mcp-server-box, (2) authenticate to the existing rivus tunnel, (3) add ingress rules for GCP-served hostnames in the tunnel config, (4) deploy the Gradio app container, (5) add DNS route cloudflared tunnel route dns rivus .jott.ninja
.ai/docs/mcp-box-setup.md for box details, infra/cloudflared.yml for current tunnel config.
writing/ or design/writing/ in rivus?
explorations/gradio_layout_gym/, branch learn-gradio
explorations/gradio_layout/EXPERIMENTS.md has 6 real experiments (viewport locking, HTML overflow, flex CSS). Gym concept in gradio_layout_gym/README.md (never built).
learn-gradio – accumulates experiments without polluting main. Lessons graduate to ~/.claude/howto/gradio.md and gradio-layout skill
kb/ has basic extract→score→corpus loop
browser search → fetch top result
brain search "query" -n 5 → fetch top N results in parallel
brain "silver down 8%, who's affected"
brain "Apple earnings, extract causal claims"
brain "summarize https://example.com"
person_integrity precursor into structured prompt
person_expertise_real precursor
person_red_flags precursor
research_* in query_precursors.yaml)
?prompt=... - set the prompt text
?config=name - select a config preset
?input=... or ?input_url=... - set input content
?autorun=1 - auto-run on load
?output=/path/to/file.yaml param
--output path for file
brain get URL | vario gen "..." --output results.yaml
?url=...&config=..., extend to:
?autorun=1 - auto-run extraction on load
?output=path - specify output destination
brain extract URL --config causal --output results.yaml
--each flag end-to-end:
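A sketch of how the query params listed above might map onto app state. This uses only `urllib` to show the mapping; the actual Gradio request plumbing, the `DEFAULTS` dict, and the `input` vs `input_url` precedence are assumptions.

```python
# Hypothetical param-to-state mapping for the ?prompt/?config/?input/
# ?autorun/?output query params. Defaults and precedence are illustrative.
from urllib.parse import urlparse, parse_qs

DEFAULTS = {"prompt": "", "config": "default", "input": "", "autorun": False, "output": None}

def params_from_url(url: str) -> dict:
    qs = parse_qs(urlparse(url).query)
    state = dict(DEFAULTS)
    if "prompt" in qs: state["prompt"] = qs["prompt"][0]
    if "config" in qs: state["config"] = qs["config"][0]
    # In this sketch, ?input= takes precedence over ?input_url=
    if "input" in qs: state["input"] = qs["input"][0]
    elif "input_url" in qs: state["input"] = qs["input_url"][0]
    state["autorun"] = qs.get("autorun", ["0"])[0] == "1"
    if "output" in qs: state["output"] = qs["output"][0]
    return state

s = params_from_url("http://x/?config=causal&autorun=1&output=/tmp/r.yaml")
```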
rank_feasibility
vario prompt "rough request" command
gr.BrowserState to persist recent configs in localStorage. Show as clickable items alongside presets. Keep last ~10, dedupe.
--search tool|native|none, --directive ignore|async|sync
?search=tool|native|none and ?directive=ignore|async|sync URL params
# directive: none
reasoning_tags or similar for collapsible system prompt display. See https://www.gradio.app/docs/gradio/chatbot#param-chatbot-reasoning-tags and https://www.gradio.app/docs/gradio/chatbot#examples
.all(), .count(), etc.)
page.evaluate() examples (# vs //, len vs .length)
verify/models.py - VerifySpec, CheckResult, etc.
verify/actions.py - click, fill, wait implementations
verify/assertions.py - visible, in_viewport, text_contains
verify/capture.py - screenshot, DOM, aria capture
verify/executor.py - main runner with parallel support
browser/site_extractors.yaml or brain/?
browser extract URL --save to save working rule
https://endpoints.news/roivants-dealmaker-lands-81m-cash-bonus-following-drug-sale-to-roche/
--profile PATH to use existing browser profile directly
inv doctor.tail - Log error detection + LLM analysis
inv doctor.expect - Basic LLM expectations (HTTP only)
verify/runner.py - Playwright executor (uses browser/verify)
verify/nl_assist.py - LLM generates spec from description
llm: "footer should be visible")
references.py - Reference management (add/list/delete/show)
expect.py with visual_match expectation type
docs/plans/2026-02-21-newsflow-buildout.md
static/server.py asset caching, jobs/data/vic_ideas/.share base_path config.
ResourceRegistry in runner.py. Stages set resource: youtube in YAML and share a single asyncio.Semaphore. Concurrency set in a top-level resources: section. Hot-reloadable.
retry_later re-queues items immediately with no cooldown, creating tight loops and log spam. The real fix isn't a cooldown – it's scoping retries correctly: item-level (skip this one), resource-level (back off the API), or job-level (pause). Most current retry_later uses are either "should fail" (no URL found) or "should back off the resource" (429). See resource-aware backoff below.
raise ResourceThrottledError("finnhub", cooldown_s=60)
ResourceRegistry semaphore temporarily blocks (all permits held for the cooldown duration)
finnhub, ib), per-provider (anthropic, openai), or per-site (wayback)
concurrency controls local parallelism; resource controls shared external limits
enabled: false / resource: / stage_deps interact. ASCII or Mermaid in CLAUDE.md.
max_priority guard to stop at a threshold (phased rollouts)
jobs/logs/{job_id}.log with rotation
enabled: false, preventing items from getting stuck forever.
stage_version_hash includes them. Implemented in tracker.py, used by vic_ideas and vic_wayback.
(id, job_id, item_key, stage, event_type, event_data JSON, timestamp). Append-only, never updated/deleted. Event types: item_discovered, stage_started, stage_completed, stage_failed, job_paused, job_resumed, handler_updated, cb_tripped, config_reloaded
job_events.
role: validator tells the framework this stage validates upstream output. Adds validates: [extract, fetch] to declare what it checks.
_invalidate_stage: "extract", runner resets that stage to pending for the item. Dashboard shows items with active invalidations.
_discrepancies: [{field, extract_value, llm_value, severity}]. Stored in the results table.
success_rate_min: 0.50). Catches intermittent failures that never cluster enough to trip the consecutive CB.
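One way the `ResourceThrottledError` / `ResourceRegistry` pair described above could fit together, as a sketch: the class names come from the text, but the `throttle` method, constructor shape, and demo are invented here. Throttling holds every permit of the named semaphore for the cooldown, so all stage workers sharing that resource back off together.

```python
# Sketch of resource-aware backoff: a handler raises ResourceThrottledError
# (e.g. on a 429) and the runner calls registry.throttle(), which holds all
# permits for the cooldown. Method names beyond the text are assumptions.
import asyncio
import time

class ResourceThrottledError(Exception):
    """Raised by a handler to ask the runner to back off a shared resource."""
    def __init__(self, resource: str, cooldown_s: float):
        super().__init__(f"{resource} throttled for {cooldown_s}s")
        self.resource, self.cooldown_s = resource, cooldown_s

class ResourceRegistry:
    def __init__(self, limits: dict[str, int]):
        self._limits = dict(limits)
        self._sems = {name: asyncio.Semaphore(n) for name, n in limits.items()}

    def sem(self, name: str) -> asyncio.Semaphore:
        return self._sems[name]

    async def throttle(self, name: str, cooldown_s: float) -> None:
        # Hold every permit for the cooldown so all workers back off together.
        sem, n = self._sems[name], self._limits[name]
        for _ in range(n):
            await sem.acquire()
        try:
            await asyncio.sleep(cooldown_s)
        finally:
            for _ in range(n):
                sem.release()

async def demo() -> float:
    reg = ResourceRegistry({"finnhub": 2})
    async with reg.sem("finnhub"):
        pass  # a normal rate-limited call would happen here
    t0 = time.monotonic()
    await reg.throttle("finnhub", 0.05)  # e.g. after catching a 429
    return time.monotonic() - t0

elapsed = asyncio.run(demo())
```

This keeps the distinction from the text: `concurrency` stays a local semaphore count, while `resource` names a shared external limit that any stage can trip.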
Config per-stage in YAML:
content_ok=false and discrepancy rates over a sliding window (last N items). Different from the exception-based circuit breaker.
circuit_breaker.content_ok_min_rate: 0.80 pauses when <80% of items pass validation. The pause reason includes which upstream stage is likely broken.
auto_reprocess: true per stage.
--llm flag for open-ended "what can you find about X?" discovery, runs in parallel with Serper bulk
--preset founder/exec/academic/contact controlling which enrichment sources and platforms to search
site: searches already integrated.
enrich.py (http_get, _RateLimiter, PROXY_SOURCES). Just need to swap the proxy zone or add a BRIGHTDATA_RESIDENTIAL_PROXY env var.
--audio flag)
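The sliding-window breaker described above (trip on overall success rate over the last N items, rather than consecutive failures) can be sketched in a few lines. The `success_rate_min` name mirrors the YAML above; the class itself and its window-must-be-full rule are illustrative assumptions.

```python
# Sketch of a sliding-window success-rate circuit breaker. It catches
# intermittent failures that never cluster enough to trip a consecutive-
# failure breaker. Class and method names are invented for illustration.
from collections import deque

class WindowBreaker:
    def __init__(self, window: int = 20, success_rate_min: float = 0.50):
        self.window = deque(maxlen=window)
        self.success_rate_min = success_rate_min

    def record(self, ok: bool) -> None:
        self.window.append(ok)

    def tripped(self) -> bool:
        # Don't judge until the window is full; early failures alone
        # shouldn't pause the job in this sketch.
        if len(self.window) < self.window.maxlen:
            return False
        return sum(self.window) / len(self.window) < self.success_rate_min

cb = WindowBreaker(window=4, success_rate_min=0.5)
for ok in [True, False, False, False]:  # 25% success over a full window
    cb.record(ok)
```

The same shape works for `content_ok` rates: record validator verdicts instead of exceptions and pause when the pass rate drops below `content_ok_min_rate`.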
twilio:// source in audio_capture.py.
jobs/CLAUDE.md "Resource contention" section.
needs_refetch=true in the DB, reset the fetch stage to pending so items get re-fetched with current credentials. Extract should distinguish "parser failed on good HTML" from "HTML is the paywall page."
brain/reduce/ with image modality support, auto-refinement loop, and cost tracking. Design: docs/plans/2026-02-17-reduce-image-iteration-design.md. Key pieces:
refine.py – Prompt expansion (vague→concrete), auto-refine (critique→improved prompt), steer (user feedback→rewrite)
task.py – Add modality, prompt_history, costs, artifact_path, prompt_version, gen_cost fields
gen.py – Image generation path via lib/llm/image_gen when modality: image
app.py – Image gallery UI, cost display (per-candidate / session / daily), steer input, auto-loop controls
score.py – Read images from artifact_path for scoring (base64 path already exists)
record_application(). Two paths: (1) post-session analysis matches applied patterns to principles, (2) real-time detection during sessions when a principle-aligned action succeeds or fails. Without this, the effectiveness feedback loop never closes.
title param adds value over the structured-prefix approach (A/B test with a retrieval benchmark).
docs/plans/2026-02-19-session-intelligence-server-design.md. Key pieces:
/hook/prompt, /hist/{sid}, /state/{sid}, /health endpoints
~/.claude/hooks/watch-api-hook.sh) – running alongside the existing handler
watch.api.localhost
learning/gyms/badge/ for tree + ablation testing
present/ (papers, diagrams, demos, blog posts, tweets, slides). Port TBD, present.localhost.
benchmarks/ (run configs, view results, compare models). Port TBD, bench.localhost.
lib/llm) or litellm --proxy? The hot runner has subscription routing, caching, and model aliases; the litellm proxy gives an OpenAI-compatible endpoint, load balancing, and spend tracking. Decide and unify.
lib/llm/TODO.md for implementation sketch.
docs/plans/2026-02-19-session-intelligence-server-design.md.
lib/private_data/claude.py – Consolidate existing JSONL readers into one module. Source: ~/.claude/projects/*/*.jsonl + sessions-index.json + ~/.claude/history.jsonl. Already parsed by ops/sessions.py, supervisor/core/session.py, doctor/chronicle/native_reader.py – extract a common reader, keep consumers as thin wrappers. ~2,700 sessions, 1.4GB. The format is well-documented (supervisor/docs/jsonl_format.md).
lib/private_data/gemini.py – Read from ~/.gemini/journal/journal.db (SQLite) and ~/.gemini/tmp/*/chats/. Already partially implemented in doctor/chronicle/native_reader.py (read_gemini_sessions()). Extract and consolidate.
lib/private_data/chatgpt.py – Supervised import from OpenAI's official data export (Settings → Data Controls → Export). Downloads as a ZIP containing conversations.json. Messages are stored as a tree (mapping with parent/child UUIDs) – linearize by traversing from current_node backwards. macOS app data (~/Library/Application Support/com.openai.chat/conversations-v3-*/) is encrypted, not practical. No history API exists. Tools to evaluate: chatgpt-exporter, convoviz.
lib/private_data/gemini_web.py – Supervised import from Google Takeout (select "Gemini" → "Conversations"). Exports as JSON per conversation with messages, timestamps, and formatting preserved. No history-retrieval API. AI Studio conversations save to Google Drive (separate path). Tools to evaluate: gemini-voyager, gemini-exporter.
lib/private_data/browser_history.py – Read browsing history from local SQLite DBs. All major browsers lock the DB while running – must copy the file first (warn the user or auto-copy to temp).
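The linearization described for conversations.json (walk parent pointers back from `current_node`, then reverse) is small enough to sketch directly. Field names follow the export format as described above; error handling and branch handling for abandoned siblings are omitted, and the sample data is synthetic.

```python
# Hedged sketch: linearize one ChatGPT-export conversation tree into a
# message list by following parent pointers from current_node to the root.
def linearize(conversation: dict) -> list[dict]:
    mapping = conversation["mapping"]
    node_id = conversation.get("current_node")
    messages = []
    while node_id:
        node = mapping[node_id]
        msg = node.get("message")
        if msg:  # root/system nodes may carry no message
            messages.append(msg)
        node_id = node.get("parent")
    return list(reversed(messages))

# Synthetic three-node conversation for illustration.
convo = {
    "current_node": "c",
    "mapping": {
        "a": {"message": {"id": "a", "text": "hi"}, "parent": None},
        "b": {"message": {"id": "b", "text": "hello"}, "parent": "a"},
        "c": {"message": {"id": "c", "text": "bye"}, "parent": "b"},
    },
}
msgs = linearize(convo)
```

Walking backwards from `current_node` naturally drops edited/regenerated siblings, which is usually the wanted behavior for import.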
Sources:
~/Library/Application Support/Google/Chrome/Default/History – urls + visits tables, timestamps as microseconds since 1601-01-01
~/Library/Safari/History.db – history_items + history_visits, timestamps as seconds since 2001-01-01
~/Library/Application Support/Firefox/Profiles/*/places.sqlite – moz_places + moz_historyvisits, microseconds since the Unix epoch
~/Library/Application Support/Arc/
asyncio.gather across problems (with an optional semaphore for rate limits)
format: math, \boxed{} extraction, LaTeX normalization. Config: benchmarks/configs/eval_math.yaml
format: code, subprocess sandbox execution. Config: benchmarks/configs/eval_humaneval.yaml
benchmarks/eval/, configs in benchmarks/configs/, invoke: inv bench.run -c eval_math
benchmarks/terminal-bench/run.sh. Test with custom agent scaffolding
princeton-nlp/SWE-bench_Verified
lib/llm/EMBED_DESIGN.md
fast vs maxthink vs allthink, or custom strategy combos) and evaluate output quality. Sources: sample requests from real Claude Code sessions, ChatGPT/Gemini chat exports, brain app usage, and browser history (for URL-based queries). Pipeline: (1) curate a request corpus via lib/private_data/ importers (see main TODO.md – private data access tools), (2) run each request through N vario configs, (3) an LLM-as-judge scores each output against a rubric (clarity, correctness, depth, actionability – the rubric is per-domain, set up in advance), (4) aggregate scores per config and identify which configs win on which request types. Store results in experiments.db alongside regular runs. Goal: a data-driven answer to "does vario complexity pay off?" and "which config for which task type?" Related: TODO.md:175 (brain benefit vs vanilla).
mode=auto|js|unlocker and escalation evidence
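The three browser epochs listed above differ, so the conversions are worth pinning down once. A sketch using the epochs stated in the text (Chrome/WebKit: microseconds since 1601-01-01; Safari: seconds since 2001-01-01; Firefox: microseconds since the Unix epoch); verify against real rows before trusting it, and remember to copy the locked DB file first.

```python
# Timestamp conversions for the browser history DBs listed above.
# Epochs per the text; helper names are illustrative.
from datetime import datetime, timezone

CHROME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)
SAFARI_EPOCH = datetime(2001, 1, 1, tzinfo=timezone.utc)

def chrome_ts(us: int) -> datetime:
    """Chrome/WebKit: microseconds since 1601-01-01 UTC."""
    return datetime.fromtimestamp(us / 1e6 + CHROME_EPOCH.timestamp(), tz=timezone.utc)

def safari_ts(s: float) -> datetime:
    """Safari: seconds since 2001-01-01 UTC."""
    return datetime.fromtimestamp(s + SAFARI_EPOCH.timestamp(), tz=timezone.utc)

def firefox_ts(us: int) -> datetime:
    """Firefox: microseconds since the Unix epoch."""
    return datetime.fromtimestamp(us / 1e6, tz=timezone.utc)
```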
- `learning/gyms/fetchability/docs/FETCHABILITY_MATRIX_SPEC.md`
- `learning/gyms/fetchability/tests/fixtures/fetchability_matrix.yaml`
- `learning/gyms/fetchability/tests/test_fetchability_matrix.py`
- `BROWSER_TEST_SUBSTACK_PAID_URL`, `BROWSER_TEST_PATREON_PAID_URL`
- `BROWSER_TEST_SUBSTACK_PAID_AUTH=1`, `BROWSER_TEST_PATREON_PAID_AUTH=1`
- `lib/llm/tools.py`
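The `*_PAID_AUTH=1` opt-in flags above suggest env-gated test cases. One common shape, sketched here as a tiny helper (the helper name is an assumption; only the env-var naming pattern comes from the list):

```python
import os

def paid_auth_enabled(service: str) -> bool:
    """Opt-in gate for paid-tier fixtures,
    e.g. BROWSER_TEST_SUBSTACK_PAID_AUTH=1 enables the Substack paid case."""
    return os.environ.get(f"BROWSER_TEST_{service}_PAID_AUTH") == "1"
```

In the test module this would typically back a `pytest.mark.skipif(not paid_auth_enabled("SUBSTACK"), ...)` marker, so the paid-wall cases never run by accident in CI.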
- OpenAlex (`api.openalex.org/authors?search=X`, 90M authors, free), Semantic Scholar API (`api.semanticscholar.org`, 220M papers, free), ORCID, arXiv, DBLP, PatentsView for inventor lookup. Key: h-index alone is misleading; compare it to the field median, and weigh citation velocity and co-author network quality more.
- Google Books (`googleapis.com/books/v1/volumes?q=inauthor:X`, 40M books, free), Open Library API, Serper `site:` searches for WSJ/HBR/Forbes/etc. Key: distinguish publication tier; an op-ed in the WSJ vs a self-published blog is a very different signal.
- Listen Notes (`listennotes.com/api/v2/search?q=X&type=episode`, 5M+ episodes, free tier 300/mo), Serper news, YouTube search via yt-dlp. Key: measures narrative control; does the person own their story (own blog/newsletter/podcast), or do they only appear when others write about them?
- Model is configurable (`CLASSIFY_MODEL` variable). Returns a `Verdict` with `error_class`, `action`, `risk_tier`, `reason`, `pause_reason`.
- `stage_worker` and `simple_worker` call `classify_and_act()`. The old `consecutive_failures` counter is removed. The doctor call doesn't hold a DB connection (it uses its own short-lived conn for logging).
- `PAUSE_THRESHOLD` dict: systemic=3, code_bug=2, transient=10, temporal=1, item_specific=never. Below threshold → downgrade to `fail_item`. At threshold → pause. Counters reset on success.
- `doctor_actions` table in `jobs.db`: `error_class`, `action_taken`, `risk_tier`, `handler_version`, timestamps.
- `lib.notify.push()`.
- `elapsed_s`.
- If one stage is broken (e.g. `extract` has a code bug), only that stage should stop. Items already past the broken stage (in `score`, `check_enrich`) should continue processing. Currently `set_job_paused()` is job-level → all stage workers stop.
- Validator stages (e.g. `check_enrich` finds upstream `extract` is broken): pause the whole job; the problem is upstream, not just this stage. `role: validator` in YAML should signal this behavior.
- `stage_paused` dict in runner memory (lost on restart, but restart re-evaluates anyway).
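The field-median comparison above can be sketched against OpenAlex's author shape. The `summary_stats.h_index` path follows the OpenAlex author schema (verify against the live API); the median itself must be supplied by the caller, e.g. precomputed per field, and the thresholds here are illustrative:

```python
def h_index_signal(author: dict, field_median: float) -> str:
    """Read an author's h-index relative to their field, not in isolation.

    `author` is a parsed OpenAlex author record; `field_median` is a
    caller-supplied median h-index for the author's field.
    """
    h = author.get("summary_stats", {}).get("h_index", 0)
    if field_median <= 0:
        return "unknown"
    ratio = h / field_median
    if ratio >= 2:
        return "well above field median"
    if ratio >= 1:
        return "above field median"
    return "below field median"
```

The same relative framing extends naturally to citation velocity (citations in the last N years over career citations), which the list above weights higher than the raw h-index.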
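The `PAUSE_THRESHOLD` behavior above (per-class counters, downgrade below threshold, pause at threshold, reset on success) fits in a few lines. The thresholds mirror the list; the tracker class itself is an illustrative sketch, not the runner's actual implementation:

```python
PAUSE_THRESHOLD = {"systemic": 3, "code_bug": 2, "transient": 10, "temporal": 1}
# item_specific has no entry: it never pauses the job.

class PauseTracker:
    def __init__(self):
        self.counts: dict[str, int] = {}

    def record_failure(self, error_class: str) -> str:
        """Return 'pause' at threshold, else downgrade to 'fail_item'."""
        threshold = PAUSE_THRESHOLD.get(error_class)
        if threshold is None:                 # item_specific: never pause
            return "fail_item"
        self.counts[error_class] = self.counts.get(error_class, 0) + 1
        if self.counts[error_class] >= threshold:
            return "pause"
        return "fail_item"

    def record_success(self):
        self.counts.clear()                   # counters reset on success
```

Note that `temporal=1` means a single temporal error pauses immediately, while `transient` tolerates a long run of failures before giving up.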
  Or a `job_stage_state` table in the DB for persistence.
- low → act immediately (done)
- medium → Pushover notify → wait 10 min → act (TODO: delay mechanism)
- high → Pushover notify → wait for business hours, 9am-6pm PT (TODO: schedule check)
- `handler_version` alongside each error. Enables: "all failures from handler v.abc123 deployed 2h ago" → code bug, vs "same version worked yesterday, errors at 3am" → external issue.
- `doctor.diagnose_group(errors)` → when multiple errors accumulate, look at them together: "5 errors: 4 are VIC cookie (same fingerprint), 1 timeout → root cause is auth."
- `doctor.watch` pipeline: fingerprinted, deduplicated, LLM-analyzed, with `f` to fix and `s` to silence.
- `doctor.classify_error()`. Regex stays as the offline-only fallback.
- `learn rename` CLI command (update slug + file in DB, regenerate the materialized `.md`)
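The tiered delay above reduces to a pure decision function, which keeps the TODO'd delay and schedule mechanisms testable. The tier names and the 9am-6pm window come from the list; the function name and the assumption that `now` is already in PT are illustrative:

```python
from datetime import datetime, timedelta

def next_action_time(tier: str, now: datetime) -> datetime:
    """Earliest time the doctor may act on an error of the given risk tier.

    low    -> immediately
    medium -> after a 10-minute notify window
    high   -> deferred into business hours (9am-6pm in `now`'s timezone,
              assumed PT for this deployment)
    """
    if tier == "low":
        return now
    if tier == "medium":
        return now + timedelta(minutes=10)
    # high tier: act only during 9am-6pm
    if 9 <= now.hour < 18:
        return now
    start = now.replace(hour=9, minute=0, second=0, microsecond=0)
    if now.hour >= 18:                      # after close: wait for tomorrow
        start += timedelta(days=1)
    return start
```

Keeping this pure (time passed in, no sleeping or scheduling inside) means the actual delay mechanism can be a separate queue that polls `next_action_time` against the clock.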
- `gpt-5.3-codex-spark` (real-time coding model, currently Pro-only)
- `inv llm.shadow-report` → HTML showing model comparison on real production data
- `/autonomous "small real task"` → inspect `review.md`, verify notifications fire (badge + TODO.md + macOS)
- Adjust `AUTONOMOUS_PROTOCOL` if tiers are miscalibrated
- `task &` through the autonomous skill (currently only `/autonomous` is wired)
- ring-mqtt or Node-based wrappers.