Rivus

A system that ingests the world, reasons about it across models,
and learns from its own mistakes to get better over time.

Key Notes & Questions
  1. Vision: We improve LLM performance in domains, not just on single queries. The unit of value is a vertical, not a prompt.
  2. Shadow mode: Deploy alongside current workflows to observe — can we do better, or offer automation where there is none?
  3. Franchise model: Can we package this for partners addressing specific verticals or geographies?
  4. Demo idea: Show with vs. without the knowledge base — populate the KB, then solve a problem, demonstrating the compounding benefit of learned principles on process quality.
  5. Capability example: Find VC portfolio companies that may want to use this — the system can research, filter, and score prospects autonomously.

Vision

Where to point AI: The best use of rapidly improving AI capabilities is to apply them to two things:

  1. The codebase itself — letting AI improve the tools, pipelines, and infrastructure it runs on.
  2. Acquiring and refining skills — the rigorous, measurable capabilities (not just prompts) involved in building products: research, evaluation, synthesis, domain reasoning.

Both uses create self-improving feedback loops. Better tools produce better work; better skills produce better tools. Businesses that focus AI here compound their advantage — each cycle makes the next one faster and more capable.

The main skill: The central capability we are building is how AI should autonomously work a project — driving tasks end-to-end while knowing when and how to enlist human assistance along the way.

This inverts the typical AI copilot model. Instead of a human driving with AI assistance, the AI drives the project — scoping work, executing plans, verifying results, filing follow-ups — and escalates to the human for judgment calls, design decisions, and quality gates. The human becomes the reviewer and strategist, not the typist. Getting this right is the meta-skill that makes all other capabilities compound.

Core thesis: By thinking 2–3× as much — generating alternatives, evaluating from more angles, building up background knowledge — you can almost always improve on a first-pass decision. The opportunity is to do more work systematically and then learn from comparing the richer result to what the single pass missed. Rivus makes that extra work automatic, not effortful.
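The core thesis above can be sketched as a best-of-n loop: widen the first pass, then judge the alternatives. This is an illustrative stand-in, not the production code; `llm` and `judge` are hypothetical placeholders.

```python
import random

def llm(prompt: str, variant: int) -> str:
    """Stand-in for a real model call; returns one candidate answer."""
    return f"answer-{variant} to: {prompt}"

def judge(prompt: str, candidate: str) -> float:
    """Stand-in for an LLM judge; returns a quality score in [0, 1]."""
    rng = random.Random(candidate)  # deterministic per candidate, for the sketch
    return rng.random()

def best_of_n(prompt: str, n: int = 3) -> str:
    # Generate alternatives instead of trusting a single pass,
    # then evaluate from another angle and keep the best.
    candidates = [llm(prompt, i) for i in range(n)]
    return max(candidates, key=lambda c: judge(prompt, c))
```

The 2–3× extra work is the `n` generations plus the judging pass; the payoff is whatever the single pass would have missed.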

Rivus is self-improving AI. Every session teaches the next; mistakes become principles; principles compound. More done, less human effort — and the gap only grows.

Rivus builds skilled domain experts. The recipe: start by accumulating deep domain experience — from the web, from users, from subject-matter experts — especially around decision-making in a given domain. Then reason with that accumulated knowledge to sharpen the system’s judgment on each new decision. Deliverables: browsable portals, queryable MCP servers, change notifications, and bulk data export.

Rivus builds causal models. Predicts actions and choices from small-to-medium domain data (100–100K facts). Not just correlations — testable, explainable models of what drives decisions.

Why it produces high-quality output

Abundant input

Broad data collection across many sources and formats

Self-healing processing

Pipelines detect failures and retry with different strategies

Superior observability

Every tool call, error, and correction is visible and searchable

Self-tuning

Mistakes become principles; principles improve every future session


How It Works

Data flows through three stages: collect, process, output. Multi-model reasoning is pervasive — see How We Reason below.

INGEST — pull in the world
  • Jobs — self-healing, semaphored, error triage
  • Browser — navigate, screenshot, self-tuning
  • Newsflow — live monitoring, periodic refresh
  • Extract — parse HTML, PDF, YouTube
  • Transcribe — speaker, tone, text

PROCESS — enrich & understand
  • Analyze — concepts, faces, entities
  • Score — classify, rank
  • Enrich — cross-reference, link
  • jobs pipeline: analyze → score → enrich

OUTPUT & ACTION — act on what was learned
  • Intel — company & people dossiers
  • Finance — analysis & backtests
  • SemanticNet — structured knowledge
  • Present — reports, demos & portals

PERVASIVE — across every stage
  • Run → Inspect → Improve — multi-model reasoning, iterative strategies, learning from errors & suggestions

SELF-MANAGEMENT
  • Autonomy — overnight TODOs, decisions
  • Supervisor — sidekick hooks, session grid
  • Doctor — watch, auto-fix, chronicle
  • Ops + Lib — CLI, servers, shared libs

INSTITUTIONAL KNOWLEDGE — every session inherits what the system has learned
  • Skills — 40+ encoded expert workflows
  • Conventions — CLAUDE.md at every level, living rules
  • Principles — extracted from 600+ sessions

How We Reason

What makes this different from vanilla Claude Code. Multi-model reasoning is pervasive — every stage uses it, not just one.

Vario
  • Run across all frontier models, multiple variants per model
  • Generate → Evaluate → Iterate until quality converges
  • Claude, GPT, Gemini, Grok — breadth + depth

Reasoning Strategies
  • 19 strategies — debate, reflexion, tree search, best-of-n, ensemble
  • 10 composable stages — generate, score, critique, refine, verify, vote
  • 9 analytical lenses — game theory, economics, evolution, ecology

Learning System
  • Session review → principle extraction
  • 25K+ instances, vectorized for retrieval
  • Mistakes → principles → better sessions

Measure & Evaluate
  • Real use — production prompts scored by LLM judges
  • Benchmarks — reasoning (MMLU-Pro, HLE) and text crafting (creative writing, BrowseComp)
  • Sandbox replay, strategy tournaments, A/B scoring

Vanilla Claude Code: single model, no memory between sessions, no self-evaluation. Rivus: multi-model consensus, learned principles, strategy selection, benchmark-driven improvement.

Example Outputs

Concrete deliverables you can read, share, and act on.

🔍

Company & People Dossiers

Structured YAML profiles + prose dossiers with TFTF scoring, bull/bear investment memos, competitive landscape analysis. Cross-referenced with SEC filings and patents.

💰

Financial Analysis

Earnings call × price alignment at ~250ms resolution. Backtests, screening, bottleneck analysis. Which claim caused which price move?

🏭

Supply Chain Graphs

500+ semiconductor companies with supplier/customer/competitor edges. Wave-based discovery from anchor companies outward.

📊

Reports & Portals

HTML reports, interactive portals, research writeups. Published to static content server for sharing.

🧠

Learned Principles

25K+ instances distilled into actionable principles. Materialized to ~/.claude/principles/ — every future session inherits what was learned.

Skills & Workflows

40+ encoded expert workflows. /commit, /debug, /present-project — invoke with a slash command, get a structured multi-step process.


Amplification

How one developer’s hour becomes ten. The system multiplies human effort through three mechanisms.

🎓

Learning Loop

Every session is reviewed. Mistakes become principles. Principles feed future sessions. The system gets measurably better over time.

664 sessions reviewed → 25K+ instances → principles materialized

Learning deep dive →

🔱

Multi-Model Reasoning

The “+10 IQ points” engine. Instead of trusting a single generation, Vario does more work: generates alternatives broadly, evaluates from multiple angles, builds up background (rubrics, precedent, first principles) before committing to an answer. The gap between 1× and 3× effort is often worth closing.

19 strategies · 4–8 models · iterative convergence

Vario deep dive →

🤖

Autonomous Operation

Jobs run 24/7. Supervisor watches sessions. Doctor auto-fixes failures. Work happens while the developer sleeps.

17+ pipeline handlers · overnight TODOs · idle-aware scheduling

The compound effect: autonomous pipelines discover data around the clock. Multi-model reasoning produces higher quality analysis per prompt. Learning from mistakes means each session is faster and more accurate than the last. Skills encode expert workflows so common patterns take seconds instead of minutes.

Result: one developer managing work that would otherwise require a team — with quality that improves automatically.


Components

Each module, what it does, and its key sub-components. Sorted by size.

🔱

vario

24,131 LOC · 101 files

Unified LLM workbench. Extracts content from URLs, runs parallel prompts across 4–8 models, evaluates with strategies and judges, and iteratively refines via generate-evaluate-iterate loops.

Extract · Studio · Engine · Reasoning Strategies · Gradio UI
📚

lib

20,633 LOC · 116 files

Shared library layer used by every module. Async LLM calls with model aliasing, image generation across 4 providers, vector search, semantic storage, billing monitoring, notifications, and proxy management.

lib/llm · lib/vectors · lib/semnet · lib/billing · lib/notify · lib/brightdata · lib/ytdl
🎓

learning

18,274 LOC · 39 files

Self-improvement system. Reviews coding sessions for patterns, extracts principles from experience, embeds knowledge into vector DB for fast retrieval. The system literally learns from its own work.

Session Review · Pattern Discovery · Principles · Embeddings · Pond
👷

jobs

16,668 LOC · 52 files

Self-healing pipeline engine with LLM error triage, semaphored concurrency, and version-aware staleness. 20+ autonomous pipelines across 5 domains:

  • YouTube — 6 channels (a16z, DML, HG, Dwarkesh, Lex, PLTR)
  • Earnings — large-cap backfill, transcripts, IR
  • Company Research — VIC ideas, enrichment, scoring
  • Supply Chain — 500+ semis, anchor → expand graph
  • Newsflow — live monitoring, curated URLs
🔍

intel

13,544 LOC · 36 files

Entity intelligence pipeline. Discovers companies via web search and SEC filings, fetches data at 3 cost tiers, enriches from free APIs (patents, GitHub, news), and synthesizes dossiers with LLM analysis.

Companies · People · TFTF Framework · Discover · Fetch · Analyze
🏥

doctor

11,961 LOC · 36 files

Project health monitoring with auto-fix. Watches file changes, runs tests, tracks status. Chronicle sub-module analyzes coding sessions with D3 topic graphs and timeline visualizations.

Watch · Auto-Fix · Chronicle · Collaboration · Topic Graph
🌐

browser

10,777 LOC · 45 files

Playwright-based browser automation. Headless browsing with proxy escalation (direct → stealth → Bright Data → full browser). Content ingestion for HTML, PDF, and YouTube transcripts.

Agent · Server · Ingest · Proxy Escalation · Cache
🤖

supervisor

10,318 LOC · 51 files

Autonomous work orchestrator. Manages long-running operations, coordinates sidekick agents, runs periodic tasks. Bridges learning outputs into actionable knowledge for autonomous sessions.

Autonomous · Sidekick · Event Loop · Periodic · Benchmarks
💰

finance

9,867 LOC · 45 files

Market analysis toolkit. Earnings call processing, backtesting framework, corporate ownership tracking, and bottleneck analysis. Integrates with Finnhub for real-time market data.

Earnings · Backtest · Ownership · Bottleneck Analysis
🏭

tools

7,638 LOC · 31 files

Specialized production utilities. Supply chain graph analysis (companies, relationships, bottlenecks), Japan market scrapers (EDINET filings, Kabutan stocks), and media processing.

Supply Chain · EDINET · Kabutan · Media
⚙️

ops

4,713 LOC · 16 files

Operations CLI and server management. Session management, iTerm2 control, resource monitoring, developer tools. Single point of control for all services via ops command.

CLI · Watch · Resmon · Devtools
🧪

explorations

4,466 LOC · 24 files

Experiments and prototypes. LiteLLM testing, Grok search, problem-solving strategies, iTerm2 automation gym. Ideas that prove out graduate into full modules.

LiteLLM CLI · Grok Search · Problem Solving · Gym

Component Deep Dives

What each module actually does, what’s working, and what’s next.

🔱

vario

19 strategies from 10 composable stages, 9 analytical lenses — automated problem-solving at scale
24,131 LOC · 101 files
Extract · Studio · Engine · Reasoning Strategies · Gradio UI

What works now

  • Studio — run same prompt across 4–8 models in parallel, generate → evaluate → iterate until quality converges
  • Reasoning Strategies — 19 encoded strategies (chain-of-thought, self-critique, ensemble, lens variations) with SQL-backed move tracking
  • Extract — fetch any URL, parse HTML/PDF/YouTube, extract structured facts

How it fits together

  • Extract pulls in content → Studio generates across models → evaluates and iterates to convergence
  • Reasoning Strategies benchmark: compare which approach works best for which problem type
  • Unified Gradio UI at vario.localhost with Extract, Studio, and Prompts tabs
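The generate → evaluate → iterate loop above might look like this in miniature. It is a sketch with hypothetical `generate` and `score` stand-ins, not the actual Studio engine:

```python
def generate(prompt: str, feedback: str = "") -> str:
    """Stand-in for a multi-model generation pass."""
    return (prompt + " " + feedback).strip()

def score(draft: str) -> float:
    """Stand-in for an LLM judge; here, longer drafts score higher."""
    return min(1.0, len(draft) / 40)

def generate_evaluate_iterate(prompt: str, threshold: float = 0.9,
                              max_rounds: int = 5) -> str:
    draft = generate(prompt)
    for _ in range(max_rounds):
        if score(draft) >= threshold:        # quality has converged
            break
        feedback = "expand weakest section"  # critique feeds the next round
        draft = generate(draft, feedback)
    return draft
```

The real loop swaps in parallel model calls for `generate` and judge ensembles for `score`; the convergence structure is the same.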

Coming next

Multi-source analysis: brain search "query" → fetch top 5 results in parallel → synthesize across sources. Auto-search fallback when input isn’t a URL.

📚

lib

Shared foundation that every module imports — eliminates duplication across the system
20,633 LOC · 116 files
lib/llm · lib/vectors · lib/semnet · lib/billing · lib/notify · lib/brightdata · lib/ytdl

What works now

  • lib/llm — async calls to 6+ providers with model aliasing, streaming, pricing, web search
  • lib/vectors — Qdrant local vector search for semantic retrieval (learnings, sessions)
  • lib/semnet — 3-level semantic storage (doc summaries, chunks, claims) with SQLite + Qdrant, domain adapters
  • lib/billing — real-time API cost monitoring with Gradio dashboard
  • lib/notify — tiered Pushover + local notifications (info/warning/critical)
  • lib/coord — multi-session conflict avoidance (file claims, activity tracking)
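A toy sketch of the model-aliasing idea in lib/llm — short names fan out to concurrent provider calls. The alias table and `call` signature here are assumptions, not the real API:

```python
import asyncio

# Hypothetical alias table: short names resolve to provider model IDs.
ALIASES = {
    "sonnet": "anthropic/claude-sonnet",
    "flash":  "google/gemini-flash",
    "grok":   "xai/grok",
}

async def call(model: str, prompt: str) -> str:
    """Stand-in for an async provider call; resolves aliases first."""
    resolved = ALIASES.get(model, model)
    await asyncio.sleep(0)  # where the real HTTP request would await
    return f"[{resolved}] {prompt}"

async def fan_out(prompt: str, models: list[str]) -> list[str]:
    # One prompt, several providers, concurrently; results keep input order.
    return await asyncio.gather(*(call(m, prompt) for m in models))
```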

Also includes

  • lib/brightdata — proxy zones, YouTube datasets, Web Unlocker
  • lib/ytdl — yt-dlp wrappers with auth and proxy management
  • lib/discovery_ops — shared discovery pipeline infra (cache, search, BD client) reused by 5+ projects
  • lib/config_validation — self-documenting YAML errors + LLM-powered “freehand config”

Coming next

Memory system (lib/memory): PostgreSQL + pgvector for self-organizing knowledge store with hybrid retrieval and applicability scoring.

🎓

learning

The system literally learns from its own mistakes — session review → principles → sandbox testing
18,274 LOC · 39 files
Session Review · Pattern Discovery · Principles · Embeddings · Pond

What works now

  • Session review — parse Claude/Gemini transcripts, extract error→repair pairs (664+ analyzed)
  • Principles DB — 25K+ instances linked to principles, auto-classified by LLM, materialized to ~/.claude/principles/*.md
  • Failure mining — multi-model judges (Gemini, Grok, Claude) score which tool call fixed each error, majority vote consensus
  • Sandbox eval — Docker-based replay: run Claude against specific commits, measure wall-clock time, tool calls, result quality
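A minimal sketch of the materialization step: instances grouped by principle, rendered to the kind of markdown a file under ~/.claude/principles/ might contain. The `Instance` record and its field names are hypothetical simplifications:

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class Instance:
    error: str       # what went wrong in a session
    repair: str      # the tool call or edit that fixed it
    principle: str   # principle label, assumed assigned upstream by an LLM

def materialize(instances: list[Instance]) -> str:
    """Group instances by principle and render a markdown summary."""
    counts = Counter(i.principle for i in instances)
    lines = ["# Learned principles", ""]
    for principle, n in counts.most_common():  # most-supported first
        lines.append(f"- {principle} ({n} instances)")
    return "\n".join(lines)
```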

Gyms (self-improvement)

  • Badge gym — test prompt variants by replaying real sessions, score quality, pick best
  • Fetchability gym — probe URLs with httpx/proxy/unlocker in parallel, build site × method matrix
  • Principles flow back into sessions via CLAUDE.md + ~/.claude/principles/

Coming next

Sidekick gym: test which interventions (auto-badge, convention warnings, boilerplate detection) are actually helpful vs noisy. Close the evaluation loop.

👷

jobs

20+ self-healing pipelines with LLM error triage, semaphored concurrency, and version-aware staleness
16,668 LOC · 52 files
YouTube (6 channels) · Earnings · Research · Company Analysis · Supply Chain · Newsflow · Dashboard · Diagnostics

What works now

  • Stage-aware tracking — items flow through fetch → extract → score independently, with per-stage timing and concurrency limits
  • Error intelligence — every exception auto-classified by LLM as transient/item-specific/systemic/code-bug, drives retry/skip/pause
  • Version-aware staleness — code changes → hash changes → items marked stale → one-click reprocess
  • Gradio dashboard at jobs.localhost with live stats, error drill-down, job control
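The triage-to-action mapping can be sketched like so. The four classes are from the pipeline above; the keyword classifier is a stand-in for the real LLM call:

```python
# The four triage classes, mapped to pipeline actions.
ACTIONS = {
    "transient":     "retry",   # e.g. timeouts: try again
    "item-specific": "skip",    # bad input: move on to the next item
    "systemic":      "pause",   # e.g. auth expired: stop the pipeline
    "code-bug":      "pause",   # needs a code fix before resuming
}

def classify(error: str) -> str:
    """Keyword stand-in for the LLM triage classifier."""
    if "timeout" in error or "503" in error:
        return "transient"
    if "parse" in error:
        return "item-specific"
    return "systemic"

def triage(error: str) -> str:
    return ACTIONS[classify(error)]
```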

Pipeline examples

  • Supply chain — 2 jobs build a graph of 500+ semiconductor companies with supplier/customer/competitor edges
  • VIC research — idea discovery → content processing → enrichment → scoring
  • Multi-job workflows — one job’s output feeds the next via tracker_query discovery
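Version-aware staleness reduces to a hash comparison: each item records the hash of the code that processed it, and a code change flips every item to stale. A sketch, with function names assumed:

```python
import hashlib

def code_version(source: str) -> str:
    """Hash the stage's source; any code change changes the version."""
    return hashlib.sha256(source.encode()).hexdigest()[:12]

def is_stale(item_version: str, current_source: str) -> bool:
    # The item was processed under an older version of this stage's code.
    return item_version != code_version(current_source)
```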

Coming next

Cascade reprocessing: fix a parser → extract stage re-runs → downstream stages auto-mark stale. Validation circuit breaker when semantic failure rate exceeds threshold.

🔍

intel

From a company name to a full investment dossier with TFTF scoring — automated end to end
13,544 LOC · 36 files
Companies · People · TFTF Framework · Discover · Fetch · Analyze

What works now

  • Companies pipeline — Serper search → SEC EDGAR → Bright Data scraping (3 cost tiers) → free API enrichment (patents, GitHub, news) → LLM synthesis
  • People pipeline — discover via search → fetch profiles → enrich from SEC forms → cluster & analyze for VC theses
  • TFTF framework — Technology, Financials, Team, Fit scoring with bull/bear investment memos

Outputs

  • Structured YAML profiles + prose dossiers (Markdown)
  • Competitive landscape analysis
  • Cross-referenced with SEC filings and patent data
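A toy rendering of a TFTF composite score. The equal weights here are illustrative only, not the framework's actual weighting:

```python
from dataclasses import dataclass

@dataclass
class TFTF:
    technology: float  # each component scored 0-10
    financials: float
    team: float
    fit: float

# Illustrative equal weights; the real framework may weight differently.
WEIGHTS = {"technology": 0.25, "financials": 0.25, "team": 0.25, "fit": 0.25}

def composite(s: TFTF) -> float:
    """Weighted sum of the four component scores."""
    return sum(getattr(s, name) * w for name, w in WEIGHTS.items())
```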

Coming next

Consolidate with jobs-based company analysis (currently siloed — separate data dirs, separate prompts). Unified watchlist and shared prompt templates.

🏥

doctor

Watches every file change, runs tests, auto-fixes failures using Claude Code — while avoiding conflicts with human sessions
11,961 LOC · 36 files
Watch · Auto-Fix · Chronicle · Collaboration · Topic Graph

What works now

  • Watch + auto-fix — FSEvents file watcher, auto-runs tests, spawns Claude to fix failures
  • Chronicle — analyzes coding sessions with D3 topic graphs, timeline, accomplishment extraction
  • Collaboration — publishes status to shared YAML so user sessions know doctor is active

How it works

  • Doctor claims files before editing via lib/coord — no conflicts with user sessions
  • Session intelligence API (port 8130) powers /hist, /jump, badges

Coming next

Connect error intelligence from jobs (currently separate) so operational failures inform project health. Cross-project resource coordination.

🌐

browser

5-level escalation ladder: free → stealth → proxy → unlocker → full Playwright — pay only when needed
10,777 LOC · 45 files
Agent · Server · Ingest · Proxy Escalation · Cache

What works now

  • Direct + stealth — free, ~200–500ms, handles most sites
  • Bright Data proxy — residential IPs, ~500–800ms, bypasses geo-blocks
  • Web Unlocker + Playwright — for JS-heavy sites, CAPTCHA, login walls
  • Refusal detection — paywalls, CAPTCHAs, login walls identified automatically
  • HTTP server — single Playwright instance on :8100, control via curl

Content handling

  • HTML, PDF, YouTube transcripts all supported
  • 10-min TTL cache with fetch_mode metadata
  • Auto-escalation on failure via --escalate
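The escalation ladder reduces to try-in-cost-order. A sketch with caller-supplied fetch rungs (the actual `--escalate` internals are not shown; `fetch_with_escalation` is a hypothetical name):

```python
def fetch_with_escalation(url: str, rungs) -> tuple[str, str]:
    """Try each (name, fetch) rung in cost order; return the first success.

    `rungs` is ordered cheapest-first: direct, stealth, proxy, unlocker,
    full browser. A rung signals failure by raising.
    """
    last_error = None
    for name, fetch in rungs:
        try:
            return name, fetch(url)       # success: pay no more than needed
        except Exception as e:            # refusal, CAPTCHA, timeout...
            last_error = e                # escalate to the next, pricier rung
    raise RuntimeError(f"all rungs failed: {last_error}")
```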

Coming next

Cookie-based authentication: extract from real Chrome profile for sites requiring login (Google, Gmail, Gemini consumer).

🤖

supervisor

Watches every Claude session in real time — auto-badges, event tracking, live session grid
10,318 LOC · 51 files
Autonomous · Sidekick · Event Loop · Periodic · Benchmarks

What works now

  • Sidekick hooks — SessionStart/PostToolUse events → auto-badge generation, event recording, resource tracking
  • Watch UI — live session grid with activity, timeline, topic graph tabs
  • Passive error observation — tails session JSONL files for errors → classifies → writes doctor-compatible logs
  • Idle detection — atomic timestamp tracking across all sessions for autonomous work scheduling

Session intelligence

  • Watch API (port 8130) serves /hist, /jump, /recap, badge data
  • Vector search across all session transcripts

Coming next

Principle violation detection: LLM-powered checks against learned principles during active sessions. Shadow worker phase for safe verification in separate worktrees.

💰

finance

Tick-level price alignment with earnings transcripts — which claim caused which price move?
9,867 LOC · 45 files
Earnings · Backtest · Ownership · Bottleneck Analysis

What works now

  • Earnings backtest — NBBO + trades + transcript aligned at ~250ms resolution
  • Finnhub screening — 1-min candles for big-move discovery and calendar events
  • IB integration — Stockloader service with persistent TWS/Gateway connection, 60-req/10min pacing
  • VIC returns calculator — cross-listing DB + symbol resolution + price library
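Nearest-past alignment of transcript claims to trades can be sketched with a binary search over tick timestamps — a simplification of the real NBBO alignment:

```python
import bisect

def align(claims: list[tuple[float, str]],
          ticks: list[tuple[float, float]]) -> list[tuple[str, float]]:
    """For each claim (timestamp, text), find the last trade price at or
    before it (as-of join). `ticks` must be sorted by timestamp."""
    times = [t for t, _ in ticks]
    out = []
    for ts, text in claims:
        i = bisect.bisect_right(times, ts) - 1  # index of nearest past tick
        if i >= 0:
            out.append((text, ticks[i][1]))
    return out
```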

Data sources

  • Interactive Brokers (tick data)
  • Finnhub (fundamentals, candles, filings)
  • SEC EDGAR (ownership, filings)
  • Redis time series for real-time prices

Coming next

Bottleneck analysis framework: map supply chain constraints (electricity, transformers, water, labor, permits) → identify winners/losers upstream and downstream.

🛠️

tools

Graduated prototypes: supply chain graph of 500+ semiconductor companies with relationship edges
7,638 LOC · 31 files
Supply Chain · EDINET · Kabutan · Media

What works now

  • Supply chain graph — SQLite with supplier/customer/competitor edges, wave-based discovery (anchors → expand frontier)
  • Entity resolution — ticker → Finnhub → GLEIF → PermID matching
  • EDINET — Japanese financial filings scraper
  • Kabutan — Japan stock data collection
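Wave-based discovery is breadth-first expansion from anchor companies, one frontier at a time. A sketch assuming a caller-supplied `neighbors` callable (in the real graph, supplier/customer/competitor edges):

```python
def wave_discovery(anchors: list[str], neighbors, max_waves: int = 2) -> set[str]:
    """Expand outward from anchors; `neighbors(company)` returns related
    companies. Each wave is one hop further from the anchors."""
    seen = set(anchors)
    frontier = list(anchors)
    for _ in range(max_waves):
        next_frontier = []
        for company in frontier:
            for related in neighbors(company):
                if related not in seen:
                    seen.add(related)
                    next_frontier.append(related)
        frontier = next_frontier  # the newly discovered rim becomes the frontier
    return seen
```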

Origin

  • Started in explorations/, graduated to production when proven
  • Supply chain data also fed by jobs pipeline (2 dedicated handlers)

Coming next

Consolidate supply chain data from jobs pipeline into unified graph. Expand beyond semiconductors to other industries.


Codebase Size Map

Visual representation of relative module sizes. Area proportional to lines of code.

🔱 vario
24K LOC
📚 lib
21K LOC
🎓 learning
18K LOC
👷 jobs
17K LOC
🔍 intel
14K LOC
🏥 doctor
12K LOC
🌐 browser
11K
🤖 supervisor
10K LOC
💰 finance
10K LOC
🏭 tools
8K LOC
⚙️ ops 4K
🧪 explorations 4K
📊 present 3K
🎯 projects 2K

Evolution Timeline

From first commit to today. Two months, two phases of rapid growth.

January 2026

Foundation — Core systems established

542
Commits
~25
Dirs Created
  • vario Unified LLM workbench: Extract, Studio, Reasoning Strategies
  • browser Playwright automation with proxy escalation
  • lib/llm Async LLM framework with model aliasing
  • learning Session review and pattern discovery
  • explorations Experiments: LiteLLM, Grok search, iTerm2 gym
  • benchmarks BrowseComp, HLE, terminal benchmarks
  • tools Supply chain analysis, EDINET, Kabutan scrapers
  • infra Caddy reverse proxy, Cloudflare tunnel, launchd

February 2026

Expansion — 16 new directories, system deepens

695
Commits
+16
New Dirs
  • intel Company & people dossier pipelines with TFTF scoring
  • jobs 17-handler pipeline orchestration with Gradio dashboard
  • doctor Chronicle session analysis, D3 topic graphs, auto-fix
  • supervisor Autonomous orchestrator, sidekick agents, event loop
  • finance Earnings analysis, backtesting, Finnhub integration
  • ops Unified CLI for servers, sessions, iTerm2 control
  • lib/vectors Qdrant local vector search for semantic retrieval
  • lib/notify Unified Pushover + local notification system
  • projects Long-running goals: VC intel, skill acquisition
  • present AI ops pitch, search quality reports

Where Software Is Going

The developer becomes a simultaneous chess player.

A chess grandmaster in a simul walks from board to board — each position is different, each opponent plays their own game, but the grandmaster sees patterns across all of them and makes strong moves in seconds. The boards don’t wait for each other. The grandmaster’s strength isn’t just depth on one board — it’s breadth across many, with enough depth on each to win.

That’s what this system is for. A single developer managing parallel research pipelines, autonomous data jobs, live monitoring, self-improving code agents, and investment analysis — all at once. Each “board” runs on its own, escalates when stuck, and learns from its mistakes. The developer walks the room, makes the calls that matter, and moves on.

The bottleneck shifts from doing the work to directing the work. Rivus is the room full of boards.


Technology Stack

LLM providers, external services, infrastructure, and storage powering the system.

LLM Providers

  • Anthropic (Claude)
  • OpenAI (GPT, Embeddings)
  • Google (Gemini, Imagen)
  • xAI (Grok)
  • Groq (fast inference)
  • MiniMax

External APIs

  • Bright Data (proxies)
  • Serper (search)
  • Finnhub (markets)
  • SEC EDGAR (filings)
  • PatentsView
  • Pushover (notifications)

Infrastructure

  • Caddy (reverse proxy)
  • Cloudflare Tunnel
  • Cloudflare Pages
  • launchd (8 services)
  • tmux + iTerm2
  • Playwright

Storage

  • SQLite (multiple DBs)
  • Qdrant (vector search)
  • Redis (time series)
  • Parquet (datasets)

UI & Visualization

  • Gradio 6 (apps)
  • D3.js (graphs)
  • SVG (diagrams)
  • HTML reports

Python Stack

  • asyncio / httpx
  • Click (CLIs)
  • Pandas (data)
  • Invoke (tasks)
  • Loguru (logging)
  • Pydantic (models)

Appendix: Module Map

Every module placed in its pipeline stage, with LOC counts and data flow arrows. Cross-cutting infrastructure at the bottom.

INGEST — pull in the world (~28K LOC)
  • browser (11K LOC) — Playwright automation, proxy escalation, cache; direct → stealth → proxy → unlocker → full browser
  • Extract — HTML, PDF, YouTube → structured content (browser → jobs, vario)
  • jobs (17K LOC) — pipeline orchestration, 17+ handlers; discover → fetch → extract → score → enrich; Gradio dashboard, error intelligence, version-aware

REASON & LEARN — make sense of it (~42K LOC)
  • vario (24K LOC) — Studio + Engine: multi-model generation × 4–8 models → evaluate → iterate; breadth (parallel models) + depth (converge on quality); Reasoning Strategies: 19 strategies, 10 stages, 9 lenses
  • learning (18K LOC) — session review → pattern discovery → principles; 25K+ instances, vector embeddings, sandbox eval; principles → CLAUDE.md → future sessions

OUTPUT & ACTION — act on what was learned (~35K LOC)
  • intel (14K LOC) — company & people dossiers; Serper → SEC → Bright Data → free APIs → LLM synthesis; TFTF scoring, bull/bear memos, competitive landscape
  • finance (10K LOC) — earnings × price alignment, backtesting; Finnhub, IB tick data, Redis time series
  • tools (8K LOC) — supply chain graph (500+ companies), EDINET, Kabutan
  • present (3K LOC)

Data flow: enriched data feeds output; principles improve future reasoning.

CROSS-CUTTING — used by all stages
  • lib (21K LOC) — LLM, vectors, semnet, notify
  • doctor (12K LOC) — health watch, auto-fix, chronicle
  • supervisor (10K LOC) — autonomous work, sidekick, hooks
  • ops (5K LOC) — CLI, servers, iTerm2
  • explorations (4K LOC) — prototypes → graduate up

Total: 159K lines of Python across 618 files — Ingest ~28K · Reason & Learn ~42K · Output & Action ~35K