A system that ingests the world, reasons about it across models,
and learns from its own mistakes to get better over time.
Open questions
Where to point AI: The best use of rapidly improving AI capabilities is to apply them to two things: building better tools, and building the skills to direct those tools.
Both uses create self-improving feedback loops. Better tools produce better work; better skills produce better tools. Businesses that focus AI here compound their advantage — each cycle makes the next one faster and more capable.
The main skill: The central capability we are building is how AI should autonomously work a project — driving tasks end-to-end while knowing when and how to escalate to a human along the way.
This inverts the typical AI copilot model. Instead of a human driving with AI assistance, the AI drives the project — scoping work, executing plans, verifying results, filing follow-ups — and escalates to the human for judgment calls, design decisions, and quality gates. The human becomes the reviewer and strategist, not the typist. Getting this right is the meta-skill that makes all other capabilities compound.
Core thesis: By thinking 2–3× as much — generating alternatives, evaluating from more angles, building up background knowledge — you can almost always improve on a first-pass decision. The opportunity is to do more work systematically and then learn from comparing the richer result to what the single pass missed. Rivus makes that extra work automatic, not effortful.
Rivus is self-improving AI. Every session teaches the next; mistakes become principles; principles compound. More done, less human effort — and the gap only grows.
Rivus builds skilled domain experts. Like skillz: start by accumulating deep domain experience — from the web, from users, from subject-matter experts — especially around decision-making in a given domain. Then reason with that accumulated knowledge to sharpen the system’s judgment on each new decision. Delivers browsable portals, queryable MCP servers, change notifications, and bulk data export.
Rivus builds causal models. Predicts actions and choices from small-to-medium domain data (100–100K facts). Not just correlations — testable, explainable models of what drives decisions.
Why it produces high-quality output
Abundant input
Broad data collection across many sources and formats
Self-healing processing
Pipelines detect failures and retry with different strategies
Superior observability
Every tool call, error, and correction is visible and searchable
Self-tuning
Mistakes become principles; principles improve every future session
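The self-healing behavior above can be sketched as a retry ladder: each failure moves on to the next strategy instead of repeating the one that just failed. A minimal sketch — the strategy names and error-handling granularity are illustrative, not the system's actual handlers:

```python
# Retry ladder: try each strategy in order, recording failures along the way.
def run_with_fallbacks(task, strategies):
    """Return (result, errors) from the first strategy that succeeds."""
    errors = []
    for strategy in strategies:
        try:
            return strategy(task), errors
        except Exception as exc:  # in practice: narrower exception types per stage
            errors.append((strategy.__name__, str(exc)))
    raise RuntimeError(f"all strategies failed: {errors}")

# Hypothetical strategies standing in for real fetchers.
def fetch_direct(task):
    raise TimeoutError("blocked")

def fetch_via_proxy(task):
    return f"content for {task}"

result, errors = run_with_fallbacks("example.com", [fetch_direct, fetch_via_proxy])
# The recorded errors double as observability: every failed attempt is visible.
```

The `errors` list is what makes the loop observable as well as self-healing: failed attempts are data, not noise.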
Data flows through three stages: collect, process, output. Linked items open the live tool. Multi-model reasoning is pervasive — see How We Reason below.
What makes this different from vanilla Claude Code. Multi-model reasoning is pervasive — every stage uses it, not just one.
Concrete deliverables you can read, share, and act on.
Structured YAML profiles + prose dossiers with TFTF scoring, bull/bear investment memos, competitive landscape analysis. Cross-referenced with SEC filings and patents.
Earnings call × price alignment at ~250ms resolution. Backtests, screening, bottleneck analysis. Which claim caused which price move?
500+ semiconductor companies with supplier/customer/competitor edges. Wave-based discovery from anchor companies outward.
HTML reports, interactive portals, research writeups. Published to static content server for sharing.
25K+ instances distilled into actionable principles. Materialized to ~/.claude/principles/ — every future session inherits what was learned.
40+ encoded expert workflows. /commit, /debug, /present-project — invoke with a slash command, get a structured multi-step process.
How one developer’s hour becomes ten. The system multiplies human effort through three mechanisms.
Every session is reviewed. Mistakes become principles. Principles feed future sessions. The system gets measurably better over time.
664 sessions reviewed → 25K+ instances → principles materialized
The “+10 IQ points” engine. Instead of trusting a single generation, Rivus does more work: generates alternatives broadly, evaluates from multiple angles, builds up background (rubrics, precedent, first principles) before committing to an answer. The gap between 1× and 3× effort is often worth closing.
19 strategies · 4–8 models · iterative convergence
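The generate-evaluate pattern behind those numbers can be sketched as a fan-out: one prompt goes to several models, a judge scores each answer, and the best survives. A minimal sketch — `ask`, `judge`, and the model names are placeholders, not the system's real API or configuration:

```python
import asyncio

# Stand-in for a real LLM call (hypothetical; the real system calls 4-8 models).
async def ask(model: str, prompt: str) -> str:
    return f"{model} answer to: {prompt}"

# Stand-in judge: a real one would score against a rubric with another model.
def judge(answer: str) -> float:
    return float(len(answer))

async def best_of(prompt: str, models: list[str]) -> str:
    # Fan out the same prompt to every model concurrently, keep the top answer.
    answers = await asyncio.gather(*(ask(m, prompt) for m in models))
    return max(answers, key=judge)

winner = asyncio.run(best_of("summarize Q3", ["model-a", "model-bb", "model-c"]))
```

In the real loop the winner would feed back in as context for another generate-evaluate round until answers converge.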
Jobs run 24/7. Supervisor watches sessions. Doctor auto-fixes failures. Work happens while the developer sleeps.
17+ pipeline handlers · overnight TODOs · idle-aware scheduling
The compound effect: autonomous pipelines discover data around the clock. Multi-model reasoning produces higher quality analysis per prompt. Learning from mistakes means each session is faster and more accurate than the last. Skills encode expert workflows so common patterns take seconds instead of minutes.
Result: one developer managing work that would otherwise require a team — with quality that improves automatically.
Each module, what it does, and its key sub-components. Sorted by size.
Unified LLM workbench. Extracts content from URLs, runs parallel prompts across 4–8 models, evaluates with strategies and judges, and iteratively refines via generate-evaluate-iterate loops.
Shared library layer used by every module. Async LLM calls with model aliasing, image generation across 4 providers, vector search, semantic storage, billing monitoring, notifications, and proxy management.
Self-improvement system. Reviews coding sessions for patterns, extracts principles from experience, embeds knowledge into vector DB for fast retrieval. The system literally learns from its own work.
Self-healing pipeline engine with LLM error triage, semaphored concurrency, and version-aware staleness. 20+ autonomous pipelines across 5 domains:
Entity intelligence pipeline. Discovers companies via web search and SEC filings, fetches data at 3 cost tiers, enriches from free APIs (patents, GitHub, news), and synthesizes dossiers with LLM analysis.
Project health monitoring with auto-fix. Watches file changes, runs tests, tracks status. Chronicle sub-module analyzes coding sessions with D3 topic graphs and timeline visualizations.
Playwright-based browser automation. Headless browsing with proxy escalation (direct → stealth → Bright Data → full browser). Content ingestion for HTML, PDF, and YouTube transcripts.
Autonomous work orchestrator. Manages long-running operations, coordinates sidekick agents, runs periodic tasks. Bridges learning outputs into actionable knowledge for autonomous sessions.
Market analysis toolkit. Earnings call processing, backtesting framework, corporate ownership tracking, and bottleneck analysis. Integrates with Finnhub for real-time market data.
Specialized production utilities. Supply chain graph analysis (companies, relationships, bottlenecks), Japan market scrapers (EDINET filings, Kabutan stocks), and media processing.
Operations CLI and server management. Session management, iTerm2 control, resource monitoring, developer tools. Single point of control for all services via ops command.
Experiments and prototypes. LiteLLM testing, Grok search, problem-solving strategies, iTerm2 automation gym. Ideas that prove out graduate into full modules.
What each module actually does, what’s working, and what’s next.
Multi-source analysis: brain search "query" → fetch top 5 results in parallel → synthesize across sources. Auto-search fallback when input isn’t a URL.
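That search → parallel fetch → synthesize flow can be sketched as follows; `fetch` and the synthesis step are stand-ins for the real tools:

```python
import asyncio

# Stand-in for the real content fetcher.
async def fetch(url: str) -> str:
    return f"text of {url}"

async def multi_source(query: str, urls: list[str]) -> str:
    # Fetch the top 5 results concurrently, then synthesize across them.
    pages = await asyncio.gather(*(fetch(u) for u in urls[:5]))
    return f"synthesis of {len(pages)} sources for '{query}'"

out = asyncio.run(multi_source("chip supply", ["a", "b", "c"]))
```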
Memory system (lib/memory): PostgreSQL + pgvector for self-organizing knowledge store with hybrid retrieval and applicability scoring.
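Hybrid retrieval with applicability scoring amounts to blending a vector-similarity score with keyword overlap and a per-item weight. A minimal pure-Python sketch — the 0.6/0.3/0.1 weights and document shapes are illustrative, not the actual lib/memory schema:

```python
# Cosine similarity between two dense vectors (pgvector does this in SQL).
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb)

def hybrid_score(query_vec, query_terms, doc):
    sim = cosine(query_vec, doc["vec"])                          # semantic match
    kw = len(query_terms & set(doc["terms"])) / max(len(query_terms), 1)  # lexical match
    return 0.6 * sim + 0.3 * kw + 0.1 * doc["applicability"]     # illustrative weights

docs = [
    {"id": 1, "vec": [1.0, 0.0], "terms": ["retry", "pipeline"], "applicability": 0.9},
    {"id": 2, "vec": [0.0, 1.0], "terms": ["chess"], "applicability": 0.2},
]
best = max(docs, key=lambda d: hybrid_score([1.0, 0.0], {"retry"}, d))
```

In production the similarity term would come from a pgvector `<=>` distance query rather than Python-side math.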
Principles materialize to ~/.claude/principles/*.md and load alongside CLAUDE.md in every session.
Sidekick gym: test which interventions (auto-badge, convention warnings, boilerplate detection) are actually helpful vs noisy. Close the evaluation loop.
Cascade reprocessing: fix a parser → extract stage re-runs → downstream stages auto-mark stale. Validation circuit breaker when semantic failure rate exceeds threshold.
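The staleness cascade can be sketched as a walk over a stage-dependency graph: re-running one stage marks everything downstream stale. The stage names here are illustrative:

```python
# Hypothetical stage graph: each stage lists the stages that consume its output.
DOWNSTREAM = {"extract": ["enrich"], "enrich": ["synthesize"], "synthesize": []}

def mark_stale(stage, stale=None):
    """Mark every stage downstream of `stage` as needing a re-run."""
    stale = set() if stale is None else stale
    for nxt in DOWNSTREAM[stage]:
        if nxt not in stale:
            stale.add(nxt)
            mark_stale(nxt, stale)
    return stale

stale = mark_stale("extract")  # fixing the parser re-runs extract first
```

Version-aware staleness would additionally compare each stage's recorded code version against the current one before scheduling the re-run.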
Consolidate with jobs-based company analysis (currently siloed — separate data dirs, separate prompts). Unified watchlist and shared prompt templates.
Connect error intelligence from jobs (currently separate) so operational failures inform project health. Cross-project resource coordination.
--escalate
Cookie-based authentication: extract from real Chrome profile for sites requiring login (Google, Gmail, Gemini consumer).
Principle violation detection: LLM-powered checks against learned principles during active sessions. Shadow worker phase for safe verification in separate worktrees.
Bottleneck analysis framework: map supply chain constraints (electricity, transformers, water, labor, permits) → identify winners/losers upstream and downstream.
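Mapping winners and losers around a constraint reduces to traversing the supplier→customer graph from the bottlenecked node. A minimal sketch with a made-up three-node chain:

```python
# Hypothetical supplier -> customer edges; real data comes from the supply chain graph.
EDGES = {"transformers": ["fab-a"], "fab-a": ["device-maker"], "device-maker": []}

def downstream_of(node):
    """Every company exposed to a shortage at `node` (breadth-first walk)."""
    out, frontier = set(), [node]
    while frontier:
        cur = frontier.pop()
        for nxt in EDGES.get(cur, []):
            if nxt not in out:
                out.add(nxt)
                frontier.append(nxt)
    return out

exposed = downstream_of("transformers")  # who feels a transformer shortage
```

Running the same walk over reversed edges gives the upstream side: suppliers who gain pricing power when their output becomes the constraint.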
Consolidate supply chain data from jobs pipeline into unified graph. Expand beyond semiconductors to other industries.
Visual representation of relative module sizes. Area proportional to lines of code.
From first commit to today. Two months, two phases of rapid growth.
Foundation — Core systems established
Expansion — 16 new directories, system deepens
The developer becomes a simultaneous chess player.
A chess grandmaster in a simul walks from board to board — each position is different, each opponent plays their own game, but the grandmaster sees patterns across all of them and makes strong moves in seconds. The boards don’t wait for each other. The grandmaster’s strength isn’t just depth on one board — it’s breadth across many, with enough depth on each to win.
That’s what this system is for. A single developer managing parallel research pipelines, autonomous data jobs, live monitoring, self-improving code agents, and investment analysis — all at once. Each “board” runs on its own, escalates when stuck, and learns from its mistakes. The developer walks the room, makes the calls that matter, and moves on.
The bottleneck shifts from doing the work to directing the work. Rivus is the room full of boards.
LLM providers, external services, infrastructure, and storage powering the system.
Every module placed in its pipeline stage, with LOC counts and data flow arrows. Cross-cutting infrastructure at the bottom.