170K Lines of Self-Improving AI Infrastructure — Built in 8 Weeks

A compound intelligence system that runs multi-model reasoning, learns from its own mistakes, and operates autonomously. Looking for a technical co-founder to scale it.

170K+ lines of code · 1,237 commits · 19 reasoning strategies · 664+ sessions reviewed

What Exists

A full-stack AI reasoning system with three layers: ingestion, reasoning, and self-management. Not scaffolding — production infrastructure processing real data.

INGEST: Web scraping, transcription, document parsing, YouTube pipelines
REASON: Multi-provider LLM abstraction, 19-strategy engine, vector search (Qdrant), analytical lenses
OUTPUT: Dossiers, financial analysis, supply chain graphs, presentations
SELF-MANAGEMENT: Session review, principle extraction, sandbox evaluation, pipeline health
KNOWLEDGE: 25K+ learned instances, 40+ expert workflows, domain-specific patterns

Key technical decisions: Async Python throughout. Multi-provider LLM abstraction (Claude, GPT, Gemini, Grok — hot-swappable). SQLite + Qdrant for hybrid search. Gradio for rapid UIs. Playwright for autonomous web interaction. Redis for real-time data. Self-healing pipelines with LLM-assisted error triage.
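The hot-swappable provider abstraction can be pictured as a minimal sketch. Names like `LLMProvider`, `ProviderRegistry`, and `EchoProvider` are illustrative stand-ins, not the system's actual API; a stub provider is used so the sketch runs without vendor SDKs or API keys:

```python
import asyncio
from abc import ABC, abstractmethod

class LLMProvider(ABC):
    """Uniform async interface so callers never depend on a vendor SDK."""
    @abstractmethod
    async def complete(self, prompt: str) -> str: ...

class EchoProvider(LLMProvider):
    """Stand-in provider so this sketch runs offline."""
    async def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"

class ProviderRegistry:
    """Maps names to providers; swapping vendors is one register() call."""
    def __init__(self) -> None:
        self._providers: dict[str, LLMProvider] = {}

    def register(self, name: str, provider: LLMProvider) -> None:
        self._providers[name] = provider

    def get(self, name: str) -> LLMProvider:
        return self._providers[name]

registry = ProviderRegistry()
registry.register("claude", EchoProvider())  # real code would wrap the Anthropic SDK here
answer = asyncio.run(registry.get("claude").complete("ping"))
```

The design choice this illustrates: callers hold only the abstract interface, so strategies and pipelines stay vendor-agnostic.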


The Interesting Problems

These are not wrapper-level engineering challenges. This is applied research running in production.

Multi-model consensus

How do you synthesize 4-8 model outputs into a single high-quality answer? When models disagree, which one is right? How do you detect when all models are confidently wrong?
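A simplest-possible baseline for this, assuming exact-match voting (a real system would compare semantic embeddings rather than strings, and the function name is hypothetical):

```python
from collections import Counter

def consensus(outputs: list[str]) -> tuple[str, float, bool]:
    """Pick the modal answer, report the agreement ratio, and flag
    low-consensus cases for review."""
    normalized = [o.strip().lower() for o in outputs]
    counts = Counter(normalized)
    answer, votes = counts.most_common(1)[0]
    agreement = votes / len(normalized)
    # Low agreement means no single model is trustworthy on its own.
    # Agreement of 1.0 can still be uniformly wrong, which only an
    # external check (retrieval, tool use) can catch.
    needs_review = agreement < 0.5
    return answer, agreement, needs_review

ans, agree, review = consensus(["Paris", "paris", "Lyon", "Paris"])
```

This also makes the hard part concrete: the voting is trivial, but detecting confident unanimous error requires signals outside the model pool.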

Automated principle extraction

Session transcripts go in, behavioral principles come out. Each principle must be specific, testable, and actually improve future performance. 25K+ extracted so far.

Strategy selection

19 reasoning strategies built from 10 composable stages and 9 analytical lenses. The system must pick the right strategy for each problem type and adapt when the initial choice underperforms.
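The composability idea can be sketched as function composition. The stages here are toy placeholders, not the system's real reasoning steps:

```python
from typing import Callable

Stage = Callable[[str], str]

def make_strategy(stages: list[Stage]) -> Stage:
    """Compose stages left-to-right into a single strategy."""
    def run(problem: str) -> str:
        state = problem
        for stage in stages:
            state = stage(state)
        return state
    return run

# Toy stages standing in for real reasoning steps.
decompose: Stage = lambda s: s + " | decomposed"
critique: Stage = lambda s: s + " | critiqued"

strategy = make_strategy([decompose, critique])
result = strategy("problem")
```

With 10 stages and 9 lenses, the combinatorial space is large; the 19 shipped strategies are the curated subset that earned their keep.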

Sandbox replay evaluation

How do you measure whether a learned principle actually helps? Replay past sessions with and without the principle. Quantify the improvement. Kill principles that don't work.

Pipeline staleness detection

20+ autonomous pipelines running 24/7. Detect when outputs degrade, when upstream data changes, when models update. Version-aware freshness with automatic remediation.
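Version-aware freshness can be expressed as a predicate over an artifact's provenance. Field names here are illustrative, not the real pipeline schema:

```python
def is_stale(
    artifact: dict,
    now: float,
    max_age_s: float,
    current_model: str,
    current_schema: int,
) -> bool:
    """An output is stale if it is too old, was produced by a
    different model version, or predates the current upstream schema."""
    too_old = now - artifact["produced_at"] > max_age_s
    model_changed = artifact["model"] != current_model
    schema_changed = artifact["schema"] < current_schema
    return too_old or model_changed or schema_changed

artifact = {"produced_at": 0.0, "model": "m-1", "schema": 3}
stale = is_stale(
    artifact, now=100.0, max_age_s=3600.0,
    current_model="m-2", current_schema=3,
)
```

Detecting quality degradation (as opposed to provenance drift) is the harder half, since it needs a scored baseline rather than a metadata check.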

Domain knowledge packaging

How do you take accumulated intelligence from one deployment and safely transfer it to benefit others in the same vertical — without leaking proprietary data?



The Vision

Domain-specific reasoning as a product. The infrastructure for compound intelligence is built. It works in three verticals today. The next step: package it for customers. Every enterprise vertical — finance, legal, biotech, supply chain — gets a reasoning engine that accumulates domain knowledge and gets measurably better over time.

This is not a chatbot. This is not a prompt chain. This is infrastructure for AI systems that actually improve. The technical foundations exist. Now it needs to become a product.


The Role

Technical Co-founder

You would own a major axis — product, infrastructure, or go-to-market engineering — with full authority to shape the technical direction. This is a system with massive leverage: one engineer's work already compounded into 170K+ lines of working infrastructure. A second strong technical mind multiplies that further.

What you bring: Deep systems engineering experience. Comfort with ambiguity and research-grade problems. The ability to ship fast without cutting corners. An opinion about how AI reasoning systems should work — and the skill to build it.

What you get: A working system, not a slide deck. Real compound improvement, not hand-waving about "AI." A co-founder who builds — 1,237 commits in 8 weeks. And a problem space where the right technical decisions create lasting, defensible value.

Let's talk architecture.

I'll walk you through the codebase. You tell me what you'd build next.

Start the conversation