Landscape survey — 50+ repos, 15+ papers, 20+ platforms · February 2026
The most active area. Multiple projects with 10K+ stars explore multi-agent architectures for financial analysis.
Bull vs Bear researcher debate with risk mgmt as separate consensus layer. Paper-backed (arXiv:2412.20138). Supports GPT-5.x, Gemini 3.x, Claude 4.x, Grok 4.x, Ollama. v0.2.0 released Feb 2026.
Investor-persona agents encoding real investment philosophies (Buffett, Munger, Burry, Cathie Wood). Well-coded but explicitly educational — not for real trading.
Cross-paradigm: LLM + RL + quant in one platform. Academic-grade, part of the AI4Finance ecosystem (FinGPT, FinRL). FinRobot Pro for professional equity research.
3-agent pipeline (Action → Validation → Answer). The self-validation loop — agent checks its own output for accuracy before final answer — is the key innovation.
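The Action → Validation → Answer loop above can be sketched as a small driver function. Everything here is an illustrative assumption — `call_llm`, the prompts, and the retry policy are stand-ins, not the project's actual API:

```python
# Sketch of a 3-agent Action -> Validation -> Answer pipeline with a
# self-validation loop. `call_llm` is a stand-in for any chat-completion
# client (hypothetical); prompts and retry policy are illustrative.
from typing import Callable


def answer_with_validation(
    question: str,
    call_llm: Callable[[str], str],
    max_retries: int = 2,
) -> str:
    """Action agent drafts, Validation agent critiques, Answer agent finalizes."""
    draft = call_llm(f"Answer concisely: {question}")  # Action agent
    for _ in range(max_retries):
        verdict = call_llm(  # Validation agent checks the draft
            "Check this answer for factual or numeric errors.\n"
            f"Q: {question}\nA: {draft}\n"
            "Reply 'OK' or list the problems."
        )
        if verdict.strip().upper().startswith("OK"):
            break
        draft = call_llm(  # revise using the critique, then re-validate
            f"Revise the answer to fix these problems: {verdict}\n"
            f"Q: {question}\nA: {draft}"
        )
    return call_llm(f"Polish for the final user:\n{draft}")  # Answer agent
```

The design point is that validation is a separate call with a narrow rubric, so the critic is not anchored on the generator's reasoning.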
Rivus comparison: rivus's finance/earnings/backtest/annotate.py already does two-stage LLM annotation (fast triage → smart re-score). The multi-agent debate pattern is the biggest gap — it could improve TFTF (Too Fast To Follow — rivus's 6-dimension company scoring framework: Velocity, Compounding, Moat depth, Talent magnetism, Capital efficiency, Founder intensity) scoring in intel/companies/.
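The fast triage → smart re-score pattern can be sketched as follows. The model callables, score scale, and escalation band are illustrative assumptions, not the actual annotate.py implementation:

```python
# Sketch of two-stage LLM annotation: a cheap "triage" model scores every
# item, and only items in an ambiguous middle band are re-scored by a
# stronger (slower, costlier) model. Both callables are hypothetical
# stand-ins returning a 0-100 score.
from typing import Callable, Iterable


def two_stage_score(
    items: Iterable[str],
    fast_model: Callable[[str], int],
    smart_model: Callable[[str], int],
    low: int = 30,
    high: int = 70,
) -> dict[str, int]:
    scores = {}
    for item in items:
        s = fast_model(item)      # stage 1: fast triage on everything
        if low <= s <= high:      # ambiguous band -> escalate
            s = smart_model(item)  # stage 2: smart re-score
        scores[item] = s
    return scores
```

Clear accepts and clear rejects never hit the expensive model, so cost scales with ambiguity rather than with corpus size.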
Mature area with strong open-source tooling. The data access layer is largely solved; the analysis layer is where value remains.
| Project | Stars | What It Does | Quality | Rivus Relevance |
|---|---|---|---|---|
| edgartools | 1,700 | Python EDGAR access + XBRL parsing. Ships with MCP server for Claude. | Exceptional (3,393 commits, 1K+ tests, MIT) | High — already a rivus dep; MCP server unexploited |
| sec-insights (LlamaIndex) | 2,600 | Full-stack RAG (Retrieval-Augmented Generation: LLM answers questions grounded in retrieved documents, reducing hallucination) for 10-K/10-Q Q&A with citation + PDF highlighting | Good (reference architecture) | Medium — citation pattern useful |
| edgar-crawler | 482 | Downloads filings → extracts sections → structured JSON. Peer-reviewed (WWW 2025) | Good | Medium — NLP dataset building |
| sec-parser | 273 | Parses EDGAR HTML into semantic tree structure | Good (CI, type checking, docs) | Medium — RAG chunking |
| sec-edgar-mcp | 212 | MCP (Model Context Protocol: Anthropic's standard for connecting AI assistants to external tools and data sources) server for direct AI assistant access to EDGAR | Strong (160 commits, AGPL-3.0) | High — plug into Claude Code |
| Bellingcat EDGAR | 193 | CLI for EDGAR search with RSS monitoring for new filings | Clean | Medium — filing alerts |
Surprisingly thin landscape. Most repos use old-school NLP (VADER, Loughran-McDonald dictionaries — a sentiment lexicon designed for financial text, where words like "liability" are neutral rather than negative) rather than LLMs.
| Project | Stars | Approach | Quality | Notes |
|---|---|---|---|---|
| defeatbeta-api | 476 | LLM function calling with JSON schema for structured extraction | Good | Most production-ready: schema-constrained extraction reduces hallucination. DuckDB + OpenAI. |
| Multimodal Earnings Platform | Low | Whisper + CV2 + BLIP-2 + Mistral + NER + NetworkX | Demo | Most ambitious: multimodal (audio+video+text) |
| FinRAGify | Low | RAG over 8 quarters, FAISS + CrossEncoder reranking | Interesting | Unique: cross-quarter commitment tracking — did mgmt deliver on promises? |
| Earnings Signal Extractor | Low | LLM sentiment + Q&A tone + QoQ trends | Demo | NVIDIA only. Segment-level sentiment (prepared vs Q&A) |
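The schema-constrained extraction idea behind defeatbeta-api — ask the LLM to fill a fixed JSON schema, then reject responses that don't conform — can be sketched as below. The schema fields and the `call_llm` stand-in are assumptions for illustration, not defeatbeta-api's actual interface:

```python
# Sketch of schema-constrained earnings extraction: the prompt embeds a
# JSON schema, the response is parsed as strict JSON, and required keys
# are checked before the data is used. Field names and `call_llm` are
# hypothetical.
import json
from typing import Callable

EARNINGS_SCHEMA = {
    "type": "object",
    "required": ["revenue_usd", "eps", "guidance_raised"],
    "properties": {
        "revenue_usd": {"type": "number"},
        "eps": {"type": "number"},
        "guidance_raised": {"type": "boolean"},
    },
}


def extract_earnings(transcript: str, call_llm: Callable[[str], str]) -> dict:
    prompt = (
        "Extract earnings facts from the transcript below as JSON matching "
        f"this schema, with no extra text:\n{json.dumps(EARNINGS_SCHEMA)}\n\n"
        f"{transcript}"
    )
    data = json.loads(call_llm(prompt))  # raises ValueError on non-JSON output
    missing = [k for k in EARNINGS_SCHEMA["required"] if k not in data]
    if missing:
        raise ValueError(f"LLM omitted required fields: {missing}")
    return data
```

Constraining output to a schema is what makes the extraction auditable: anything the model can't ground either fails parsing or fails the required-key check instead of leaking into downstream scores.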
| Paper | Key Contribution |
|---|---|
| Agentic Topic Retrieval (Gupta 2025) | LLM agent discovers topics, builds evolving hierarchical ontology, tracks strategic priority shifts |
| Multi-Agent KPI Extraction (Choi 2025) | Extraction Agent + Text-to-SQL Agent: 95% accuracy on structuring filings, 91% on retrieval |
| MarketSenseAI 2.0 | Chain-of-Agents + HyDE RAG: 125.9% cumulative returns on S&P 100 vs 73.5% index (2 years) |
Prompt library: MLQ.ai 100+ Earnings Prompts — 14-category taxonomy. Best reference for multi-pass analysis chains.
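FinRAGify's cross-quarter commitment tracking reduces to a simple shape: collect forward-looking statements from quarter N, then judge each against the quarter N+1 transcript. A minimal sketch, where `judge` stands in for an LLM call (here trivially mocked as a substring check):

```python
# Sketch of cross-quarter commitment tracking: did management deliver on
# last quarter's promises? `judge` is a hypothetical stand-in for an LLM
# judgment call (promise, transcript) -> delivered?
from typing import Callable


def track_commitments(
    prior_promises: list[str],
    current_transcript: str,
    judge: Callable[[str, str], bool],
) -> dict[str, bool]:
    """Map each prior-quarter promise to a delivered/not-delivered verdict."""
    return {p: judge(p, current_transcript) for p in prior_promises}
```

The value is in the diff over time, not any single call: a running promise ledger per company is what no single-quarter sentiment model can produce.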
| Model / Benchmark | Type | Status | Key Finding |
|---|---|---|---|
| FinGPT | Fine-tuned LLM | 18.5K ★ — Stale since Nov 2023 | Llama2-era LoRA. Sentiment F1 87.6%. AI4Finance pivoted to FinRobot |
| BloombergGPT | Proprietary 50B | Closed — never released | The 40-year data moat is the asset, not the architecture |
| PIXIU / FinBen | Benchmark | 834 ★ — Active | 42 datasets, 24 tasks, 8 categories. NeurIPS 2024. Powers Open FinLLM Leaderboard |
| FinRL | RL framework | 14K ★ — Active | NeurIPS 2020. Portfolio allocation, crypto, HFT. Research-grade |
| StockBench | Trading benchmark | Paper (2025) | Most LLM agents fail to beat buy-and-hold. Agents are biased bullish |
No open-source "velocity scoring" framework exists. The concept maps to commercial alternative data aggregation.
| Platform | Type | What It Tracks |
|---|---|---|
| Rivus TFTF | In-house | 6 dimensions: Velocity, Compounding, Moat depth, Talent magnetism, Capital efficiency, Founder intensity |
| Harmonic.ai | Commercial ($30M raised) | 35M+ companies, 195M+ profiles. Funding, hires, product launches |
| Specter | Commercial | Web visits, app downloads, headcount growth, investor interest |
| Crustdata | Commercial API | Headcount (historical), web traffic, hiring signals, tech stack. Flat-rate unlimited |
| AltIndex | SaaS | AI Score 0–100: Reddit, social buzz, job postings, app downloads |
| TheirStack | Commercial | 225K jobs/day from 104K sources. Tech adoption inferred from job postings |
Compose signals from APIs:
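A minimal sketch of that composition, with fetcher callables standing in for providers like Crustdata (headcount) or TheirStack (job postings) — the signal names, normalization, and weights are all illustrative assumptions:

```python
# Sketch of composing a velocity signal from alternative-data APIs.
# Each fetcher is a hypothetical client wrapper returning one normalized
# signal for a company; weights are illustrative, not calibrated.
from typing import Callable


def velocity_score(
    company: str,
    headcount_growth: Callable[[str], float],  # e.g. 0.20 = +20% QoQ
    traffic_growth: Callable[[str], float],
    open_roles: Callable[[str], int],
    weights: tuple[float, float, float] = (0.5, 0.3, 0.2),
) -> float:
    """Weighted 0-100 composite; each input is clamped to [0, 1]."""
    def clamp(x: float) -> float:
        return max(0.0, min(1.0, x))

    parts = (
        clamp(headcount_growth(company)),
        clamp(traffic_growth(company)),
        clamp(open_roles(company) / 100),  # normalize: 100 open roles -> 1.0
    )
    return 100 * sum(w * p for w, p in zip(weights, parts))
```

Keeping each provider behind a plain callable makes the commercial APIs swappable, which matters given how interchangeable the vendors in the table above are.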
Mostly commercial, very little open-source. Rivus's intel/ pipeline is more comprehensive than any OSS alternative.
| Tool | Type | What It Does |
|---|---|---|
| VCII Founder Scorecard | Framework | Weighted quantitative + qualitative scoring, AI-enhanced |
| LEONVC | Commercial | Founder data + interviews + proprietary diagnostics. VC benchmarking |
| ReadyScore.ai | SaaS | AI startup investment-readiness scoring |
| Rivus intel/people/ | In-house | Face search, LinkedIn mapping, enrichment from 10+ APIs, web presence measurement |
Key stats: AI reduces due diligence time by ~60%. XGBoost screening outperformed median VC by 25%. AI adopters report 30–50% increase in high-quality deal flow.
| Paper | Year | Key Finding |
|---|---|---|
| TradingAgents | 2024 | Multi-agent debate improves Sharpe ratio and reduces drawdown |
| StockBench | 2025 | Most LLM agents fail to beat buy-and-hold. Agents biased bullish |
| AI-Trader | 2025 | General intelligence ≠ trading capability |
| Financial Statement Analysis | 2024 | GPT-4 outperforms professional financial analysts on earnings prediction |
| Multi-Agent KPI Extraction | 2025 | 95% accuracy structuring financial filings; 91% retrieval |
| MarketSenseAI 2.0 | 2025 | Chain-of-Agents: 125.9% returns vs 73.5% index (2yr) |
| FinMem | 2024 | Layered memory architecture mimicking human trader cognition (ICLR Workshop) |
| Priority | Action | Effort | Impact |
|---|---|---|---|
| P1 | Add Bull vs Bear debate pattern to TFTF scoring or thesis evaluation | M | High — proven to improve analysis quality |
| P1 | Enable edgartools MCP server for Claude sessions | S | High — zero-code SEC data access |
| P2 | Cross-quarter commitment tracking in earnings analysis (did mgmt deliver on promises?) | M | High — differentiating analysis |
| P2 | JSON schema function-calling for structured earnings extraction (defeatbeta pattern) | S | Medium — reduces hallucination |
| P2 | Continuous velocity signal tracking (hiring, product launches, GitHub) | L | High — closes point-in-time gap |
| P2 | Evaluate rivus earnings analysis against FinanceBench | M | Medium — calibrate quality |
| P3 | Evaluate against FinBen benchmark suite (42 datasets, 24 tasks) | M | Medium — systematic quality assessment |
Open question: should the debate pattern be a shared module (lib/llm/debate.py) or a one-off in specific pipelines?

Generated by rivus autonomous research · 5 parallel research agents, 50+ web searches · Feb 26, 2026