AI Stock Analysis Tools & Approaches

Landscape survey — 50+ repos, 15+ papers, 20+ platforms · February 2026

Key Takeaways

1. Multi-Agent Stock Analysis Frameworks

The most active area. Multiple projects with 10K+ stars explore multi-agent architectures for financial analysis.

TradingAgents

30.8K ★ · High quality · LangGraph

Bull vs Bear researcher debate with risk mgmt as separate consensus layer. Paper-backed (arXiv:2412.20138). Supports GPT-5.x, Gemini 3.x, Claude 4.x, Grok 4.x, Ollama. v0.2.0 released Feb 2026.

AI Hedge Fund

45.9K ★ · Good · 18 agents

Investor-persona agents encoding real investment philosophies (Buffett, Munger, Burry, Cathie Wood). Well-coded but explicitly educational — not for real trading.

FinRobot

6.3K ★ · Good · AI4Finance

Cross-paradigm: LLM + RL + quant in one platform. Academic-grade, part of the AI4Finance ecosystem (FinGPT, FinRL). FinRobot Pro for professional equity research.

Dexter

Interesting · ~200 lines TS

3-agent pipeline (Action → Validation → Answer). The self-validation loop — agent checks its own output for accuracy before final answer — is the key innovation.
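Dexter itself is ~200 lines of TypeScript, but the loop is language-agnostic. A minimal Python sketch of the Action → Validation → Answer pattern, with the LLM abstracted as a plain prompt-to-text callable (all prompts and names here are illustrative, not Dexter's actual code):

```python
# Sketch of a self-validating agent loop: draft an answer, have the model
# critique it, and revise until the critique passes or retries run out.
# `llm` is any callable taking a prompt string and returning a text response.

def run_pipeline(question: str, llm, max_retries: int = 2) -> str:
    draft = llm(f"Answer with cited figures: {question}")           # Action agent
    for _ in range(max_retries):
        verdict = llm(f"Check this answer for factual or numeric errors. "
                      f"Reply OK or list the problems.\nQ: {question}\nA: {draft}")
        if verdict.strip().upper().startswith("OK"):                # Validation agent
            break
        draft = llm(f"Revise the answer to fix: {verdict}\n"
                    f"Q: {question}\nA: {draft}")                   # re-draft
    return draft                                                    # Answer agent
```

The key design choice is that validation is a separate model call with a narrow job (find errors), which tends to catch mistakes the drafting pass glosses over.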

Recurring Architectural Patterns

  1. Debate/adversarial — Bull vs Bear agents arguing before a decision maker
  2. Specialist pipeline — Fundamental → Technical → Sentiment → Risk → Portfolio
  3. Self-reflection — Agent reviews its own output before committing
  4. Multi-source fusion — SEC filings + news + social + price data via specialist agents
  5. Risk as a gate — Risk management agent with veto power
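Pattern 1 is straightforward to prototype. A hedged sketch of the debate loop, again with `llm` as any prompt-to-text callable; the prompts, round count, and BUY/HOLD/SELL output are illustrative rather than TradingAgents' actual implementation:

```python
# Sketch of the debate/adversarial pattern: bull and bear agents argue for
# N rounds over shared evidence, then a judge agent issues the decision.

def debate(ticker: str, evidence: str, llm, rounds: int = 2) -> str:
    transcript = []
    for i in range(rounds):
        bull = llm(f"Argue the bull case for {ticker}.\nEvidence: {evidence}\n"
                   f"Prior debate: {transcript}")
        bear = llm(f"Rebut the bull case and argue the bear case for {ticker}.\n"
                   f"Bull said: {bull}\nPrior debate: {transcript}")
        transcript += [f"BULL[{i}]: {bull}", f"BEAR[{i}]: {bear}"]
    return llm("You are the judge. Given the debate below, output one of "
               "BUY/HOLD/SELL with a one-line rationale.\n" + "\n".join(transcript))
```

Feeding the running transcript back into each round is what makes it a debate rather than two independent opinions: each side must respond to the other's strongest points.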

Rivus comparison: rivus's finance/earnings/backtest/annotate.py already does two-stage LLM annotation (fast triage → smart re-score). The multi-agent debate pattern is the biggest gap — it could improve TFTF scoring in intel/companies/.
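For reference, the two-stage shape (cheap triage pass, then an expensive re-score of flagged items) reads roughly like the sketch below. The model wiring, prompts, and 0.5 threshold are placeholders, not rivus's actual implementation:

```python
# Sketch of two-stage LLM annotation: a cheap model scores every utterance,
# and only utterances it flags as interesting are escalated to a smart model.
# `cheap_llm` and `smart_llm` are any prompt -> text callables returning a
# numeric score as a string.

def annotate(utterances, cheap_llm, smart_llm, threshold: float = 0.5):
    results = []
    for text in utterances:
        score = float(cheap_llm(f"Rate 0-1 how price-relevant this is: {text}"))
        if score >= threshold:                      # escalate to the smart model
            score = float(smart_llm(f"Re-score carefully, 0-1: {text}"))
        results.append((text, score))
    return results
```

The economics are the point: most utterances in a call are filler, so the expensive model only sees the small fraction the triage pass promotes.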

2. SEC Filings + LLM Extraction

Mature area with strong open-source tooling. The data access layer is largely solved; the analysis layer is where value remains.

| Project | Stars | What It Does | Quality | Rivus Relevance |
|---|---|---|---|---|
| edgartools | 1,700 | Python EDGAR access + XBRL parsing. Ships with an MCP server for Claude | Exceptional — 3,393 commits, 1K+ tests, MIT | High — already a rivus dep; MCP server unexploited |
| sec-insights (LlamaIndex) | 2,600 | Full-stack RAG (Retrieval-Augmented Generation — LLM answers grounded in retrieved documents, reducing hallucination) for 10-K/10-Q Q&A with citation + PDF highlighting | Good — reference architecture | Medium — citation pattern useful |
| edgar-crawler | 482 | Downloads filings → extracts sections → structured JSON. Peer-reviewed (WWW 2025) | Good | Medium — NLP dataset building |
| sec-parser | 273 | Parses EDGAR HTML into a semantic tree structure | Good — CI, type checking, docs | Medium — RAG chunking |
| sec-edgar-mcp | 212 | MCP (Model Context Protocol — Anthropic's standard for connecting AI assistants to external tools and data sources) server for direct AI assistant access to EDGAR | Strong — 160 commits, AGPL-3.0 | High — plug into Claude Code |
| Bellingcat EDGAR | 193 | CLI for EDGAR search with RSS monitoring for new filings | Clean | Medium — filing alerts |
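For the "plug into Claude Code" path, registering either MCP server follows the standard `mcpServers` config shape. The fragment below is illustrative only: the server name, command, and args are placeholders, and the real invocation should be taken from each project's own documentation:

```json
{
  "mcpServers": {
    "sec-edgar": {
      "command": "<command-from-project-docs>",
      "args": ["<args-from-project-docs>"]
    }
  }
}
```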
Benchmark: FinanceBench — 10,231 QA pairs, 40 companies. GPT-4 Turbo with retrieval: 19% correct. Best multi-agent system (Pathway LiveAI): 56%. SEC filing QA is a hard problem.

3. Earnings Call Analysis with LLMs

Surprisingly thin landscape. Most repos use old-school NLP — VADER and the Loughran-McDonald dictionaries (a sentiment lexicon built specifically for financial text, where words like "liability" read as neutral rather than negative) — rather than LLMs.

| Project | Stars | Approach | Quality | Notes |
|---|---|---|---|---|
| defeatbeta-api | 476 | LLM function calling with JSON schema for structured extraction | Good | Most production-ready: schema-constrained extraction reduces hallucination. DuckDB + OpenAI. |
| Multimodal Earnings Platform | Low | Whisper + CV2 + BLIP-2 + Mistral + NER + NetworkX | Demo | Most ambitious: multimodal (audio + video + text) |
| FinRAGify | Low | RAG over 8 quarters, FAISS + CrossEncoder reranking | Interesting | Unique: cross-quarter commitment tracking — did mgmt deliver on promises? |
| Earnings Signal Extractor | Low | LLM sentiment + Q&A tone + QoQ trends | Demo | NVIDIA only. Segment-level sentiment (prepared remarks vs Q&A) |
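The defeatbeta-api row is the most transferable idea: force the model to emit JSON matching a fixed schema, then validate before use. A minimal sketch with the function-calling client stubbed as `llm_json` (the schema fields and prompt are illustrative, not defeatbeta-api's actual schema):

```python
import json

# Sketch of schema-constrained extraction: the model must return JSON with
# known fields, and the output is type-checked before anything downstream
# touches it. `llm_json` stands in for a function-calling LLM client.

EARNINGS_SCHEMA = {
    "ticker": str, "quarter": str, "revenue_usd_m": float,
    "eps": float, "guidance_raised": bool,
}

def extract_earnings(transcript: str, llm_json) -> dict:
    raw = llm_json(
        "Extract the fields below from the transcript as JSON only.\n"
        f"Fields: {list(EARNINGS_SCHEMA)}\nTranscript: {transcript}"
    )
    data = json.loads(raw)
    for field, typ in EARNINGS_SCHEMA.items():      # reject malformed output
        if not isinstance(data.get(field), typ):
            raise ValueError(f"bad or missing field: {field}")
    return data
```

The validation step is what "reduces hallucination" in practice: a freeform answer can bury a wrong number in prose, while a schema violation fails loudly and can trigger a retry.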
Standout Academic Papers on Earnings Calls

| Paper | Key Contribution |
|---|---|
| Agentic Topic Retrieval (Gupta 2025) | LLM agent discovers topics, builds an evolving hierarchical ontology, tracks strategic priority shifts |
| Multi-Agent KPI Extraction (Choi 2025) | Extraction Agent + Text-to-SQL Agent: 95% accuracy on structuring filings, 91% on retrieval |
| MarketSenseAI 2.0 | Chain-of-Agents + HyDE RAG: 125.9% cumulative returns on S&P 100 vs 73.5% index (2 years) |

Prompt library: MLQ.ai 100+ Earnings Prompts — 14-category taxonomy. Best reference for multi-pass analysis chains.

Key gap: No production-quality open-source tool does what rivus does — real-time transcript-to-price alignment with two-stage LLM annotation at utterance-level granularity with IB tick-level (~250ms) data. The closest commercial equivalent is AlphaSense ($10K–50K/yr).
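The alignment step itself is mechanical once utterance timestamps and ticks share a clock; the hard parts are the data feeds and annotation around it. An illustrative sketch (field names and the 30-second window are placeholders, not rivus's actual parameters):

```python
import bisect

# Sketch of utterance-to-tick alignment: for each timestamped utterance,
# measure the fractional price move over the following window.

def align(utterances, ticks, window_s: float = 30.0):
    """utterances: [(t_seconds, text)]; ticks: time-sorted [(t_seconds, price)]."""
    times = [t for t, _ in ticks]
    out = []
    for t0, text in utterances:
        i = bisect.bisect_left(times, t0)                # first tick at/after speech
        j = bisect.bisect_right(times, t0 + window_s) - 1  # last tick inside window
        if i < len(ticks) and j >= i:
            p0, p1 = ticks[i][1], ticks[j][1]
            out.append((text, (p1 - p0) / p0))           # fractional move
    return out
```

With ~250 ms tick granularity, `bisect` over a sorted tick array keeps each lookup O(log n), so a full call transcript aligns in milliseconds.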

4. Financial LLMs & Benchmarks

| Model / Benchmark | Type | Status | Key Finding |
|---|---|---|---|
| FinGPT | Fine-tuned LLM | 18.5K ★ — stale since Nov 2023 | Llama2-era LoRA. Sentiment F1 87.6%. AI4Finance pivoted to FinRobot |
| BloombergGPT | Proprietary 50B | Closed — never released | The 40-year data moat is the asset, not the architecture |
| PIXIU / FinBen | Benchmark | 834 ★ — active | 42 datasets, 24 tasks, 8 categories. NeurIPS 2024. Powers the Open FinLLM Leaderboard |
| FinRL | RL framework | 14K ★ — active | NeurIPS 2020. Portfolio allocation, crypto, HFT. Research-grade |
| StockBench | Trading benchmark | Paper (2025) | Most LLM agents fail to beat buy-and-hold; agents are biased bullish |
Honest assessment: Nobody deploys FinGPT or FinMA in production. Frontier models (GPT-4o, Claude, Gemini, DeepSeek-R1) with financial prompting beat all dedicated financial fine-tunes on most benchmarks. The lasting contributions are the benchmarks (FinBen, FinanceBench), the fine-tuning datasets, and FinRL.

5. Velocity Scoring & Company Momentum

No open-source "velocity scoring" framework exists. The concept maps to commercial alternative data aggregation.

| Platform | Type | What It Tracks |
|---|---|---|
| Rivus TFTF | In-house | 6 dimensions: Velocity, Compounding, Moat depth, Talent magnetism, Capital efficiency, Founder intensity |
| Harmonic.ai | Commercial ($30M raised) | 35M+ companies, 195M+ profiles. Funding, hires, product launches |
| Specter | Commercial | Web visits, app downloads, headcount growth, investor interest |
| Crustdata | Commercial API | Headcount (historical), web traffic, hiring signals, tech stack. Flat-rate unlimited |
| AltIndex | SaaS | AI Score 0–100: Reddit, social buzz, job postings, app downloads |
| TheirStack | Commercial | 225K jobs/day from 104K sources. Tech adoption inferred from job postings |

Build-Your-Own Velocity Score

Compose signals from APIs:
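A hedged sketch of one way to do it: z-score each raw signal against a peer set, then take a weighted sum. The signal names and weights are illustrative; the real inputs would come from the APIs in the table above:

```python
from statistics import mean, pstdev

# Sketch of a composite velocity score. Each signal is normalized against a
# peer group so signals on different scales (headcount %, visits, postings)
# can be combined into one number.

WEIGHTS = {"headcount_growth": 0.4, "web_traffic_growth": 0.3, "job_postings": 0.3}

def velocity_score(company: dict, peers: list) -> float:
    score = 0.0
    for signal, w in WEIGHTS.items():
        vals = [p[signal] for p in peers]
        mu, sigma = mean(vals), pstdev(vals) or 1.0   # guard degenerate peer sets
        score += w * (company[signal] - mu) / sigma   # weighted z-score
    return score
```

A score of 0 means "moving at the peer-group average"; positive values mean faster. The weights encode a view about which signals matter and would need calibration against outcomes.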

6. Startup / Founder Evaluation

Mostly commercial, very little open-source. Rivus's intel/ pipeline is more comprehensive than any OSS alternative.

| Tool | Type | What It Does |
|---|---|---|
| VCII Founder Scorecard | Framework | Weighted quantitative + qualitative scoring, AI-enhanced |
| LEONVC | Commercial | Founder data + interviews + proprietary diagnostics. VC benchmarking |
| ReadyScore.ai | SaaS | AI startup investment-readiness scoring |
| Rivus intel/people/ | In-house | Face search, LinkedIn mapping, enrichment from 10+ APIs, web-presence measurement |

Key stats: AI reduces due diligence time by ~60%. XGBoost screening outperformed median VC by 25%. AI adopters report 30–50% increase in high-quality deal flow.

7. Key Research Papers (2024–2026)

| Paper | Year | Key Finding |
|---|---|---|
| TradingAgents | 2024 | Multi-agent debate improves Sharpe ratio and reduces drawdown |
| StockBench | 2025 | Most LLM agents fail to beat buy-and-hold; agents biased bullish |
| AI-Trader | 2025 | General intelligence ≠ trading capability |
| Financial Statement Analysis | 2024 | GPT-4 outperforms professional financial analysts on earnings prediction |
| Multi-Agent KPI Extraction | 2025 | 95% accuracy structuring financial filings; 91% retrieval |
| MarketSenseAI 2.0 | 2025 | Chain-of-Agents: 125.9% returns vs 73.5% index (2 yr) |
| FinMem | 2024 | Layered memory architecture mimicking human trader cognition (ICLR Workshop) |

8. Rivus Positioning

Ahead

  • Earnings call analysis — transcript-to-price at utterance granularity, two-stage LLM annotation
  • Founder/CEO evaluation — face search, LinkedIn mapping, 10+ API enrichment
  • VIC thesis backtesting — 25K ideas, 92% symbol hit rate, 7 horizons

At Par

  • Company TFTF scoring — 6-dimension framework, web-search grounded
  • SEC filing access — edgartools dependency in intel/ pipeline

Could Adopt

  • Multi-agent debate — Bull vs Bear pattern from TradingAgents
  • SEC filing MCP server — direct Claude queries to EDGAR
  • Financial benchmarks — FinanceBench, StockBench, FinBen
  • Continuous velocity signals — hiring, product, GitHub feeds

9. Proposed Follow-ups

| Priority | Action | Effort | Impact |
|---|---|---|---|
| P1 | Add Bull vs Bear debate pattern to TFTF scoring or thesis evaluation | M | High — proven to improve analysis quality |
| P1 | Enable edgartools MCP server for Claude sessions | S | High — zero-code SEC data access |
| P2 | Cross-quarter commitment tracking in earnings analysis (did mgmt deliver on promises?) | M | High — differentiating analysis |
| P2 | JSON-schema function calling for structured earnings extraction (defeatbeta pattern) | S | Medium — reduces hallucination |
| P2 | Continuous velocity signal tracking (hiring, product launches, GitHub) | L | High — closes point-in-time gap |
| P2 | Evaluate rivus earnings analysis against FinanceBench | M | Medium — calibrate quality |
| P3 | Evaluate against FinBen benchmark suite (42 datasets, 24 tasks) | M | Medium — systematic quality assessment |

Open Questions

  1. Should the Bull vs Bear debate be a reusable primitive (lib/llm/debate.py) or a one-off in specific pipelines?
  2. Should rivus adopt a standard financial benchmark for self-evaluation, or is VIC backtesting sufficient?
  3. How much value do continuous velocity signals add over point-in-time web search? The data feeds have ongoing cost.
  4. Is edgartools MCP server production-ready, or does sec-edgar-mcp (AGPL-3.0) offer better functionality?

Generated by rivus autonomous research · 5 parallel research agents, 50+ web searches · Feb 26, 2026

TFTF (Too Fast To Follow)
Rivus's 6-dimension company scoring framework: Velocity, Compounding, Moat depth, Talent magnetism, Capital efficiency, Founder intensity. Grounded in web search, temperature=0 for consistent evaluation.