Landscape survey — 50+ repos, 15+ papers, 20+ platforms · February 2026
The most active area. Multiple projects with 10K+ stars explore multi-agent architectures for financial analysis.
Bull vs Bear researcher debate with risk mgmt as separate consensus layer. Paper-backed (arXiv:2412.20138). Supports GPT-5.x, Gemini 3.x, Claude 4.x, Grok 4.x, Ollama. v0.2.0 released Feb 2026.
Investor-persona agents encoding real investment philosophies (Buffett, Munger, Burry, Cathie Wood). Well-coded but explicitly educational — not for real trading.
Cross-paradigm: LLM + RL + quant in one platform. Academic-grade, part of the AI4Finance ecosystem (FinGPT, FinRL). FinRobot Pro for professional equity research.
3-agent pipeline (Action → Validation → Answer). The self-validation loop — agent checks its own output for accuracy before final answer — is the key innovation.
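The Action → Validation → Answer loop above can be sketched as a small driver function. Everything here is an illustrative assumption — `call_llm`, the prompts, and the retry policy are stand-ins, not the project's actual API:

```python
# Sketch of a 3-agent Action -> Validation -> Answer pipeline with a
# self-validation loop. `call_llm` is a stand-in for any chat-completion
# client (hypothetical); prompts and retry policy are illustrative.
from typing import Callable


def answer_with_validation(
    question: str,
    call_llm: Callable[[str], str],
    max_retries: int = 2,
) -> str:
    """Action agent drafts, Validation agent critiques, Answer agent finalizes."""
    draft = call_llm(f"Answer concisely: {question}")  # Action agent
    for _ in range(max_retries):
        verdict = call_llm(  # Validation agent checks the draft
            "Check this answer for factual or numeric errors.\n"
            f"Q: {question}\nA: {draft}\n"
            "Reply 'OK' or list the problems."
        )
        if verdict.strip().upper().startswith("OK"):
            break
        draft = call_llm(  # revise using the critique, then re-validate
            f"Revise the answer to fix these problems: {verdict}\n"
            f"Q: {question}\nA: {draft}"
        )
    return call_llm(f"Polish for the final user:\n{draft}")  # Answer agent
```

The design point is that validation is a separate call with a narrow rubric, so the critic is not anchored on the generator's reasoning.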
Rivus comparison: rivus's finance/earnings/backtest/annotate.py already does two-stage LLM annotation (fast triage → smart re-score). The multi-agent debate pattern is the biggest gap — it could improve TFTF (Too Fast To Follow — rivus's 6-dimension company scoring framework: Velocity, Compounding, Moat depth, Talent magnetism, Capital efficiency, Founder intensity) scoring in intel/companies/.
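The fast triage → smart re-score pattern can be sketched as follows. The model callables, score scale, and escalation band are illustrative assumptions, not the actual annotate.py implementation:

```python
# Sketch of two-stage LLM annotation: a cheap "triage" model scores every
# item, and only items in an ambiguous middle band are re-scored by a
# stronger (slower, costlier) model. Both callables are hypothetical
# stand-ins returning a 0-100 score.
from typing import Callable, Iterable


def two_stage_score(
    items: Iterable[str],
    fast_model: Callable[[str], int],
    smart_model: Callable[[str], int],
    low: int = 30,
    high: int = 70,
) -> dict[str, int]:
    scores = {}
    for item in items:
        s = fast_model(item)      # stage 1: fast triage on everything
        if low <= s <= high:      # ambiguous band -> escalate
            s = smart_model(item)  # stage 2: smart re-score
        scores[item] = s
    return scores
```

Clear accepts and clear rejects never hit the expensive model, so cost scales with ambiguity rather than with corpus size.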
Mature area with strong open-source tooling. The data access layer is largely solved; the analysis layer is where value remains.
| Project | Stars | What It Does | Quality | Rivus Relevance |
|---|---|---|---|---|
| edgartools | 1,700 | Python EDGAR access + XBRL parsing. Ships with MCP server for Claude. | Exceptional (3,393 commits, 1K+ tests, MIT) | High — already a rivus dep; MCP server unexploited |
| sec-insights (LlamaIndex) | 2,600 | Full-stack RAG (Retrieval-Augmented Generation: LLM answers questions grounded in retrieved documents, reducing hallucination) for 10-K/10-Q Q&A with citation + PDF highlighting | Good (reference architecture) | Medium — citation pattern useful |
| edgar-crawler | 482 | Downloads filings → extracts sections → structured JSON. Peer-reviewed (WWW 2025) | Good | Medium — NLP dataset building |
| sec-parser | 273 | Parses EDGAR HTML into semantic tree structure | Good (CI, type checking, docs) | Medium — RAG chunking |
| sec-edgar-mcp | 212 | MCP (Model Context Protocol: Anthropic's standard for connecting AI assistants to external tools and data sources) server for direct AI assistant access to EDGAR | Strong (160 commits, AGPL-3.0) | High — plug into Claude Code |
| Bellingcat EDGAR | 193 | CLI for EDGAR search with RSS monitoring for new filings | Clean | Medium — filing alerts |
Surprisingly thin landscape. Most repos use old-school NLP (VADER, Loughran-McDonald dictionaries — a sentiment lexicon designed for financial text, where words like "liability" are neutral rather than negative) rather than LLMs.
| Project | Stars | Approach | Quality | Notes |
|---|---|---|---|---|
| defeatbeta-api | 476 | LLM function calling with JSON schema for structured extraction | Good | Most production-ready: schema-constrained extraction reduces hallucination. DuckDB + OpenAI. |
| Multimodal Earnings Platform | Low | Whisper + CV2 + BLIP-2 + Mistral + NER + NetworkX | Demo | Most ambitious: multimodal (audio+video+text) |
| FinRAGify | Low | RAG over 8 quarters, FAISS + CrossEncoder reranking | Interesting | Unique: cross-quarter commitment tracking — did mgmt deliver on promises? |
| Earnings Signal Extractor | Low | LLM sentiment + Q&A tone + QoQ trends | Demo | NVIDIA only. Segment-level sentiment (prepared vs Q&A) |
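The schema-constrained extraction idea behind defeatbeta-api — ask the LLM to fill a fixed JSON schema, then reject responses that don't conform — can be sketched as below. The schema fields and the `call_llm` stand-in are assumptions for illustration, not defeatbeta-api's actual interface:

```python
# Sketch of schema-constrained earnings extraction: the prompt embeds a
# JSON schema, the response is parsed as strict JSON, and required keys
# are checked before the data is used. Field names and `call_llm` are
# hypothetical.
import json
from typing import Callable

EARNINGS_SCHEMA = {
    "type": "object",
    "required": ["revenue_usd", "eps", "guidance_raised"],
    "properties": {
        "revenue_usd": {"type": "number"},
        "eps": {"type": "number"},
        "guidance_raised": {"type": "boolean"},
    },
}


def extract_earnings(transcript: str, call_llm: Callable[[str], str]) -> dict:
    prompt = (
        "Extract earnings facts from the transcript below as JSON matching "
        f"this schema, with no extra text:\n{json.dumps(EARNINGS_SCHEMA)}\n\n"
        f"{transcript}"
    )
    data = json.loads(call_llm(prompt))  # raises ValueError on non-JSON output
    missing = [k for k in EARNINGS_SCHEMA["required"] if k not in data]
    if missing:
        raise ValueError(f"LLM omitted required fields: {missing}")
    return data
```

Constraining output to a schema is what makes the extraction auditable: anything the model can't ground either fails parsing or fails the required-key check instead of leaking into downstream scores.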
| Paper | Key Contribution |
|---|---|
| Agentic Topic Retrieval (Gupta 2025) | LLM agent discovers topics, builds evolving hierarchical ontology, tracks strategic priority shifts |
| Multi-Agent KPI Extraction (Choi 2025) | Extraction Agent + Text-to-SQL Agent: 95% accuracy on structuring filings, 91% on retrieval |
| MarketSenseAI 2.0 | Chain-of-Agents + HyDE RAG: 125.9% cumulative returns on S&P 100 vs 73.5% index (2 years) |
Prompt library: MLQ.ai 100+ Earnings Prompts — 14-category taxonomy. Best reference for multi-pass analysis chains.
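FinRAGify's cross-quarter commitment tracking reduces to a simple shape: collect forward-looking statements from quarter N, then judge each against the quarter N+1 transcript. A minimal sketch, where `judge` stands in for an LLM call (here trivially mocked as a substring check):

```python
# Sketch of cross-quarter commitment tracking: did management deliver on
# last quarter's promises? `judge` is a hypothetical stand-in for an LLM
# judgment call (promise, transcript) -> delivered?
from typing import Callable


def track_commitments(
    prior_promises: list[str],
    current_transcript: str,
    judge: Callable[[str, str], bool],
) -> dict[str, bool]:
    """Map each prior-quarter promise to a delivered/not-delivered verdict."""
    return {p: judge(p, current_transcript) for p in prior_promises}
```

The value is in the diff over time, not any single call: a running promise ledger per company is what no single-quarter sentiment model can produce.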
| Model / Benchmark | Type | Status | Key Finding |
|---|---|---|---|
| FinGPT | Fine-tuned LLM | 18.5K ★ — Stale since Nov 2023 | Llama2-era LoRA. Sentiment F1 87.6%. AI4Finance pivoted to FinRobot |
| BloombergGPT | Proprietary 50B | Closed — never released | The 40-year data moat is the asset, not the architecture |
| PIXIU / FinBen | Benchmark | 834 ★ — Active | 42 datasets, 24 tasks, 8 categories. NeurIPS 2024. Powers Open FinLLM Leaderboard |
| FinRL | RL framework | 14K ★ — Active | NeurIPS 2020. Portfolio allocation, crypto, HFT. Research-grade |
| StockBench | Trading benchmark | Paper (2025) | Most LLM agents fail to beat buy-and-hold. Agents are biased bullish |
No open-source "velocity scoring" framework exists. The concept maps to commercial alternative data aggregation.
| Platform | Type | What It Tracks |
|---|---|---|
| Rivus TFTF | In-house | 6 dimensions: Velocity, Compounding, Moat depth, Talent magnetism, Capital efficiency, Founder intensity |
| Harmonic.ai | Commercial ($30M raised) | 35M+ companies, 195M+ profiles. Funding, hires, product launches |
| Specter | Commercial | Web visits, app downloads, headcount growth, investor interest |
| Crustdata | Commercial API | Headcount (historical), web traffic, hiring signals, tech stack. Flat-rate unlimited |
| AltIndex | SaaS | AI Score 0–100: Reddit, social buzz, job postings, app downloads |
| TheirStack | Commercial | 225K jobs/day from 104K sources. Tech adoption inferred from job postings |
Compose signals from APIs:
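A minimal sketch of that composition, with fetcher callables standing in for providers like Crustdata (headcount) or TheirStack (job postings) — the signal names, normalization, and weights are all illustrative assumptions:

```python
# Sketch of composing a velocity signal from alternative-data APIs.
# Each fetcher is a hypothetical client wrapper returning one normalized
# signal for a company; weights are illustrative, not calibrated.
from typing import Callable


def velocity_score(
    company: str,
    headcount_growth: Callable[[str], float],  # e.g. 0.20 = +20% QoQ
    traffic_growth: Callable[[str], float],
    open_roles: Callable[[str], int],
    weights: tuple[float, float, float] = (0.5, 0.3, 0.2),
) -> float:
    """Weighted 0-100 composite; each input is clamped to [0, 1]."""
    def clamp(x: float) -> float:
        return max(0.0, min(1.0, x))

    parts = (
        clamp(headcount_growth(company)),
        clamp(traffic_growth(company)),
        clamp(open_roles(company) / 100),  # normalize: 100 open roles -> 1.0
    )
    return 100 * sum(w * p for w, p in zip(weights, parts))
```

Keeping each provider behind a plain callable makes the commercial APIs swappable, which matters given how interchangeable the vendors in the table above are.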
Mostly commercial, very little open-source. Rivus's intel/ pipeline is more comprehensive than any OSS alternative.
| Tool | Type | What It Does |
|---|---|---|
| VCII Founder Scorecard | Framework | Weighted quantitative + qualitative scoring, AI-enhanced |
| LEONVC | Commercial | Founder data + interviews + proprietary diagnostics. VC benchmarking |
| ReadyScore.ai | SaaS | AI startup investment-readiness scoring |
| Rivus intel/people/ | In-house | Face search, LinkedIn mapping, enrichment from 10+ APIs, web presence measurement |
Key stats: AI reduces due diligence time by ~60%. XGBoost screening outperformed median VC by 25%. AI adopters report 30–50% increase in high-quality deal flow.
| Paper | Year | Key Finding |
|---|---|---|
| TradingAgents | 2024 | Multi-agent debate improves Sharpe ratio and reduces drawdown |
| StockBench | 2025 | Most LLM agents fail to beat buy-and-hold. Agents biased bullish |
| AI-Trader | 2025 | General intelligence ≠ trading capability |
| Financial Statement Analysis | 2024 | GPT-4 outperforms professional financial analysts on earnings prediction |
| Multi-Agent KPI Extraction | 2025 | 95% accuracy structuring financial filings; 91% retrieval |
| MarketSenseAI 2.0 | 2025 | Chain-of-Agents: 125.9% returns vs 73.5% index (2yr) |
| FinMem | 2024 | Layered memory architecture mimicking human trader cognition (ICLR Workshop) |
| Priority | Action | Effort | Impact |
|---|---|---|---|
| P1 | Add Bull vs Bear debate pattern to TFTF scoring or thesis evaluation | M | High — proven to improve analysis quality |
| P1 | Enable edgartools MCP server for Claude sessions | S | High — zero-code SEC data access |
| P2 | Cross-quarter commitment tracking in earnings analysis (did mgmt deliver on promises?) | M | High — differentiating analysis |
| P2 | JSON schema function-calling for structured earnings extraction (defeatbeta pattern) | S | Medium — reduces hallucination |
| P2 | Continuous velocity signal tracking (hiring, product launches, GitHub) | L | High — closes point-in-time gap |
| P2 | Evaluate rivus earnings analysis against FinanceBench | M | Medium — calibrate quality |
| P3 | Evaluate against FinBen benchmark suite (42 datasets, 24 tasks) | M | Medium — systematic quality assessment |
Open question: should the debate pattern be a shared module (lib/llm/debate.py) or a one-off in specific pipelines?

Generated by rivus autonomous research · 5 parallel research agents, 50+ web searches · Feb 26, 2026