# Related Work: Alternative Data for Financial Trend Tracking

*What do people track beyond Google Trends for investment signals?*

Compiled 2026-03-24 via web search (the automated pipeline was unavailable due to Gemini subscription failures).

---

## Build Priority for a Small Team

**Tier 1 — Free/cheap, high signal, build now:**

| Signal                 | Cost       | DIY Feasibility | Why                                              |
|------------------------|------------|-----------------|--------------------------------------------------|
| Earnings call NLP      | Free       | HIGH            | FinBERT + free transcripts; sentiment delta predicts next-day returns |
| Insider trading        | Free       | HIGH            | OpenInsider + edgartools; cluster buying is a well-documented bullish signal |
| Search trends          | Free-$40/mo | HIGH          | Google Trends + Glimpse; Preis et al. (2013) backtest beat buy-and-hold by ~310 percentage points over 7yr |
| SEC filing NLP         | Free       | HIGH            | edgartools + FinBERT on 10-K risk factors; language changes precede price |
| Social sentiment       | Free       | HIGH            | Reddit/StockTwits via PRAW + FinBERT; volume matters more than sentiment score |
| Gov't data pipelines   | Free       | HIGH            | FRED/BLS/EIA — authoritative, API-accessible, massive coverage |

**Tier 2 — Modest investment (free to $500/mo), more engineering effort:**

| Signal                 | Cost       | DIY Feasibility | Why                                              |
|------------------------|------------|-----------------|--------------------------------------------------|
| Options flow           | $30/mo     | MEDIUM          | Unusual Whales; UOA front-runs earnings/M&A      |
| Patent filing trends   | Free       | MEDIUM          | PatentsView API; filing spikes signal 12-18mo before launches |
| AIS shipping data      | Free       | MEDIUM          | AISHub; requires engineering but unique macro signal |

**Tier 3 — Evaluate commercial ($10K+/yr):**

| Signal                 | Cost       | DIY Feasibility | Why                                              |
|------------------------|------------|-----------------|--------------------------------------------------|
| Web/app traffic        | $10K+/yr   | LOW             | SimilarWeb/Apptopia; direct revenue proxy for digital companies |
| Job posting data       | $25K+/yr   | MEDIUM          | Revelio Labs/LinkUp; hiring velocity is leading indicator |
| Foot traffic           | Enterprise | LOW             | Placer.ai; direct retail revenue proxy           |

**Tier 4 — Institutional only ($50K+/yr):**

| Signal                 | Cost       | Why                                              |
|------------------------|------------|--------------------------------------------------|
| Transaction data       | $50-500K/yr | Bloomberg Second Measure, Facteus — most direct revenue signal |
| Satellite analytics    | $100K+/yr  | Orbital Insight, SpaceKnow — unique, uncorrelated |
| Enterprise ESG         | Enterprise | MSCI, Sustainalytics — but low inter-provider agreement |

---

## Detailed Categories

### 1. Search Trends (current gtracker focus)

Google Trends is validated but noisy. Key enhancements:

| Tool               | What it adds over raw GT                                 | Cost          |
|--------------------|----------------------------------------------------------|---------------|
| **Glimpse**        | Absolute volumes, 12-month forecasts (vendor-claimed 87% accuracy) | Free extension + paid |
| **Exploding Topics** | AI-curated emerging trends before mainstream            | $39-$249/mo   |
| **SparkToro**      | Audience research: what people read, follow, and search  | Free (20 searches/mo) |

**Academic validation:**
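- Preis et al. (2013, *Scientific Reports*): a GT trading strategy on the term "debt" (the best performer among 98 terms tested) beat buy-and-hold on the DJIA by ~310 percentage points over 7 years
- Bijl et al. (2016, *International Review of Financial Analysis*): search volume predicts individual stock returns
- Systematic review of 56 articles on the Google Search Volume Index (GSVI) and investor attention (Springer, 2023)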
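The trading rule described by Preis et al. can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' code: weekly search volumes and weekly prices (close of each week's first trading day) are aligned Python lists, the averaging window defaults to 3 weeks (an assumption here), and transaction costs are ignored. Rising attention relative to the trailing mean triggers a short position for the coming week; falling attention triggers a long one.

```python
import math


def preis_positions(volume: list[float], window: int = 3) -> list[int]:
    """Position held from week t to week t+1: -1 short, +1 long, 0 flat.

    Compares last week's volume n(t-1) with the mean of the `window`
    weeks ending at t-1 (assumed window; not tuned here).
    """
    positions = [0] * len(volume)
    for t in range(window, len(volume)):
        trailing_mean = sum(volume[t - window:t]) / window  # weeks t-window .. t-1
        delta = volume[t - 1] - trailing_mean
        if delta > 0:
            positions[t] = -1  # attention rising: sell this week, buy back next week
        elif delta < 0:
            positions[t] = 1   # attention falling: buy this week, sell next week
    return positions


def cumulative_log_return(prices: list[float], positions: list[int]) -> float:
    """Sum of position[t] * log(p[t+1] / p[t]); prices and positions aligned by week."""
    total = 0.0
    for t in range(len(prices) - 1):
        total += positions[t] * math.log(prices[t + 1] / prices[t])
    return total
```

Log returns are used so that weekly results add up; `math.exp(total) - 1` converts back to a simple cumulative return.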

### 2. Social Media Sentiment

Volume of mentions often matters more than sentiment scores.

| Tool / Provider       | Type                | Access         |
|-----------------------|---------------------|----------------|
| **FinBERT**           | BERT fine-tuned on financial text | Free (HuggingFace) |
| **StockTwits**        | Bull/bear-tagged stock microblogging | Free feed     |
| **PRAW**              | Python Reddit API Wrapper (e.g. r/WallStreetBets) | Free          |
| **Quiver Quantitative** | Reddit + Congress trades + lobbying | Free tier   |
| **RavenPack**         | Enterprise NLP on news + social | Enterprise    |

**Key finding:** Reported back-tests of Reddit WSB picks showed 70-84% outperformance, but only in bull markets, so the effect may be regime-dependent. Simpler metrics (comment volume) often beat sophisticated NLP.
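Because raw mention volume often carries the signal, a first pass needs no NLP at all. The sketch below assumes posts have already been fetched (e.g. via PRAW) and grouped by day, and that tickers appear as uppercase cashtags like `$GME`; the 20-day window and the median baseline are illustrative choices. It counts mentions per day and reports each day's count as a multiple of the trailing median.

```python
import re
from collections import Counter
from statistics import median

# Assumed convention: uppercase cashtags of 1-5 letters, e.g. "$GME"; "$gme" is not matched.
CASHTAG = re.compile(r"\$([A-Z]{1,5})\b")


def daily_mentions(posts_by_day: list[list[str]]) -> list[Counter]:
    """One Counter of ticker -> mention count per day."""
    return [Counter(m for post in posts for m in CASHTAG.findall(post))
            for posts in posts_by_day]


def abnormal_volume(counts: list[Counter], ticker: str, window: int = 20) -> list[float]:
    """Each day's mentions divided by the trailing-window median (NaN until warmed up)."""
    series = [c[ticker] for c in counts]  # Counter returns 0 for unseen tickers
    ratios = []
    for t, today in enumerate(series):
        if t < window:
            ratios.append(float("nan"))
            continue
        base = median(series[t - window:t])
        ratios.append(today / max(base, 1.0))  # floor avoids divide-by-zero on quiet tickers
    return ratios
```

A median baseline is less distorted by a single viral day than a mean, which matters for meme-driven tickers.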

### 3. Web Traffic / App Data

Direct revenue proxy for digital companies, but expensive.

| Provider          | Coverage                    | Notes                          |
|-------------------|-----------------------------|--------------------------------|
| **SimilarWeb**    | 100M+ websites, 4M+ apps   | Strongest breadth; $10K+/yr    |
| **Sensor Tower**  | Mobile apps (acquired data.ai) | Consolidated mobile market  |
| **Apptopia**      | 3,500+ public tickers       | 4-6 week earnings lead signal  |

Free alternatives: very limited (app store scraping, Cloudflare Radar for aggregate trends).

### 4. Credit Card / Transaction Data

Often ranked the #1 alt-data category for alpha, but there is no free ticker-level alternative.

| Provider                | Panel              | Lag    |
|-------------------------|--------------------|--------|
| **Bloomberg Second Measure** | 20M+ consumers | 3 days  |
| **Earnest Research**    | Undisclosed        | ~3 days |
| **Facteus**             | 185M+ cards        | 2 days  |
| **YipitData**           | Varies             | Daily   |

Closest free proxy: FRED consumer spending (aggregate only, monthly).

### 5. Satellite / Geospatial

Unique, uncorrelated signals — but requires ML expertise for DIY.

| Provider            | Focus                                           |
|---------------------|-------------------------------------------------|
| **Orbital Insight**  | Oil storage (25K tanks), retail car counts, ports |
| **SpaceKnow**       | Global trade index from port imagery             |
| **Planet Labs**      | Daily global imagery (200+ satellites)            |
| **RS Metrics**       | Metal storage, parking lots, solar panels         |

Free imagery: Sentinel-2 (10 m, 5-day revisit), Landsat (30 m), Google Earth Engine. Building parking-lot car counters with Sentinel-2 + YOLO is feasible, but resolution is the limiting factor.
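A quick footprint calculation shows why resolution is the bottleneck. Assuming a passenger car of roughly 4.5 m × 1.8 m (an illustrative size, not from the source), the sketch computes how many pixels one car covers at a given ground sample distance (GSD). The 0.5 m entry is a hypothetical commercial-imagery comparison point.

```python
def pixels_per_object(length_m: float, width_m: float, gsd_m: float) -> float:
    """Approximate pixels covered by a rectangular object at a given ground sample distance."""
    return (length_m * width_m) / (gsd_m ** 2)


# Illustrative car footprint (assumption): 4.5 m x 1.8 m = 8.1 m^2
for name, gsd in [("Sentinel-2", 10.0), ("Landsat", 30.0), ("hypothetical 0.5 m commercial", 0.5)]:
    print(f"{name}: {pixels_per_object(4.5, 1.8, gsd):.3f} pixels per car")
```

At 10 m a car covers about 0.08 of a pixel (0.009 at 30 m), versus roughly 32 pixels at 0.5 m. Per-car detection on free imagery is therefore unrealistic; any Sentinel-2 signal would presumably have to come from aggregate changes in a lot's appearance.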

### 6. Supply Chain / Shipping

Over 80% of international trade by volume moves by sea, which makes AIS (Automatic Identification System) vessel-tracking data a powerful macro signal.

| Provider             | Focus                              |
|----------------------|------------------------------------|
| **FreightWaves SONAR** | Trucking/rail/air/ocean; $125B in invoices tracked |
| **MarineTraffic (Kpler)** | 13K+ AIS receivers globally   |
| **Windward**         | Maritime AI risk + compliance       |

Free: AISHub (community AIS data), OECD maritime dashboard, UN Comtrade, FRED freight indices.
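One way to turn raw AIS positions into a macro signal is a port-congestion count: distinct vessels sitting nearly stationary inside a port area. A minimal sketch, assuming position reports have already been decoded into records with MMSI, latitude, longitude, and speed over ground. The record fields, the bounding box, and the 0.5-knot threshold are illustrative assumptions, not AISHub's actual format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AisReport:
    mmsi: int          # vessel identifier
    lat: float
    lon: float
    sog_knots: float   # speed over ground


@dataclass(frozen=True)
class BBox:
    min_lat: float
    max_lat: float
    min_lon: float
    max_lon: float

    def contains(self, lat: float, lon: float) -> bool:
        # Simple box test; does not handle areas crossing the antimeridian.
        return self.min_lat <= lat <= self.max_lat and self.min_lon <= lon <= self.max_lon


def congestion_count(reports: list[AisReport], port: BBox, max_sog: float = 0.5) -> int:
    """Distinct vessels reported inside `port` moving slower than `max_sog` knots."""
    return len({r.mmsi for r in reports
                if port.contains(r.lat, r.lon) and r.sog_knots < max_sog})
```

Computed once per day per port, the count becomes an ordinary time series.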

### 7. Job Posting Signals

Hiring velocity is a leading indicator, and Fed policy is increasingly informed by alternative jobs data.

| Provider          | Coverage                                |
|-------------------|-----------------------------------------|
| **Revelio Labs**  | 4.1B+ postings, 6.6M companies         |
| **LinkUp**        | Direct from 67K+ employer websites, 195 countries |
| **Indeed Hiring Lab** | Free aggregate reports (no ticker-level) |

Free: BLS JOLTS (monthly aggregate), WARN Act notices (state-by-state scraping), CommonCrawl job pages.

### 8. Government / Regulatory (SEC, BLS, EIA, FRED)

Free, authoritative, but often lagging. NLP on filings extracts sentiment shifts.

| Source              | Type                  | Access  |
|---------------------|-----------------------|---------|
| **SEC EDGAR**       | All US public filings | Free    |
| **edgartools**      | AI-native EDGAR Python library | Free |
| **FRED**            | 800K+ economic time series | Free API |
| **BLS**             | Employment, CPI, PPI, JOLTS | Free |
| **EIA**             | Energy production/pricing | Free  |
| **AlphaSense**      | AI search across filings | Enterprise |

### 9. Patent / IP Signals

Filing spikes signal emerging tech bets 12-18mo before product launches.

Free via PatentsView API (USPTO), Google Patents, Lens.org, WIPO. Commercial: Patsnap, Lighthouse IP.

### 10. Options / Dark Pool

Can reveal institutional positioning before it shows up in price.

| Provider            | Cost      | Notes                                     |
|---------------------|-----------|-------------------------------------------|
| **Unusual Whales**  | $30/mo    | Options flow + dark pool + Congress trades |
| **FlowAlgo**        | ~$100/mo  | Algorithmic dark pool filtering            |
| **FINRA ATS data**  | Free      | Weekly dark pool volumes (2-4 week delay)  |
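A commonly cited heuristic for "unusual" options activity is a contract whose day volume exceeds its open interest (suggesting new positions) and is a large multiple of its recent average volume. The sketch below is a hypothetical screen over a chain snapshot: the field names and thresholds are assumptions, and it is not how Unusual Whales or FlowAlgo compute their flags.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OptionQuote:
    symbol: str            # contract identifier (format depends on your data source)
    volume: int            # contracts traded today
    open_interest: int     # contracts open as of the prior close
    avg_volume_20d: float  # trailing average daily volume


def is_unusual(q: OptionQuote, vol_oi_ratio: float = 1.0,
               vol_multiple: float = 5.0, min_volume: int = 500) -> bool:
    """Flag contracts whose volume exceeds open interest and dwarfs recent activity."""
    if q.volume < min_volume:
        return False  # thin contracts produce noisy ratios
    exceeds_oi = q.volume > vol_oi_ratio * q.open_interest
    spike = q.volume > vol_multiple * max(q.avg_volume_20d, 1.0)
    return exceeds_oi and spike


def screen(chain: list[OptionQuote]) -> list[str]:
    """Symbols of all contracts in the chain that pass the screen."""
    return [q.symbol for q in chain if is_unusual(q)]
```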

### 11. Insider Trading

SEC Form 4 filings — cluster buying is a well-documented bullish signal.

All free: OpenInsider, SECForm4.com, edgartools for programmatic access. One of the most accessible alt-data categories.
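One way to operationalize "cluster buying" is to require several distinct insiders making open-market purchases (Form 4 transaction code `P`) within a short window. A minimal sketch, assuming Form 4 rows have already been parsed (e.g. from OpenInsider exports or edgartools) into the hypothetical `InsiderTrade` record below. The 3-insider and 30-day thresholds are illustrative, not a standard.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class InsiderTrade:
    ticker: str
    insider: str
    code: str        # Form 4 transaction code; "P" = open-market or private purchase
    trade_date: date


def cluster_buys(trades: list[InsiderTrade], min_insiders: int = 3,
                 window_days: int = 30) -> dict[str, date]:
    """Map ticker -> first date on which >= min_insiders distinct insiders
    had purchased within the trailing window_days (inclusive)."""
    buys = sorted((t for t in trades if t.code == "P"), key=lambda t: t.trade_date)
    by_ticker: dict[str, list[InsiderTrade]] = {}
    for t in buys:
        by_ticker.setdefault(t.ticker, []).append(t)

    hits: dict[str, date] = {}
    for ticker, rows in by_ticker.items():
        for i, row in enumerate(rows):
            start = row.trade_date - timedelta(days=window_days)
            insiders = {r.insider for r in rows[: i + 1] if r.trade_date >= start}
            if len(insiders) >= min_insiders:
                hits[ticker] = row.trade_date
                break
    return hits
```

Counting distinct insiders rather than transactions keeps one executive's repeated purchases from looking like a cluster.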

### 12. ESG Data

Low inter-provider agreement is the elephant in the room. RepRisk (NLP-based) is theoretically replicable with open-source tools.

### 13. Earnings Call NLP

**Highest-ROI DIY project.** Pipeline: scrape transcript → chunk → FinBERT → compute sentiment delta vs. previous quarter. 45-50% of companies that beat EPS still trade lower when tone is negative.
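The pipeline can be sketched with the scoring model injected as a plain function, so it runs without downloading a model. Assumptions (all illustrative): transcripts are already plain text, chunks are capped at 200 words to stay well under BERT's 512-token limit, and each chunk's tone is the classifier's confidence signed by its label (positive counts +, negative counts -, neutral counts 0).

```python
from typing import Callable

# A classifier takes a list of texts and returns one {"label", "score"} dict per text,
# the same shape a Hugging Face text-classification pipeline returns for a list input.
Classifier = Callable[[list[str]], list[dict]]


def chunk_words(text: str, max_words: int = 200) -> list[str]:
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def net_tone(text: str, classify: Classifier, max_words: int = 200) -> float:
    """Mean signed confidence across chunks: +score if positive, -score if negative."""
    chunks = chunk_words(text, max_words)
    if not chunks:
        return 0.0
    sign = {"positive": 1.0, "negative": -1.0}
    preds = classify(chunks)
    return sum(sign.get(p["label"].lower(), 0.0) * p["score"] for p in preds) / len(preds)


def tone_delta(current: str, previous: str, classify: Classifier) -> float:
    """Quarter-over-quarter change in net tone; negative means the call got gloomier."""
    return net_tone(current, classify) - net_tone(previous, classify)
```

With `transformers` installed, `classify` could be `pipeline("text-classification", model="ProsusAI/finbert")`, which returns the top label (`positive`, `negative`, or `neutral`) and its score for each input string.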

### 14. Foot Traffic / Mobility

Placer.ai, Unacast, Foursquare: all enterprise-priced ($50K+/yr). Free alternatives have mostly been discontinued (e.g. the Apple and Google mobility reports). Google Popular Times can be scraped but gives only relative busyness.

---

## Open Source Platforms

| Platform                    | Stars | Description                                    |
|-----------------------------|-------|------------------------------------------------|
| **OpenBB**                  | 35K+  | Free open-source financial data platform; modular, API-first |
| **awesome-quant**           | 18K+  | Curated list of quant finance libraries         |
| **Quiver Quantitative**     | —     | Commercial data API with a free tier: Reddit sentiment, Congress trades, insider activity |
| **edgartools**              | —     | AI-native EDGAR library; XBRL + 17 form types  |
| **FinBERT** (ProsusAI)     | —     | Financial sentiment BERT on HuggingFace         |

---

## Key Academic References

| Paper                                     | Signal         | Finding                                        |
|-------------------------------------------|----------------|------------------------------------------------|
| Preis et al. (2013), Scientific Reports   | Google Trends  | "debt"-based GT strategy beat buy-and-hold by ~310 percentage points over 7yr |
| Bijl et al. (2016), Int. Rev. Financial Analysis | Google search  | Search volume predicts individual stock returns  |
| PeerJ (2023)                              | StockTwits+FinBERT | FinBERT+SVM ensemble predicts stock movement |
| J.P. Morgan (2024)                        | All alt-data   | Hedge funds using alt-data see 3% higher annual returns |
| Financial Innovation (Springer 2024)      | All alt-data   | Comprehensive review of alt-data in finance     |

---

## Market Context

Global alternative data market: $11-14B (2025), projected >$19B by 2030. 86% of investment managers plan to increase alt-data usage. 98% agree traditional data is too slow. Convergence of cheaper compute + better NLP (FinBERT, LLMs) + data accessibility makes this increasingly viable for smaller teams.

---

## Relevance to gtracker

The current trend tracker (Google Trends → composite indices) is a good foundation. Natural expansions:

1. **Immediate** (free, same architecture): SEC insider buying, WARN Act notices, BLS JOLTS, FRED macro indicators
2. **High-value add** (needs FinBERT): Earnings call sentiment, SEC filing risk factor NLP, Reddit/StockTwits sentiment
3. **Medium-term** ($30-100/mo): Unusual Whales options flow, PatentsView filings
4. **Aspirational**: Satellite imagery, credit card data, foot traffic (require enterprise budgets or partnerships)

The config.yaml + index computation pattern in `projects/trends/` generalizes well to all of these — most are time-series data that can form category/composite indices with the same infrastructure.
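To illustrate how these sources could share one index pipeline, here is a minimal composite-index sketch. The config shape and series names are hypothetical (this is not the actual `config.yaml` schema in `projects/trends/`). Each input series is z-scored over its own history, and the composite is a weighted average of the z-scores. A negative weight flips a series whose rise means the opposite (here, more job openings meaning less labor stress).

```python
from statistics import mean, pstdev

# Hypothetical category config, mirroring what a config.yaml entry might declare.
CONFIG = {
    "category": "labor_stress",
    "series": {"search_unemployment": 1.0, "warn_notices": 1.0, "jolts_openings": -1.0},
}


def zscores(values: list[float]) -> list[float]:
    """Standardize a series over its own history."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0] * len(values)  # a flat series carries no signal
    return [(v - mu) / sigma for v in values]


def composite(data: dict[str, list[float]], weights: dict[str, float]) -> list[float]:
    """Weighted average of per-series z-scores; all series must share the same dates."""
    lengths = {len(data[name]) for name in weights}
    if len(lengths) != 1:
        raise ValueError("series must be aligned to the same dates")
    total_w = sum(abs(w) for w in weights.values())
    z = {name: zscores(data[name]) for name in weights}
    n = lengths.pop()
    return [sum(w * z[name][i] for name, w in weights.items()) / total_w for i in range(n)]


# Usage: composite(aligned_data, CONFIG["series"]) once each series is resampled to common dates.
```

Z-scoring makes series with very different units (search index points, notice counts, openings) comparable before they are combined.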
