Deep Research From Scratch Is Not Deep Enough

The case for pre-accumulated domain data as the foundation of useful AI.

1. The problem with "deep research"

Every frontier lab now offers a "deep research" mode: give the model a question and it searches the web, reads dozens of pages, and synthesizes an answer. It feels impressive. But for serious domain work, it falls short in predictable ways: every query starts from zero, the only sources are whatever the open web happens to surface, and the synthesis can be no better than that data.

Core thesis

The bottleneck isn't the model's reasoning — it's the data the model has access to. Give a frontier LLM deep, curated, domain-specific data and its output goes from "interesting summary" to "actionable intelligence."

2. The data advantage

We aim to have the best data in each vertical we enter. Not just web scraping — a layered data strategy that deepens over time:

Phase 1 — Now

  • Web at scale
  • Autonomous pipelines crawl, extract, structure
  • Self-healing ingestion (proxy escalation, error triage; sketched after this list)
  • Multi-source cross-referencing
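
To make "self-healing" concrete, here is a minimal sketch of the pattern: escalate through progressively more expensive proxy tiers and triage errors into retryable versus fatal. The tier URLs and the fetch helper are illustrative assumptions, not our production pipeline:

```python
import time
import requests

# Hypothetical proxy tiers, cheapest first; None means a direct connection.
PROXY_TIERS = [None, "http://datacenter-proxy:8080", "http://residential-proxy:8080"]

RETRYABLE = {429, 500, 502, 503, 504}  # transient statuses worth retrying

def fetch(url: str, attempts_per_tier: int = 2) -> str:
    """Fetch a page, escalating to a costlier proxy tier when a cheaper one fails."""
    last_error: object = None
    for proxy in PROXY_TIERS:
        proxies = {"http": proxy, "https": proxy} if proxy else None
        for attempt in range(attempts_per_tier):
            try:
                resp = requests.get(url, proxies=proxies, timeout=15)
            except requests.RequestException as err:
                last_error = err
                time.sleep(2 ** attempt)  # network error: back off, retry same tier
                continue
            if resp.status_code == 200:
                return resp.text
            last_error = resp.status_code
            if resp.status_code not in RETRYABLE:
                break  # likely blocked: escalate to the next proxy tier
            time.sleep(2 ** attempt)  # transient server error: back off, retry
    raise RuntimeError(f"all proxy tiers exhausted for {url}: {last_error}")
```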

Phase 2 — With funding

  • Licensed databases
  • Industry reports, proprietary datasets
  • Government and regulatory filings
  • Subscription data feeds

Phase 3 — With scale

  • Expert networks
  • Domain practitioners contributing knowledge
  • Verified facts, insider context
  • Human-in-the-loop curation

Each phase compounds on the previous. Licensed data fills gaps the web can't reach. Expert knowledge adds the context that neither web nor databases capture. The LLM reasons over all three layers simultaneously.
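
One way to picture the layering: tag every fact with its provenance, so the model's context distinguishes web-scraped claims from licensed records and expert-verified ones. A minimal sketch, with hypothetical field names:

```python
from dataclasses import dataclass

@dataclass
class Fact:
    entity: str       # e.g. an organization or a funder
    claim: str        # the assertion itself
    source: str       # "web" | "licensed" | "expert"
    confidence: float

def build_context(facts: list[Fact]) -> str:
    """Assemble a provenance-tagged context block for the LLM prompt,
    putting expert-verified facts first."""
    rank = {"expert": 0, "licensed": 1, "web": 2}
    ordered = sorted(facts, key=lambda f: (rank[f.source], -f.confidence))
    return "\n".join(f"[{f.source}] {f.entity}: {f.claim}" for f in ordered)
```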

3. Small vertical, outsized impact

The key insight: you don't need to boil the ocean. Pick a vertical small enough to achieve data completeness, but valuable enough that completeness matters.

In a well-defined vertical, the system can know every relevant organization, every funder, every open RFP, every peer, every competing initiative. That's not a research assistant — that's an unfair advantage.

Why verticals win

A horizontal AI tool helps a little with everything. A vertical AI tool — with complete domain data — is transformatively better at the things that matter most to its users. The value per user is 10x higher, and willingness to pay scales with that value.

4. Example: educational nonprofits

Live partnership — Technovation

We partnered with Technovation, a global nonprofit that teaches girls to build technology to solve community problems. (Disclosure: Tara Chklovski, the founder and CEO, is my wife.)

The vertical: educational nonprofits — organizations that deliver learning programs, seek grants, report to funders, and compete for limited philanthropic dollars.

What becomes possible with deep vertical data:

For the nonprofit

  • Every relevant funder — foundations, government programs, corporate giving — mapped, scored, matched
  • Every open RFP — monitored in real time, with fit scoring against the org's mission and capabilities (a toy scorer follows this list)
  • Peer landscape — who else does similar work, where are the gaps, what's the competitive positioning
  • Grant writing — drafts that know the funder's priorities, the org's track record, and the vertical's language
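
To show what fit scoring means mechanically, here is a deliberately tiny stand-in: a bag-of-words overlap score where a production system would use embedding similarity over the funder database. All texts below are invented:

```python
# Toy RFP fit scorer: Jaccard term overlap stands in for embedding similarity.
def fit_score(rfp_text: str, org_profile: str) -> float:
    """Score how well an RFP matches an org's profile, in [0, 1]."""
    rfp_terms = set(rfp_text.lower().split())
    org_terms = set(org_profile.lower().split())
    if not rfp_terms or not org_terms:
        return 0.0
    return len(rfp_terms & org_terms) / len(rfp_terms | org_terms)

rfps = [
    "Grant program supporting girls building technology skills",
    "Funding opportunity for rural broadband infrastructure",
]
profile = "teaches girls to build technology to solve community problems"
ranked = sorted(rfps, key=lambda r: fit_score(r, profile), reverse=True)
# ranked[0] is the girls-in-technology RFP: it shares more terms with the profile
```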

For funders

  • All educational nonprofits — comprehensive landscape of who's doing what, where, at what scale
  • Due diligence — financial health, leadership stability, outcome data, regulatory status — automated
  • Gap analysis — where funding is concentrated, which regions or populations are underserved (sketched after this list)
  • Portfolio monitoring — track grantee progress, surface risks, flag opportunities for follow-on
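
Gap analysis is, at bottom, aggregation over a grants table that already exists because the data layer was built first. A sketch over made-up records:

```python
from collections import defaultdict

# Illustrative records only; a real table would come from the data layer.
grants = [
    {"region": "North America", "amount": 1_200_000},
    {"region": "Sub-Saharan Africa", "amount": 150_000},
    {"region": "South Asia", "amount": 90_000},
    {"region": "North America", "amount": 800_000},
]

totals: dict[str, int] = defaultdict(int)
for g in grants:
    totals[g["region"]] += g["amount"]

# Regions sorted by total funding, least-funded first: candidate gaps.
for region, amount in sorted(totals.items(), key=lambda kv: kv[1]):
    print(f"{region:>20}: ${amount:,}")
```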

None of this is possible with generic deep research. It requires pre-accumulated data: every org cataloged, every funder mapped, every RFP tracked. The AI becomes useful precisely because the data is already there when the question is asked.

5. The pattern generalizes

Educational nonprofits are one vertical. The pattern works anywhere the data can be bounded and the users are underserved by generic tools.

The playbook is the same each time: accumulate the data, structure it, let the LLM reason over the complete picture. The vertical specificity is what makes the output worth paying for.

Summary

Deep research from scratch hits a ceiling because the data isn't there. We build the data layer first — web now, licensed data with funding, expert networks with scale — so the LLM has the complete picture before the user even asks. Small verticals, deep data, outsized value.