The case for pre-accumulated domain data as the foundation of useful AI.
Every frontier model now offers a "deep research" mode: give it a question and it searches the web, reads dozens of pages, and synthesizes an answer. It feels impressive. But for serious domain work it falls short in a predictable way: the data it needs isn't sitting on the open web waiting to be found.
Core thesis
The bottleneck isn't the model's reasoning — it's the data the model has access to. Give a frontier LLM deep, curated, domain-specific data and its output goes from "interesting summary" to "actionable intelligence."
We aim to have the best data in each vertical we enter. Not just web scraping — a layered data strategy that deepens over time:
Phase 1 — Now: curated web data. Scrape and structure everything public in the vertical.
Phase 2 — With funding: licensed data. Buy access to the databases the open web can't reach.
Phase 3 — With scale: expert knowledge. Capture the context that neither web nor databases hold.
Each phase compounds on the previous. Licensed data fills gaps the web can't reach. Expert knowledge adds the context that neither web nor databases capture. The LLM reasons over all three layers simultaneously.
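To make the layering concrete, here is a minimal Python sketch of how facts from all three layers might live in one store, labeled by provenance, and be assembled into context for the model. Every name here (Layer, DomainRecord, assemble_context, the sample facts) is a hypothetical illustration, not our actual schema:

```python
# Hypothetical sketch of the three-layer store; not a real implementation.
from dataclasses import dataclass
from enum import Enum


class Layer(Enum):
    WEB = "web"            # Phase 1: curated public web data
    LICENSED = "licensed"  # Phase 2: databases the open web can't reach
    EXPERT = "expert"      # Phase 3: practitioner context


@dataclass
class DomainRecord:
    entity: str   # an organization, funder, or RFP
    fact: str     # one structured claim about that entity
    layer: Layer  # provenance, so each source can be weighed


def assemble_context(store: list[DomainRecord], entity: str) -> str:
    """Collect every known fact about an entity across all three
    layers, labeled by provenance, ready to prepend to a prompt."""
    return "\n".join(
        f"[{r.layer.value}] {r.fact}" for r in store if r.entity == entity
    )


store = [
    DomainRecord("Example Fund", "open RFP closes 2026-01-15", Layer.WEB),
    DomainRecord("Example Fund", "median grant size $250k", Layer.LICENSED),
    DomainRecord("Example Fund", "prefers multi-year commitments", Layer.EXPERT),
]
print(assemble_context(store, "Example Fund"))
```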
The key insight: you don't need to boil the ocean. Pick a vertical small enough to achieve data completeness, but valuable enough that completeness matters.
In a well-defined vertical, the system can know every relevant organization, every funder, every open RFP, every peer, every competing initiative. That's not a research assistant — that's an unfair advantage.
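As an illustration of what completeness buys, the sketch below answers "every open RFP this organization qualifies for" as a plain filter over a complete catalog rather than a search. The types, funder names, and matching rule are all invented for illustration:

```python
# Hypothetical sketch: with a complete RFP catalog, "every open RFP we
# qualify for" is an exhaustive filter, not a best-effort web search.
from dataclasses import dataclass
from datetime import date


@dataclass
class RFP:
    funder: str
    focus_areas: set[str]
    deadline: date


@dataclass
class Org:
    name: str
    focus_areas: set[str]


def matching_rfps(org: Org, catalog: list[RFP], today: date) -> list[RFP]:
    """Exhaustive by construction: the catalog holds every open RFP in
    the vertical, so nothing relevant can be missed."""
    return [
        r for r in catalog
        if r.deadline >= today and org.focus_areas & r.focus_areas
    ]


# Invented example data, purely illustrative.
catalog = [
    RFP("STEM Futures Fund", {"STEM", "girls in tech"}, date(2026, 1, 15)),
    RFP("Literacy Trust", {"early literacy"}, date(2026, 3, 1)),
]
org = Org("Example Nonprofit", {"STEM", "coding"})
print([r.funder for r in matching_rfps(org, catalog, date(2025, 12, 1))])
```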
Why verticals win
A horizontal AI tool helps a little with everything. A vertical AI tool — with complete domain data — is transformatively better at the things that matter most to its users. The value per user is 10x higher, which means willingness to pay is 10x higher.
Live partnership — Technovation
We partnered with Technovation, a global nonprofit that teaches girls to build technology to solve community problems. (Disclosure: Tara Chklovski, the founder and CEO, is my wife.)
The vertical: educational nonprofits — organizations that deliver learning programs, seek grants, report to funders, and compete for limited philanthropic dollars.
What becomes possible with deep vertical data:
For the nonprofit: every open RFP it qualifies for, surfaced as soon as it opens; every peer and competing initiative mapped for positioning and benchmarking.
For funders: a complete picture of the organizations in the space, so portfolio gaps and duplicated initiatives are visible at a glance.
None of this is possible with generic deep research. It requires pre-accumulated data: every org cataloged, every funder mapped, every RFP tracked. The AI becomes useful precisely because the data is already there when the question is asked.
Educational nonprofits are one vertical. The pattern works anywhere the data can be bounded and the users are underserved by generic tools.
The playbook is the same each time: accumulate the data, structure it, let the LLM reason over the complete picture. The vertical specificity is what makes the output worth paying for.
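Here is a minimal sketch of that playbook as a pipeline; fetch, extract_records, and call_llm are stubs standing in for whatever scraper, parser, and model API a real build would use, and only the shape of the flow (accumulate, structure, reason) is the point:

```python
# Hypothetical pipeline sketch; the three stubs are placeholders.

def fetch(source: str) -> str:
    """Stub: pull raw text from one source (web page, database export)."""
    return f"raw contents of {source}"


def extract_records(document: str) -> list[str]:
    """Stub: turn raw text into structured facts."""
    return [f"fact extracted from: {document}"]


def call_llm(context: str, question: str) -> str:
    """Stub: a frontier model reasoning over pre-assembled context."""
    return f"answer to {question!r}, grounded in {len(context.splitlines())} facts"


def run_vertical_pipeline(sources: list[str], question: str) -> str:
    raw = [fetch(s) for s in sources]                         # 1. accumulate
    facts = [f for doc in raw for f in extract_records(doc)]  # 2. structure
    return call_llm("\n".join(facts), question)               # 3. reason


print(run_vertical_pipeline(["orgs.csv", "rfps.json"], "Which open RFPs fit us?"))
```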
Summary
Deep research from scratch hits a ceiling because the data isn't there. We build the data layer first: web now, licensed data with funding, expert knowledge at scale. That way the LLM has the complete picture before the user even asks. Small verticals, deep data, outsized value.