Vertical Knowledge Portals

Turn any content library into a searchable, structured knowledge base. Every video, transcript, and concept — indexed, linked, and instantly findable.

1. The problem with video content

YouTube channels, podcast archives, lecture series, conference talks — they contain thousands of hours of expert knowledge. But that knowledge is trapped:

Unsearchable

You can't search inside videos. YouTube search finds titles, not the moment where the expert explains the concept you need.

Unstructured

A 3-hour podcast covers 40 topics. There's no table of contents, no concept index, no way to jump to the relevant 2 minutes.

Unconnected

The same concept appears across 15 episodes. No one has mapped those connections. Each video is an island.

2. What a knowledge portal does

We process every video in a channel — transcribe, chunk by topic, extract concepts, and generate a searchable static site. The result: an entire channel's knowledge, structured and searchable.

  • DISCOVER: channel videos (YouTube API)
  • TRANSCRIBE: speaker + timing (VTT files)
  • CHUNK: 60s windows (SQLite DB)
  • ENRICH: concepts + chapters (LLM analysis)
  • GENERATE: static HTML portal (Searchable site)
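The CHUNK stage can be illustrated with a minimal sketch. It assumes transcript cues have already been read from a VTT file into (start time, text) pairs and groups them into fixed 60-second windows. The names `Cue`, `parse_timestamp`, and `chunk_cues` are hypothetical and are not taken from the framework.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    start: float  # seconds from the start of the video
    text: str


def parse_timestamp(stamp: str) -> float:
    """Convert a WebVTT timestamp ('HH:MM:SS.mmm' or 'MM:SS.mmm') to seconds."""
    parts = stamp.split(":")
    seconds = float(parts[-1])
    minutes = int(parts[-2])
    hours = int(parts[-3]) if len(parts) == 3 else 0
    return hours * 3600 + minutes * 60 + seconds


def chunk_cues(cues: list[Cue], window: float = 60.0) -> list[dict]:
    """Group cues into fixed windows, keyed by the window each cue starts in."""
    buckets: dict[int, list[str]] = {}
    for cue in cues:
        index = int(cue.start // window)
        buckets.setdefault(index, []).append(cue.text)
    # Emit chunks in time order; each chunk records where its window begins.
    return [
        {"start": index * window, "text": " ".join(texts)}
        for index, texts in sorted(buckets.items())
    ]


# Illustrative cues only; real input would come from a transcript file.
cues = [
    Cue(parse_timestamp("00:00:05.000"), "Welcome back."),
    Cue(parse_timestamp("00:00:42.500"), "Today we talk about loneliness."),
    Cue(parse_timestamp("00:01:10.000"), "Let's start with a definition."),
]
chunks = chunk_cues(cues)
```

Fixed windows are the simplest option and keep every chunk addressable by a predictable start time. A topic-aware splitter could replace `chunk_cues` without changing the shape of the output.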

3. Working demo

Two portals are live today, generated from the same generic framework.

Healthy Gamer Portal (healthygamer.localhost/portal/)

50 videos · 2,433 chunks · 40h 16m duration

  • Dr. K Chats with AsmonTV about His Fear of Death: 2h 13m · 134 chunks
  • Why You Feel Behind in Life: 48m · 52 chunks
  • The Epidemic of Loneliness: 1h 02m · 61 chunks
  • How Social Media Rewires Your Brain: 55m · 47 chunks

Healthy Gamer (HealthyGamerGG)

  • 50 videos processed
  • 2,433 chunks indexed
  • 40+ hours of content searchable
  • Full-text search via lunr.js
  • Timestamp links to exact YouTube moments

a16z (Andreessen Horowitz)

  • 5 videos processed
  • 127 chunks indexed
  • 2+ hours of content searchable
  • Same framework, different channel
  • Generated with one command

Each portal is a static site with no application server to run, so it can be hosted anywhere that serves files. The search index loads client-side. A new channel goes from zero to portal in two commands: one to process transcripts, one to generate the site.
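As a rough illustration of how a static site can carry both search and timestamp links, the sketch below serializes chunks into a JSON document list that a client-side library such as lunr.js could index in the browser. The record fields (`id`, `title`, `text`, `url`), the function names, and the video ID are assumptions made for the example, not the portal's actual schema.

```python
import json


def youtube_link(video_id: str, start_seconds: float) -> str:
    """Build a watch URL that opens the video at the chunk's start time."""
    return f"https://www.youtube.com/watch?v={video_id}&t={int(start_seconds)}s"


def build_search_docs(chunks: list[dict]) -> str:
    """Serialize chunks into a JSON array of searchable documents."""
    docs = [
        {
            # Video ID plus start second gives each chunk a stable reference.
            "id": f"{c['video_id']}:{int(c['start'])}",
            "title": c["title"],
            "text": c["text"],
            "url": youtube_link(c["video_id"], c["start"]),
        }
        for c in chunks
    ]
    return json.dumps(docs, ensure_ascii=False)


# Hypothetical chunk; "VIDEO_ID" stands in for a real YouTube video ID.
payload = build_search_docs(
    [
        {
            "video_id": "VIDEO_ID",
            "title": "Why You Feel Behind in Life",
            "start": 754.9,
            "text": "Comparison is where the feeling of being behind starts.",
        }
    ]
)
```

Writing the documents as a plain JSON file keeps the build step independent of the search library: the browser fetches the file and builds or loads its index from it.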

4. What makes this a product

The portal is generic infrastructure. The value is what you build on top of it for a specific vertical.

For content creators

  • Unlock the back catalog — viewers find relevant moments across hundreds of videos, not just the latest upload
  • SEO surface area — every chunk becomes a searchable, linkable page
  • Concept navigation — "show me every time this channel discussed X"
  • Membership perk — premium searchable knowledge base for subscribers

For organizations

  • Training libraries — turn internal video into searchable knowledge
  • Conference archives — every talk, indexed and cross-referenced
  • Research corpus — academic lectures, seminars, lab meetings
  • Compliance — full-text search across recorded meetings

5. Architecture: generic base, thin wrappers

Adding a new channel takes minutes, not days. The framework is parameterized — you supply the channel name, it does the rest.

Add a new channel

# Process transcripts into chunks
python -m lib.semnet.pipeline process dwarkesh

# Generate the portal
python -m lib.semnet.portal generate dwarkesh --title "Dwarkesh Podcast"

# Done. Static site at projects/dwarkesh/portal/
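One way to picture a "thin wrapper" is a small per-channel config object from which everything else is derived. The sketch below is hypothetical and is not the actual `lib.semnet` API: `ChannelConfig`, `slug`, and `portal_dir` are illustrative names, and the only detail taken from the commands above is the `projects/<channel>/portal/` output layout.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class ChannelConfig:
    """Everything a new vertical supplies; the rest is derived."""

    slug: str   # short channel name used on the command line
    title: str  # human-readable portal title

    @property
    def portal_dir(self) -> PurePosixPath:
        # Assumed layout, mirroring the output path shown above.
        return PurePosixPath("projects") / self.slug / "portal"


dwarkesh = ChannelConfig(slug="dwarkesh", title="Dwarkesh Podcast")
```

Keeping the per-channel surface this small is what makes adding a vertical cheap: no channel-specific code paths, only data.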

Key insight

The framework handles the plumbing — chunking, indexing, search, HTML generation, timestamp linking. Each new vertical just supplies the content source. This is the same "small vertical, deep data" pattern from the data depth thesis, applied to video content.

6. Roadmap

Now

  • Static portals with search
  • Timestamp-linked chunks
  • Chapter grouping
  • 2 live demos

Next

  • Concept extraction — auto-tag topics, build concept index
  • Cross-video linking — "same topic, different episode"
  • Semantic search — vector embeddings, not just keyword
  • More channels — Dwarkesh, Lex Fridman
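The concept index and cross-video linking items could be built on a simple inverted index, sketched below. It assumes each chunk already carries a list of concept tags (for example from the LLM enrichment step). The field names, episode IDs, and tags are made up for the illustration.

```python
from collections import defaultdict


def build_concept_index(chunks: list[dict]) -> dict[str, list[tuple[str, int]]]:
    """Map each concept to the (video, start second) pairs where it appears."""
    index: dict[str, list[tuple[str, int]]] = defaultdict(list)
    for chunk in chunks:
        for concept in chunk["concepts"]:
            # Lowercase so "Loneliness" and "loneliness" share one entry.
            index[concept.lower()].append((chunk["video"], chunk["start"]))
    for hits in index.values():
        hits.sort()
    return dict(index)


def cross_video_concepts(index: dict[str, list[tuple[str, int]]]) -> dict:
    """Keep only concepts that appear in more than one video."""
    return {
        concept: hits
        for concept, hits in index.items()
        if len({video for video, _ in hits}) > 1
    }


# Hypothetical tagged chunks.
tagged = [
    {"video": "ep-01", "start": 120, "concepts": ["Loneliness", "Sleep"]},
    {"video": "ep-02", "start": 60, "concepts": ["loneliness"]},
    {"video": "ep-02", "start": 600, "concepts": ["Motivation"]},
]
index = build_concept_index(tagged)
shared = cross_video_concepts(index)
```

Here `shared` contains only "loneliness", the one tag that spans two episodes. That is the "same topic, different episode" view.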

Vision

  • AI Q&A layer — "ask the channel" with cited timestamps
  • Multi-source portals — video + blog + podcast unified
  • White-label — creators embed in their own site
  • API access — programmatic search across portals

The compound effect

Every portal makes the framework better: chunking heuristics improve, concept extraction gets smarter, and the learning system captures what works and feeds it back. Portal #50 should be significantly better than portal #1, with no additional per-portal engineering.