Founder & Network Research Plan

People Intelligence Project — Mapping the LA AI ecosystem and personal networks

Generated: 2026-02-06  |  Project: projects/people/

1. Goal & Strategy

Objective: Map the network of people around Tim & Tara Chklovski and TenOneTen Ventures (Gil Elbaz, David Waxman, Eric, Minnie), cross-referenced with the LA AI startup ecosystem. Surface who knows whom, how they're connected, and where the warm intros live.

The approach is data-first: get real LinkedIn profile data, analyze it, and build the network from what we actually find — rather than over-engineering a product before seeing what the data supports.

LA AI companies identified: 1,343
Profiles to scrape (Phase 1): ~200-500
Estimated total cost: $1-5
Time to working network graph: 3-5 days

2. Hard Constraint: LinkedIn Connections

LinkedIn connections are unavailable from ANY scraper

LinkedIn's Connections API requires OAuth consent from each individual user. No third party (not Bright Data, not Apify, not anyone) can extract who someone is connected to on LinkedIn. Proxycurl, formerly the main alternative, shut down in July 2025 following a LinkedIn lawsuit.

What IS available vs. what ISN'T

Data | Available? | Source | Notes
Full employment history with dates | Yes | Profile scrape | Company, title, start/end dates for every job
Education with dates | Yes | Profile scrape | School, degree, field, year ranges
Current company & title | Yes | Profile scrape | "Present" end date = currently there
Skills, about section, posts | Yes | Profile scrape | Full text content
Connection list (who they know) | No | - | Requires individual OAuth. No scraper has this.
Connection date (when connected) | Self-export only | User's own CSV export | Settings → "Download your data" → Connections.csv
Profile changes over time | Re-crawl only | Periodic re-scrape | Diff between scrapes shows job changes, new roles
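
The "re-crawl only" row can be illustrated with a small diff between two scrapes of the same profile. This is a minimal sketch, assuming each job is a dict with company, title, start_date, and end_date keys (end_date of None meaning "Present"). The function names are hypothetical and not part of the project code.

```python
def job_key(job):
    """Identify a role by company, title, and start date."""
    return (job["company"], job["title"], job["start_date"])


def diff_employment(old_jobs, new_jobs):
    """Compare two scrapes of one profile's employment history.

    Returns roles that are new in the later scrape, and roles that were
    open ("Present") in the earlier scrape but now carry an end date.
    """
    old_by_key = {job_key(job): job for job in old_jobs}
    started = [job for job in new_jobs if job_key(job) not in old_by_key]
    ended = [
        job for job in new_jobs
        if job_key(job) in old_by_key
        and old_by_key[job_key(job)]["end_date"] is None
        and job["end_date"] is not None
    ]
    return {"started": started, "ended": ended}
```

Keying on the start date keeps a second stint at the same company with the same title from being mistaken for the first one.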

The Workaround: Two Complementary Methods

Method A: Self-Export (Gold Path)

Each person exports their own LinkedIn connections:

LinkedIn → Settings → Get a copy of your data → Connections

Gives: name, email, company, position, connected date (the recency signal).

Takes 30 seconds per person. This is the only way to get actual connection lists with dates.
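
A minimal sketch of turning one exported file into contact records follows. The column headers ("First Name", "Last Name", "URL", "Email Address", "Company", "Position", "Connected On") and the day-month-year date format are assumptions about the export layout and should be checked against a real file. The function names are hypothetical.

```python
import csv
import io
from datetime import datetime

# Assumed marker for the header row; any note lines before it are skipped.
HEADER_MARKER = "First Name"


def to_iso_date(raw):
    """Normalize a connected-on value to YYYY-MM-DD, or None if unparseable."""
    for fmt in ("%d %b %Y", "%Y-%m-%d"):
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def parse_connections_csv(text, whose_contact, extraction_date):
    """Parse the text of one Connections.csv into a list of contact dicts."""
    rows = list(csv.reader(io.StringIO(text)))
    # Raises StopIteration if no header row is found.
    header_idx = next(i for i, row in enumerate(rows) if HEADER_MARKER in row)
    header = rows[header_idx]
    contacts = []
    for values in rows[header_idx + 1:]:
        if len(values) != len(header):
            continue  # skip blank or malformed lines
        row = dict(zip(header, values))
        name = f"{row.get('First Name', '')} {row.get('Last Name', '')}".strip()
        if not name:
            continue
        contacts.append({
            "name": name,
            "email": row.get("Email Address") or None,
            "company": row.get("Company") or None,
            "title": row.get("Position") or None,
            "linkedin_url": row.get("URL") or None,
            "connected_date": to_iso_date(row.get("Connected On", "")),
            "whose_contact": whose_contact,
            "extraction_method": "linkedin_csv_export",
            "extraction_date": extraction_date,
        })
    return contacts
```

Calling parse_connections_csv(text, "gil_elbaz", "2026-02-06") would yield records carrying the import metadata alongside the exported fields.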

Method B: Inferred Network (Automated)

Scrape profiles of known contacts, then infer connections from shared employment and education history with overlapping dates.

Edge weighting: 50-person startup overlap = strong. Google overlap = weak (unless same team/location).

3. Phase 1: Personal Network Extraction

Starting with the people closest to us. Priority order for TenOneTen:

Priority | Person | Affiliation | Method | Expected Contacts
1 | Tim Chklovski | Personal | Self-export Connections.csv | 500-2,000+
1 | Tara Chklovski | Personal | Self-export Connections.csv | 500-2,000+
2 | Gil Elbaz | TenOneTen Ventures | Ask for export (or infer from profile scrape) | 1,000-5,000+
3 | David Waxman | TenOneTen Ventures | Ask for export (or infer) | 1,000-3,000+
4 | Eric | TenOneTen Ventures | Ask for export (or infer) | 500-2,000
5 | Minnie | TenOneTen Ventures | Ask for export (or infer) | 500-2,000

Data Captured Per Contact

Every contact record in SQLite includes:

Field | Source | Example
name | CSV / scrape | Jane Smith
company | CSV / scrape | OpenAI
title | CSV / scrape | VP Engineering
email | CSV export only | jane@openai.com
connected_date | CSV export only | 2024-03-15
whose_contact | Import metadata | gil_elbaz
extraction_method | Import metadata | linkedin_csv_export
extraction_date | Import metadata | 2026-02-06
linkedin_url | CSV / scrape | linkedin.com/in/janesmith

Processing Pipeline

Connections CSVs                    Profile Scrapes
      |                                   |
      v                                   v
Import to SQLite                    Import to SQLite
(name, company, title,              (full employment history,
 email, connected_date,              education, skills, posts)
 whose_contact, method)                   |
      |                                   |
      +----------------+------------------+
                       |
                       v
        Deduplicate across contact lists
        (who do Tim AND Gil both know?)
                       |
                       v
        Scrape unique profiles
        (~200-500 profiles via BD/Apify)
                       |
                       v
        Build employment/education graph
        (shared history = inferred connections)
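
The deduplication step ("who do Tim AND Gil both know?") can be sketched as a self-join on the contacts table. This is an illustration under assumptions: contacts are matched on a normalized linkedin_url, the normalization rule is hypothetical, the owner id "tim_chklovski" is invented for the example, and the demo table holds only the three columns the query needs.

```python
import sqlite3


def normalize_url(url):
    """Reduce a LinkedIn URL to a comparable key (hypothetical rule)."""
    key = url.strip().lower()
    for prefix in ("https://", "http://", "www."):
        if key.startswith(prefix):
            key = key[len(prefix):]
    return key.rstrip("/")


def build_demo_db(rows):
    """Create an in-memory contacts table from (name, url, owner) tuples."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE contacts ("
        "name TEXT NOT NULL, linkedin_url TEXT, whose_contact TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO contacts VALUES (?, ?, ?)",
        [(name, normalize_url(url), owner) for name, url, owner in rows],
    )
    return conn


def shared_contacts(conn, owner_a, owner_b):
    """Return (name, linkedin_url) pairs present in both owners' lists."""
    sql = """
        SELECT DISTINCT a.name, a.linkedin_url
        FROM contacts AS a
        JOIN contacts AS b ON a.linkedin_url = b.linkedin_url
        WHERE a.whose_contact = ? AND b.whose_contact = ?
        ORDER BY a.name
    """
    return conn.execute(sql, (owner_a, owner_b)).fetchall()
```

Rows without a URL never match in the join, so they stay as separate records rather than being merged on name alone.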

4. Phase 2: LA AI Ecosystem

Cross-reference personal contacts with the 1,343 AI companies identified in Greater Los Angeles.

Dataset Summary

AI/ML companies in LA metro: 1,343
With funding data: 487
Mean funding (funded cos): $33.5M
Active (vs. 80 closed): 1,263

Top LA AI Companies by Funding

# | Company | Focus | Funding | Employees | Founded
1 | Faraday Future | AI electric vehicles | $4.08B | 1,001-5,000 | 2014
2 | Alteryx | Data science & analytics | $1.41B | 1,001-5,000 | 2011
3 | Relativity Space | 3D-printed rockets + AI | $1.36B | 501-1,000 | 2015
4 | System1 | Predictive analytics | $920M | 101-250 | 2013
5 | Vercel | Frontend cloud + AI | $563M | 251-500 | 2015
6 | FLYR | AI for airlines | $482M | 501-1,000 | 2013
7 | Veritone | Enterprise AI platform | $393M | 501-1,000 | 2014
8 | Cylance | AI cybersecurity | $297M | 501-1,000 | 2012
9 | Genies | AI avatar tech | $267M | 101-250 | 2011
10 | Reveleer | Healthcare NLP | $209M | 101-250 | 2009

Notable LA AI Companies (Web Research)

Company | Location | Focus | Notes
Anduril Industries | Costa Mesa | Defense AI, autonomous systems | $6.84B raised, $78B valuation. Palmer Luckey. ~6K employees.
HeyGen | Los Angeles | AI video generation | $60M Series A, ~$95M ARR. Hot growth.
GrayMatter Robotics | Carson | AI manufacturing robots | $45M Series B. BuiltInLA "Best Startup" 2026.
Avenda Health | Culver City | Cancer detection AI (FDA-cleared) | Multi-modal AI for 3D cancer maps.
Beyond Limits | Glendale | Industrial AI | NASA/JPL heritage. 45 space technologies adapted.
Virtualitics | Pasadena | AI data analytics | Caltech spinout. DOD + Fortune 500.
Snap Inc. | Santa Monica | AR/AI, computer vision | Public. 5K employees. Heavy AI investment.
Metropolis | Los Angeles | Computer vision payments | Largest parking network in N. America. 50M+ customers.

By Sub-Region

Area | Notable Companies
Santa Monica | Snap, Axle Health, GumGum, Zeitview, Emerge Tools
Venice / Mar Vista | AE Studio
Culver City | Avenda Health
Playa Vista | Google campus, various startups
El Segundo | HABIT (YC), Dogtown Media, Plug&Play AI
Pasadena | Virtualitics, Deep 6 AI, Supplyframe, Caltech/JPL ecosystem
Glendale | Beyond Limits (NASA/JPL heritage)
Costa Mesa | Anduril Industries

Size Distribution

Employee Range | Count | % of Total
1-10 | 596 | 44%
11-50 | 477 | 36%
51-100 | 83 | 6%
101-250 | 44 | 3%
251-500 | 18 | 1%
501+ | 18 | 1%

5. Phase 3: Network Graph & Analysis

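Once we have contact lists + scraped profiles, build the network and answer the questions from the objective: who knows whom, how they are connected, and where the warm intros live.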

Edge Types & Strength

Connection Type | Strength | Source | Recency Signal?
Direct contact (from CSV export) | Known | LinkedIn export | Yes (connected_date in CSV)
Currently at same company | Strong | Profile scrape ("Present") | Yes (ongoing)
Shared employer, recent (<2yr ago) | Strong | Profile scrape (dates) | Yes (date ranges)
Shared employer, older (2-5yr) | Medium | Profile scrape (dates) | Yes (computable)
Shared employer, historical (>5yr) | Weak | Profile scrape (dates) | Yes (computable)
Shared education (overlapping years) | Medium | Profile scrape (dates) | Historical only
Co-investment in same company | Strong | Crunchbase / SEC | Yes (funding dates)
Board co-membership | Strong | SEC DEF 14A | Yes (filing dates)
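
The shared-employer rows can be sketched as two small functions: one finds the overlap between two stints at the same company, the other buckets that overlap by how long ago it ended. This is a sketch under assumptions: dates are already parsed into date objects, an end date of None means "Present", and the function names are hypothetical.

```python
from datetime import date


def stint_overlap(a_start, a_end, b_start, b_end, today):
    """Return (start, end) of the time two stints overlapped, or None."""
    start = max(a_start, b_start)
    end = min(a_end or today, b_end or today)
    return (start, end) if start <= end else None


def recency_strength(overlap_end, today):
    """Bucket a shared-employer overlap by years since it ended.

    Under 2 years is strong, 2 to 5 years is medium, over 5 is weak.
    An overlap that is still ongoing ends today and is therefore strong.
    """
    years_ago = (today - overlap_end).days / 365.25
    if years_ago < 2:
        return "strong"
    if years_ago <= 5:
        return "medium"
    return "weak"
```

Passing today in explicitly keeps results reproducible when the same scrape is re-analyzed later.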

Critical: Edge Weighting by Org Size

Two people who both worked at a 50-person startup = very likely know each other.

Two people who both worked at Google (200K employees) = probably never met unless we can confirm same team/office/dates.

The graph must weight edges by organization size at the time of overlap to avoid false positives.
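
One way to express this weighting is a discount factor that stays at 1.0 for small organizations and shrinks as headcount grows. The linear decay and the 50-person reference size below are illustrative assumptions, not a calibrated model, and the function names are hypothetical.

```python
def org_size_factor(org_size, reference_size=50):
    """Discount for a shared-employer edge based on headcount at overlap."""
    if org_size <= 0:
        raise ValueError("org_size must be positive")
    return min(1.0, reference_size / org_size)


def weighted_strength(base_strength, org_size, confirmed_same_team=False):
    """Scale an edge's base strength (0-1) by organization size.

    A confirmed same team/office overlap skips the discount, since the
    two people very likely met regardless of company size.
    """
    if confirmed_same_team:
        return base_strength
    return base_strength * org_size_factor(org_size)
```

With these assumptions, an overlap at a 50-person startup keeps its full base strength, while an unconfirmed overlap at a 200,000-person company is reduced to near zero.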

6. LinkedIn Scraping: Options & Alternatives

Provider Comparison

Provider | Cost/Profile | Reliability | Setup | Best For | Notes
Bright Data | | Best | API key + MCP | 1,000+ profiles at scale | Largest proxy network. 544M+ profiles. MCP already installed in rivus. Promo: APIS25 (25% off 6mo).
Apify | | Good | Apify account + actor | Quick batches of 100-1,000 | Easiest to start. "LinkedIn Profile Scraper" actor. Pay per compute unit. No proxy management needed.
ScrapingBee | | Decent | API key | Medium batches | JS rendering support. LinkedIn-specific endpoint.
People Data Labs | | Aggregated | API key | Enrichment, not real-time | Pre-compiled data. Free tier: 100 records/mo. May be months stale.
PhantomBuster | | Risky | Your LinkedIn login | Avoid | Uses YOUR LinkedIn credentials. Account ban risk. Not recommended.
DIY (Playwright + proxies) | | Fragile | Code + proxy subscription | Cheapest but most work | Residential proxies ~$0.01/hr. LinkedIn actively blocks. High maintenance.

Recommendation

For our initial batch (~200-500 profiles), Bright Data is the primary path, with Apify as the fallback.

Cost at 500 profiles: Bright Data = $0.75-1.25  |  Apify = $5-15

What None of Them Can Do

No LinkedIn scraper, paid or free, can extract a person's connection list or the dates those connections were made.

Both require the person's own OAuth token. The only way to get connection data is the self-export CSV method described above.

7. Data Schema

SQLite Tables (people.db)

contacts
  id                 INTEGER PRIMARY KEY
  name               TEXT NOT NULL
  email              TEXT
  company            TEXT
  title              TEXT
  linkedin_url       TEXT
  connected_date     TEXT
  whose_contact      TEXT NOT NULL
  extraction_method  TEXT
  extraction_date    TEXT
  created_at         TIMESTAMP

profiles
  id                 INTEGER PRIMARY KEY
  linkedin_url       TEXT UNIQUE
  name               TEXT
  headline           TEXT
  location           TEXT
  about              TEXT
  current_company    TEXT
  current_title      TEXT
  skills             TEXT (JSON array)
  posts              TEXT (JSON array)
  scraped_at         TIMESTAMP
  scrape_source      TEXT

employment_history
  id                 INTEGER PRIMARY KEY
  profile_id         INTEGER FK
  company            TEXT
  title              TEXT
  start_date         TEXT
  end_date           TEXT
  location           TEXT
  is_current         BOOLEAN

education_history
  id                 INTEGER PRIMARY KEY
  profile_id         INTEGER FK
  school             TEXT
  degree             TEXT
  field              TEXT
  start_year         INTEGER
  end_year           INTEGER

edges (inferred connections)
  id                 INTEGER PRIMARY KEY
  person_a_id        INTEGER FK
  person_b_id        INTEGER FK
  edge_type          TEXT
  strength           REAL (0-1)
  shared_entity      TEXT
  overlap_start      TEXT
  overlap_end        TEXT
  org_size_at_time   INTEGER
  notes              TEXT

la_ai_companies
  id                 INTEGER PRIMARY KEY
  name               TEXT
  description        TEXT
  categories         TEXT
  location           TEXT
  funding_total      REAL
  employees          TEXT
  website            TEXT
  linkedin           TEXT
  founded            TEXT
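
The schema can be written as SQLite DDL roughly as follows. This is a sketch: the foreign-key targets (profile_id and the two person ids pointing at profiles.id) and the CHECK constraint on strength are assumptions read into the "FK" and "(0-1)" annotations, and create_people_db is a hypothetical helper name.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS contacts (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    email TEXT,
    company TEXT,
    title TEXT,
    linkedin_url TEXT,
    connected_date TEXT,
    whose_contact TEXT NOT NULL,
    extraction_method TEXT,
    extraction_date TEXT,
    created_at TIMESTAMP
);
CREATE TABLE IF NOT EXISTS profiles (
    id INTEGER PRIMARY KEY,
    linkedin_url TEXT UNIQUE,
    name TEXT,
    headline TEXT,
    location TEXT,
    about TEXT,
    current_company TEXT,
    current_title TEXT,
    skills TEXT,  -- JSON array
    posts TEXT,   -- JSON array
    scraped_at TIMESTAMP,
    scrape_source TEXT
);
CREATE TABLE IF NOT EXISTS employment_history (
    id INTEGER PRIMARY KEY,
    profile_id INTEGER REFERENCES profiles(id),  -- assumed FK target
    company TEXT,
    title TEXT,
    start_date TEXT,
    end_date TEXT,
    location TEXT,
    is_current BOOLEAN
);
CREATE TABLE IF NOT EXISTS education_history (
    id INTEGER PRIMARY KEY,
    profile_id INTEGER REFERENCES profiles(id),  -- assumed FK target
    school TEXT,
    degree TEXT,
    field TEXT,
    start_year INTEGER,
    end_year INTEGER
);
CREATE TABLE IF NOT EXISTS edges (
    id INTEGER PRIMARY KEY,
    person_a_id INTEGER REFERENCES profiles(id),  -- assumed FK target
    person_b_id INTEGER REFERENCES profiles(id),  -- assumed FK target
    edge_type TEXT,
    strength REAL CHECK (strength BETWEEN 0 AND 1),
    shared_entity TEXT,
    overlap_start TEXT,
    overlap_end TEXT,
    org_size_at_time INTEGER,
    notes TEXT
);
CREATE TABLE IF NOT EXISTS la_ai_companies (
    id INTEGER PRIMARY KEY,
    name TEXT,
    description TEXT,
    categories TEXT,
    location TEXT,
    funding_total REAL,
    employees TEXT,
    website TEXT,
    linkedin TEXT,
    founded TEXT
);
"""


def create_people_db(path):
    """Open (or create) the database at path and apply the schema."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FKs off by default
    conn.executescript(SCHEMA)
    return conn
```

Calling create_people_db("people.db") is safe to repeat, since every statement uses IF NOT EXISTS.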

8. Parallel Execution Timeline

Day 0 — All parallel
Day 1 — CSVs arrive + scrape prep
Day 2 — Scrape + first results
Day 3-4 — Graph & analysis
Day 5 — UI & demo

9. Budget

Item | Cost | Details
LinkedIn Connections CSV exports | $0 | Self-export by each person
LA AI companies data (Crunchbase) | $0 | Already downloaded (1,343 companies)
Free data sources (Forbes, SEC, Signal, OpenVC) | $0 | Already collected (60K+ records)
Profile scrape: 200-500 contacts (Bright Data) | $0.50-1.25 | At $0.0025/profile
Profile scrape: 200-500 contacts (Apify alt) | $2-15 | At $0.01-0.03/profile
Crunchbase/SEC cross-reference | $0 | Existing data
Total (Bright Data path) | $0.50-1.25 |
Total (Apify path) | $2-15 |
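
The two totals follow directly from the per-profile rates and the 200-500 profile range. A short check of that arithmetic, with scrape_cost_range as a hypothetical helper name:

```python
def scrape_cost_range(profiles_low, profiles_high, rate_low, rate_high):
    """Return (min, max) scrape cost in dollars, rounded to cents."""
    return (
        round(profiles_low * rate_low, 2),
        round(profiles_high * rate_high, 2),
    )


# Rates and profile counts taken from the budget table.
bright_data = scrape_cost_range(200, 500, 0.0025, 0.0025)  # (0.5, 1.25)
apify = scrape_cost_range(200, 500, 0.01, 0.03)            # (2.0, 15.0)
```

Since every other line item is $0, these ranges are also the path totals.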

10. Risks & Mitigations

Risk | Impact | Mitigation
People don't export their CSVs promptly | Medium | Start with whoever exports first. Build pipeline with Tim/Tara's data, add others as they arrive.
Noisy inferred network (everyone went to Stanford) | Medium | Weight edges by org size at time of overlap. Small company = strong. Large = weak unless same team.
Bright Data cost overrun | Low | Hard API spend limit at $5. Test 10 profiles before bulk. Initial batch is tiny (~$1).
Identity resolution (duplicate Smiths) | Medium | Match on (Name + Company) or (Name + School). If uncertain, keep both; don't merge aggressively.
LinkedIn scraping blocked/degraded | Low | Bright Data handles anti-bot. Apify as fallback. Test first.
Data staleness over time | Low | Monthly refresh of key profiles ($0.0005/profile). Track scraped_at date per record.
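
The identity-resolution mitigation can be sketched as a conservative match rule: two records merge only when the name matches and either company or school also matches. Otherwise both are kept. The record layout (dicts with name, company, and school keys) and the function names are assumptions for illustration.

```python
def normalize(value):
    """Lowercase and collapse whitespace; empty string for missing values."""
    return " ".join(value.lower().split()) if value else ""


def is_same_person(a, b):
    """Match on (name + company) or (name + school); otherwise keep both."""
    name = normalize(a.get("name"))
    if not name or name != normalize(b.get("name")):
        return False
    for field in ("company", "school"):
        left = normalize(a.get(field))
        if left and left == normalize(b.get(field)):
            return True
    return False
```

A name match alone returns False, which errs toward duplicate records over wrongly merged people.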

11. Founder Evaluator (Future Phase)

Once the network data is solid, layer on the founder evaluation product:

Profile Card

Background, trajectory, education, prior exits, technical depth. Auto-generated from scraped profile data.

Network Map

Interactive visualization of connections to VCs, founders, and LA AI companies. Pyvis / Plotly.

Founder Score (MVP: 3 dims)

Prior startup success, network quality, technical depth. Backtested against YC acceptances.

Warm Intro Paths

"Tim → [shared contact] → [target person]" with relationship strength and recency.

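A warm intro path can be sketched as a strongest-path search over the inferred edges. The sketch below assumes edges are (person_a, person_b, strength) tuples with strength in (0, 1], scores a path as the product of its edge strengths, and finds the best one with Dijkstra's algorithm over -log(strength). The scoring rule and function name are illustrative choices, not a settled design.

```python
import heapq
import math


def strongest_path(edges, source, target):
    """Return (path, strength) for the strongest chain, or None if unreachable."""
    graph = {}
    for a, b, strength in edges:
        if not 0 < strength <= 1:
            continue  # ignore edges with unusable strength values
        cost = -math.log(strength)
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))

    best = {source: 0.0}
    previous = {}
    heap = [(0.0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == target:
            break
        if cost > best.get(node, math.inf):
            continue  # stale heap entry
        for neighbor, edge_cost in graph.get(node, []):
            new_cost = cost + edge_cost
            if new_cost < best.get(neighbor, math.inf):
                best[neighbor] = new_cost
                previous[neighbor] = node
                heapq.heappush(heap, (new_cost, neighbor))

    if target not in best:
        return None
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    path.reverse()
    return path, math.exp(-best[target])
```

Multiplying strengths penalizes long chains, so a two-hop path through strong ties can beat a shorter path through a weak one.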
Full spec: projects/people/README.md  |  Data: projects/people/data/  |  Source: rivus