Identity Clustering Demo

Disambiguating "Gil Elbaz" across 85 search results

Search Results: 85
Identities Found: 6
Chunks: 2
Clustering Time: 4.2 s
Cached Sources: 41

Pipeline

1. Serper + APIs: 40+ parallel searches
2. Dedup: by URL
3. Chunk: ≤50 items/chunk
4. LLM Cluster: gemini-flash per chunk
5. Embed & Merge: text-embedding-3-small
6. Match Target: cosine similarity
Hint: founder Factual, data company
Keywords: founder, factual, company
Name variants: Gil Elbaz, G. Elbaz, Gilbert Elbaz, Gilad Elbaz

Identity Clusters

Gil Elbaz (Factual founder, data entrepreneur & investor) [Target Match]
33 results
AdSense Applied Semantics Big Data Caltech Common Crawl Factual Los Angeles Natural Language Technology TenOneTen Ventures
Noise / not a person
44 results
Alber Elbaz Bill Ackman Dementia Facebook post Instagram post Kabarett LinkedIn profile Psychedelics Rebel Wilson TikTok video Unrelated name
Freema Elbaz (Educational researcher)
5 results
Teacher Thinking Practical Knowledge Narrative Curriculum Subject Knowledge
Gil Elbaz (Functional Medicine practitioner)
1 result
Functional Medicine Root cause Health practitioner Symptoms
Gil (Groom at a Jewish wedding)
1 result
Jewish wedding Salomé Marrakech Chuppah
Gil Elbaz (Python Developer & Bioinformatician)
1 result
Bioinformatics Bar-Ilan University Python Developer GitHub

How It Works

1. LLM Clustering (per chunk)

Each chunk of ≤50 items is sent to gemini-flash with a prompt asking it to group results by which real-world person they belong to. The LLM returns a label and distinctive keywords for each identity.
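Before this step, the pipeline deduplicates results by URL and splits them into chunks of at most 50 items. The following is a minimal sketch of that preparation, not the demo's actual code. It assumes each result is a dict with a "url" key and that duplicates are detected by exact URL match; the function names, the record shape, and the greedy fill order are hypothetical.

```python
MAX_CHUNK_SIZE = 50  # the page states chunks hold at most 50 items


def dedup_by_url(results):
    """Keep the first result seen for each URL (exact string match, an assumption)."""
    seen = set()
    unique = []
    for result in results:
        url = result["url"]
        if url not in seen:
            seen.add(url)
            unique.append(result)
    return unique


def chunk(items, size=MAX_CHUNK_SIZE):
    """Split items into consecutive lists of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]


# Synthetic data: 85 distinct URLs plus one duplicate of the first result.
results = [{"url": f"https://example.com/{i}", "title": f"Result {i}"} for i in range(85)]
results.append(dict(results[0]))

unique = dedup_by_url(results)
chunks = chunk(unique)
chunk_sizes = [len(c) for c in chunks]
```

With 85 unique results this greedy split yields two chunks, which agrees with the "Chunks: 2" figure shown above. Whether the demo balances chunk sizes instead of filling the first chunk completely is not stated on the page.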

2. Embedding Merge (across chunks)

For large result sets that span multiple chunks, cluster descriptors (label + keywords) are embedded using text-embedding-3-small. Clusters with cosine similarity ≥ 0.80 are merged as the same person.
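The merge rule can be sketched as follows, assuming the descriptor embeddings have already been computed and are passed in as plain lists of floats. The page does not say how chains of similar clusters are handled; this sketch assumes transitive merging (if A matches B and B matches C, all three are merged) using a small union-find. The function names are hypothetical.

```python
import math

MERGE_THRESHOLD = 0.80  # similarity at or above this merges two clusters


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def merge_clusters(vectors, threshold=MERGE_THRESHOLD):
    """Return groups of cluster indices whose descriptors should be merged."""
    parent = list(range(len(vectors)))

    def find(i):
        # Follow parent links to the group representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(j)] = find(i)  # union the two groups

    groups = {}
    for i in range(len(vectors)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Toy 2-D vectors standing in for real embeddings: 0 and 1 point almost the same way.
toy_vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
merged = merge_clusters(toy_vectors)  # [[0, 1], [2]]
```

The pairwise comparison is quadratic in the number of clusters, which is acceptable here because only a handful of cluster descriptors exist per search.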

3. Target Matching

The search hint is embedded alongside all cluster descriptors. The cluster with the highest cosine similarity to the hint is selected as the target identity. Results in the target cluster are scored 0.85 (confirmed), results belonging to other people are scored 0.15, and noise results are scored 0.05.
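A minimal sketch of target selection and scoring, under stated assumptions: embeddings are precomputed lists of floats, the noise cluster is flagged separately and is never a target candidate, and the cluster record shape ("label", "vector", "is_noise") is hypothetical. Only the three score values come from the description above.

```python
import math

SCORE_TARGET = 0.85        # results in the matched cluster
SCORE_OTHER_PERSON = 0.15  # results in any other person cluster
SCORE_NOISE = 0.05         # results in the noise cluster


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def score_clusters(hint_vec, clusters):
    """Pick the person cluster closest to the hint and assign a score per cluster."""
    people = [c for c in clusters if not c["is_noise"]]
    # default=None covers the case where every cluster is noise.
    target = max(people, key=lambda c: cosine(hint_vec, c["vector"]), default=None)

    scores = {}
    for c in clusters:
        if c["is_noise"]:
            scores[c["label"]] = SCORE_NOISE
        elif c is target:
            scores[c["label"]] = SCORE_TARGET
        else:
            scores[c["label"]] = SCORE_OTHER_PERSON
    return (target["label"] if target else None), scores


# Toy 2-D vectors standing in for real embeddings.
hint_vec = [1.0, 0.0]
clusters = [
    {"label": "data founder", "vector": [0.9, 0.1], "is_noise": False},
    {"label": "educational researcher", "vector": [0.2, 0.8], "is_noise": False},
    {"label": "noise", "vector": None, "is_noise": True},
]
target_label, scores = score_clusters(hint_vec, clusters)
```

Note that this rule always selects some target when at least one person cluster exists, even if the best similarity is low; the page does not mention a minimum similarity for a match.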

Cluster Similarity Matrix

Cosine similarity between cluster descriptors (embedding space):

Cluster | Gil Elbaz (Factual founder, da... | Gil Elbaz (Functional Medicine... | Gil (Groom at a Jewish wedding... | Freema Elbaz (Educational rese... | Gil Elbaz (Python Developer & ...
Gil Elbaz (Factual founder, da... | 1.000 | 0.531 | 0.320 | 0.390 | 0.625
Gil Elbaz (Functional Medicine... | 0.531 | 1.000 | 0.317 | 0.435 | 0.569
Gil (Groom at a Jewish wedding... | 0.320 | 0.317 | 1.000 | 0.216 | 0.349
Freema Elbaz (Educational rese... | 0.390 | 0.435 | 0.216 | 1.000 | 0.419
Gil Elbaz (Python Developer & ... | 0.625 | 0.569 | 0.349 | 0.419 | 1.000
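As a quick check of the matrix against the 0.80 merge threshold described under How It Works, the sketch below lists every pair of distinct clusters and finds those at or above the threshold. The similarity values are copied from the matrix; the short labels are abbreviations introduced here for readability.

```python
MERGE_THRESHOLD = 0.80

# Shortened labels (assumption: abbreviated from the cluster names above).
labels = [
    "Factual founder",
    "Functional Medicine",
    "Groom",
    "Freema Elbaz",
    "Python Developer",
]

# Values copied from the similarity matrix, same row and column order as labels.
matrix = [
    [1.000, 0.531, 0.320, 0.390, 0.625],
    [0.531, 1.000, 0.317, 0.435, 0.569],
    [0.320, 0.317, 1.000, 0.216, 0.349],
    [0.390, 0.435, 0.216, 1.000, 0.419],
    [0.625, 0.569, 0.349, 0.419, 1.000],
]

n = len(labels)
# Every unordered pair of distinct clusters with its similarity.
pairs = [
    (matrix[i][j], labels[i], labels[j])
    for i in range(n)
    for j in range(i + 1, n)
]

mergeable = [p for p in pairs if p[0] >= MERGE_THRESHOLD]
closest = max(pairs)  # the pair with the highest similarity

print("pairs at or above threshold:", mergeable)
print("closest pair:", closest)
```

No pair reaches 0.80, so none of these five clusters would be merged with another. The closest pair is the Factual founder and the Python Developer at 0.625, which is unsurprising because both descriptors share the name "Gil Elbaz" and technology-related keywords.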