Data Visualization (alpha)

Embedding Explorer

An interactive 3D visualiser for vector embeddings. Paste a corpus, pick an embedding model, watch UMAP project it into 3D, and click clusters to inspect what is inside. I built it as a debugging tool for RAG systems, because the fastest way to understand why a retrieval is bad is to look at the embeddings themselves. It turns into a teaching tool too: showing someone the cluster of 'shipping' tickets sitting next to 'returns' explains a vector database faster than a slide deck.

Three.js, UMAP, OpenAI Embeddings, React

What this is

A lab, not a product.

5 features, 4 learnings, 4 technologies

Capabilities

What it does

The features that actually got built and run in this prototype.

Upload or paste a corpus directly in the browser, with chunking presets
Choose an embedding provider (OpenAI, Cohere, or a local model via Ollama)
UMAP projection in the browser using web workers, so the UI never freezes
Hover and click clusters to inspect raw content and similarity scores
Export labelled clusters as JSON to pair with the RAG blueprint
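To make the inspection and export features concrete, here is a minimal sketch of how similarity scores and a labelled-cluster export could be computed. It assumes cosine similarity against the cluster centroid as the score; the source does not say which score or export schema the lab uses. The `EmbeddedChunk` and `ClusterExport` shapes and the `exportCluster` helper are hypothetical names for illustration only.

```typescript
// Hypothetical shapes: the lab's real export schema is not documented here.
interface EmbeddedChunk {
  id: string;
  text: string;
  vector: number[];
}

interface ClusterExport {
  label: string;
  members: { id: string; text: string; similarityToCentroid: number }[];
}

// Cosine similarity between two equal-length vectors.
// Returns 0 when either vector has zero length (norm of 0).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Component-wise mean of the member vectors.
function centroid(vectors: number[][]): number[] {
  if (vectors.length === 0) throw new Error("cluster is empty");
  const dim = vectors[0].length;
  const sum = new Array<number>(dim).fill(0);
  for (const v of vectors) {
    for (let i = 0; i < dim; i++) sum[i] += v[i];
  }
  return sum.map((x) => x / vectors.length);
}

// Assumption: members are scored against the centroid and sorted
// from most to least central before export.
function exportCluster(label: string, chunks: EmbeddedChunk[]): ClusterExport {
  const c = centroid(chunks.map((ch) => ch.vector));
  const members = chunks
    .map((ch) => ({
      id: ch.id,
      text: ch.text,
      similarityToCentroid: cosineSimilarity(ch.vector, c),
    }))
    .sort((x, y) => y.similarityToCentroid - x.similarityToCentroid);
  return { label, members };
}
```

Calling `JSON.stringify(exportCluster("shipping", chunks), null, 2)` would give a file that downstream tooling can read. Sorting by centrality is a design choice in this sketch: the chunks at the top best represent the cluster, and the ones at the bottom are the first candidates to check for mislabelling.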

The stack

What it is built with

The libraries and runtimes I picked for this lab and why they earned their place.

Three.js
UMAP
OpenAI Embeddings
React

What I learned

Learnings, in order of how much they surprised me

The things I would tell another engineer before they tried the same experiment.

01
Visualising embeddings is one of the fastest ways to debug RAG quality. Most retrieval issues are visible in the projection.
02
UMAP parameters change the story you see. The defaults are rarely right for your specific data.
03
Browser-only clustering is feasible up to roughly 100k items. Past that, push the projection server-side.
04
Read the LLM cost optimisation insight for how I think about embedding spend in production.
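The second and third learnings can be sketched as a small planning helper: build a grid of UMAP settings to compare rather than trusting one default, and decide where the projection should run based on corpus size. This is an illustrative sketch, not the lab's code. The option names (`nNeighbors`, `minDist`, `nComponents`) mirror the umap-js library, which is an assumption because the source only says "UMAP". The sweep values are arbitrary starting points, and the 100,000 cutoff is the rough figure quoted above.

```typescript
// Option names mirror umap-js; that the lab uses umap-js is an assumption.
interface UmapParams {
  nNeighbors: number;
  minDist: number;
  nComponents: number;
}

type ProjectionTarget = "browser-worker" | "server";

// Rough ceiling taken from the learnings above, not a measured limit.
const BROWSER_ITEM_LIMIT = 100000;

// Decide where the projection should run for a corpus of a given size.
function chooseProjectionTarget(
  itemCount: number,
  limit: number = BROWSER_ITEM_LIMIT
): ProjectionTarget {
  return itemCount <= limit ? "browser-worker" : "server";
}

// Build a grid of settings to render side by side.
// Neighbour counts are clamped below the item count and de-duplicated.
function parameterSweep(
  itemCount: number,
  neighbourOptions: number[] = [5, 15, 50], // hypothetical sweep values
  minDistOptions: number[] = [0, 0.1, 0.5] // hypothetical sweep values
): UmapParams[] {
  const maxNeighbours = Math.max(2, itemCount - 1);
  const neighbours = Array.from(
    new Set(neighbourOptions.map((n) => Math.min(n, maxNeighbours)))
  );
  const grid: UmapParams[] = [];
  for (const nNeighbors of neighbours) {
    for (const minDist of minDistOptions) {
      grid.push({ nNeighbors, minDist, nComponents: 3 }); // 3D output
    }
  }
  return grid;
}
```

Small neighbour counts emphasise local structure and large ones emphasise global layout, so rendering a few grid entries next to each other shows which clusters are stable and which are artefacts of one setting.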

Note: This is an experimental project in the alpha stage. It is a learning exercise and technical exploration rather than a production-ready solution. Patterns and code may change.
