Data Visualization

alpha

Embedding Explorer

An interactive 3D visualiser for vector embeddings. Paste a corpus, pick an embedding model, watch UMAP project it into 3D, and click clusters to inspect what is inside. I built it as a debugging tool for RAG systems, because the fastest way to understand why a retrieval is bad is to look at the embeddings themselves. It turns into a teaching tool too: showing someone the cluster of 'shipping' tickets sitting next to 'returns' explains a vector database faster than a slide deck.

Three.js, UMAP, OpenAI Embeddings, React

What this is

A lab, not a product.

Capabilities

What it does

The features that actually got built and run in this prototype.

Upload or paste a corpus directly in the browser, with chunking presets

Choose an embedding provider (OpenAI, Cohere, or a local model via Ollama)

UMAP projection in the browser using web workers, so the UI never freezes

Hover and click clusters to inspect raw content and similarity scores

Export labelled clusters as JSON to pair with the RAG blueprint
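
To make the first feature above concrete, here is a minimal sketch of what a chunking preset could look like, assuming fixed-size character windows with overlap. The preset names, the sizes, and the `chunkText` helper are hypothetical illustrations, not the presets the lab ships with.

```typescript
// Hypothetical chunking presets: sizes are in characters and purely illustrative.
type ChunkPreset = { size: number; overlap: number };

const CHUNK_PRESETS = {
  short: { size: 200, overlap: 20 },
  paragraph: { size: 800, overlap: 80 },
};

// Split text into fixed-size windows, each sharing `overlap` characters
// with the previous window.
function chunkText(text: string, preset: ChunkPreset): string[] {
  const { size, overlap } = preset;
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new Error("expected size > 0 and 0 <= overlap < size");
  }
  const chunks: string[] = [];
  const step = size - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + size));
    // Stop once a window has reached the end of the text,
    // so no trailing chunk is made up only of overlap.
    if (start + size >= text.length) break;
  }
  return chunks;
}
```

Calling `chunkText(corpus, CHUNK_PRESETS.paragraph)` would give the list of strings to send to the embedding provider. Character windows are the simplest option; token-based or sentence-based splitting would follow the same shape.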

The stack

What it is built with

The libraries and runtimes I picked for this lab and why they earned their place.

Three.js

UMAP

OpenAI Embeddings

React

What I learned

Learnings, in order of how much they surprised me

The things I would tell another engineer before they tried the same experiment.

Visualising embeddings is one of the fastest ways to debug RAG quality. Most retrieval issues are visible in the projection

UMAP parameters change the story you see. Defaults are rarely right for your specific data

Browser-only clustering is feasible up to roughly 100k items. Past that, push the projection server-side

Read the LLM cost optimisation insight for how I think about embedding spend in production
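
The two UMAP learnings above can be sketched as a small planning helper: build a grid of parameter settings to compare side by side, and decide where the projection should run based on corpus size. This is an illustration under assumptions. The option names `nNeighbors`, `minDist`, and `nComponents` follow the umap-js library, which may not be the one used here. The function names and the swept values are hypothetical. The 100,000 threshold is the rough figure from the text, not a measured limit.

```typescript
// UMAP hyperparameters, named after the umap-js options (an assumption).
interface UmapParams {
  nNeighbors: number; // smaller values emphasise local structure
  minDist: number; // smaller values pack points more tightly
  nComponents: number; // 3 for a 3D view
}

type ProjectionTarget = "browser-worker" | "server";

// Rough ceiling for browser-only projection, taken from the text.
const BROWSER_ITEM_LIMIT = 100000;

// Decide where the projection should run for a corpus of a given size.
function chooseProjectionTarget(itemCount: number): ProjectionTarget {
  return itemCount <= BROWSER_ITEM_LIMIT ? "browser-worker" : "server";
}

// Build every combination of the given settings, so the same corpus
// can be projected several times and the resulting pictures compared.
function parameterSweep(nNeighbors: number[], minDists: number[]): UmapParams[] {
  const grid: UmapParams[] = [];
  for (const n of nNeighbors) {
    for (const d of minDists) {
      grid.push({ nNeighbors: n, minDist: d, nComponents: 3 });
    }
  }
  return grid;
}
```

For example, `parameterSweep([5, 15, 50], [0.0, 0.5])` yields six settings to render next to each other, which is one way to see how much the apparent cluster structure depends on the parameters rather than on the data.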

Note: This is an experimental project in the alpha stage. It is a learning exercise and technical exploration rather than a production-ready solution. Patterns and code may change.

Data Visualization

Related labs

Other explorations in this area.

beta

Streaming Analytics Dashboard

A real-time analytics dashboard pulling events through Kafka, aggregating in ClickHouse, and pushing updates over WebSockets at sub-second refresh rates. I built it to learn how the columnar-store world has changed: ClickHouse is dramatically faster than the OLAP setups I was running five years ago, and it makes a single-node setup feel like a small data warehouse. The dashboard itself is intentionally simple; the interesting part is the pipeline behind it.
