Edge Computing (experimental)

Edge ML Inference

Running machine learning models at the edge for ultra-low latency predictions without cold starts.

Technology Stack

ONNX Runtime, Vercel Edge, TensorFlow.js, WebAssembly

Capabilities

Features Explored

Key capabilities implemented in this experiment

feature_01.ts

Sub-10ms inference latency

feature_02.ts

No cold start delays

feature_03.ts

Quantized model support

feature_04.ts

Automatic model caching with the edge caching playbook

feature_05.ts

Fallback to cloud for complex models
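
The caching and cloud-fallback features above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `predict`, `getCachedModel`, `withTimeout`, `ModelInfo`, `RouterOptions`, and both option fields are hypothetical names, and the model loader and cloud predictor are passed in as plain functions so that no specific runtime API is assumed.

```typescript
// All names below are hypothetical; they do not come from the original project.
type Predictor = (input: Float32Array) => Promise<Float32Array>;

interface ModelInfo {
  name: string;
  sizeBytes: number;
}

interface RouterOptions {
  maxEdgeModelBytes: number; // assumed size limit for models run at the edge
  edgeTimeoutMs: number; // assumed latency budget before falling back
}

// Reject if the wrapped promise does not settle within `ms` milliseconds.
function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error("edge timeout")), ms);
    p.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (error) => { clearTimeout(timer); reject(error); },
    );
  });
}

// Cache loaded models in memory so repeated requests skip the load step.
const modelCache = new Map<string, Promise<Predictor>>();

function getCachedModel(
  model: ModelInfo,
  load: (m: ModelInfo) => Promise<Predictor>,
): Promise<Predictor> {
  let cached = modelCache.get(model.name);
  if (!cached) {
    cached = load(model);
    // Drop failed loads so a later request can retry.
    cached.catch(() => modelCache.delete(model.name));
    modelCache.set(model.name, cached);
  }
  return cached;
}

// Run small models at the edge; route large models, load failures,
// and slow edge calls to the cloud predictor.
async function predict(
  model: ModelInfo,
  input: Float32Array,
  load: (m: ModelInfo) => Promise<Predictor>,
  cloud: Predictor,
  opts: RouterOptions,
): Promise<{ output: Float32Array; servedBy: "edge" | "cloud" }> {
  if (model.sizeBytes > opts.maxEdgeModelBytes) {
    return { output: await cloud(input), servedBy: "cloud" };
  }
  try {
    const edge = await getCachedModel(model, load);
    const output = await withTimeout(edge(input), opts.edgeTimeoutMs);
    return { output, servedBy: "edge" };
  } catch {
    return { output: await cloud(input), servedBy: "cloud" };
  }
}
```

Caching the promise rather than the resolved model means concurrent requests for the same model share a single load. Whether the real experiment caches this way is an assumption.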

Insights

Key Learnings

What I discovered while building this

WASM-based inference adds ~5ms overhead but enables complex models

Model size significantly impacts edge function cold starts

Quantization can reduce model size 4x with minimal accuracy loss
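
The 4x figure above matches what storing weights as 8-bit integers instead of 32-bit floats would give: 4 bytes per weight become 1. The sketch below illustrates that arithmetic with symmetric per-tensor int8 quantization. It is an assumption that the experiment used this scheme; `quantizeInt8`, `dequantize`, and `QuantizedTensor` are hypothetical names, not part of ONNX Runtime or TensorFlow.js.

```typescript
// Hypothetical helper types and functions, for illustration only.
interface QuantizedTensor {
  values: Int8Array; // 1 byte per weight instead of 4
  scale: number; // multiply by this to recover approximate floats
}

// Symmetric per-tensor quantization: map [-maxAbs, maxAbs] onto [-127, 127].
function quantizeInt8(weights: Float32Array): QuantizedTensor {
  let maxAbs = 0;
  for (let i = 0; i < weights.length; i++) {
    maxAbs = Math.max(maxAbs, Math.abs(weights[i]));
  }
  const scale = maxAbs === 0 ? 1 : maxAbs / 127;
  const values = new Int8Array(weights.length);
  for (let i = 0; i < weights.length; i++) {
    values[i] = Math.round(weights[i] / scale);
  }
  return { values, scale };
}

// Recover approximate float weights; each is off by at most about scale / 2.
function dequantize(q: QuantizedTensor): Float32Array {
  const out = new Float32Array(q.values.length);
  for (let i = 0; i < out.length; i++) {
    out[i] = q.values[i] * q.scale;
  }
  return out;
}

const weights = new Float32Array([0.5, -1.0, 0.25, 0.75]);
const quantized = quantizeInt8(weights);
console.log(weights.byteLength / quantized.values.byteLength); // 4
```

The accuracy cost comes from the rounding step, so the size of the loss depends on how widely the weights are spread. Real toolchains typically also quantize per channel and calibrate activations, which this sketch leaves out.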

See related edge ML insight.

Note: This project is in the experimental stage. It is a learning exercise and technical exploration rather than a production-ready solution. Code and patterns may change significantly.

Related Experiments

Other explorations in this area

Edge Rate Limiter (beta)

A globally distributed rate limiter running at the edge with token-bucket and sliding-window algorithms backed by Cloudflare Durable Objects.
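
For readers unfamiliar with the token-bucket algorithm mentioned above, here is a minimal in-memory sketch. It only illustrates the algorithm and does not reflect how the Edge Rate Limiter is implemented: `TokenBucket` and its parameters are hypothetical, and state lives in a plain object rather than in a Durable Object.

```typescript
// Hypothetical in-memory token bucket, for illustration only.
class TokenBucket {
  private readonly capacity: number;
  private readonly refillPerSecond: number;
  private tokens: number;
  private lastRefillMs: number;

  constructor(capacity: number, refillPerSecond: number, nowMs: number = Date.now()) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.tokens = capacity; // start full so an initial burst is allowed
    this.lastRefillMs = nowMs;
  }

  // Refill based on elapsed time, then try to spend one token.
  tryConsume(nowMs: number = Date.now()): boolean {
    const elapsedSec = Math.max(0, nowMs - this.lastRefillMs) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSecond);
    this.lastRefillMs = Math.max(this.lastRefillMs, nowMs);
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}
```

Passing the clock in as `nowMs` keeps the sketch deterministic to test. A bucket with capacity 2 and a refill rate of 1 token per second allows two immediate requests, rejects a third, and allows one more after a second has passed.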

Interested in this technology?

I'm always happy to discuss experiments and share learnings. Let's connect if you're exploring similar ideas.
