AI/ML · experimental

AI Testing Agent

An AI agent that writes and runs end-to-end tests by reading your app like a user would. It opens pages with Playwright, identifies elements visually and structurally, drafts test cases, and runs them. The honest result: it is great for happy paths and visual regression, mediocre for complex flows, and useless for anything that requires domain knowledge it does not have. Worth it as a first sweep, not as your test suite.

OpenAI · Playwright · TypeScript · Computer Vision

What this is

A lab, not a product.

Capabilities

What it does

The features that actually got built and run in this prototype.

Automatic test generation by crawling routes and inferring user flows

Visual regression detection that compares screenshots across runs, with a learned tolerance

Self-healing selectors that fall back to vision when DOM selectors break

Natural language test descriptions, generated as readable prose alongside the code

CI/CD integration via the CI/CD playbook
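To make "visual regression with a learned tolerance" concrete, here is a minimal sketch, not the lab's actual code. It assumes screenshots are already decoded into same-sized RGBA byte buffers, and it "learns" the tolerance in the simplest way possible: mean plus k standard deviations of the diff ratios seen on known-good runs. The names diffRatio, learnTolerance, and isRegression are hypothetical.

```typescript
// Hypothetical sketch: names and the mean + k*std rule are assumptions,
// not taken from the lab's implementation.

/** Fraction of pixels where any RGBA channel differs by more than `channelThreshold`. */
function diffRatio(
  baseline: Uint8Array,
  current: Uint8Array,
  channelThreshold = 16,
): number {
  if (baseline.length !== current.length || baseline.length % 4 !== 0) {
    throw new Error("Screenshots must be RGBA buffers of identical size");
  }
  const pixelCount = baseline.length / 4;
  if (pixelCount === 0) return 0;

  let changed = 0;
  for (let i = 0; i < baseline.length; i += 4) {
    for (let c = 0; c < 4; c++) {
      if (Math.abs(baseline[i + c] - current[i + c]) > channelThreshold) {
        changed++;
        break; // count each pixel at most once
      }
    }
  }
  return changed / pixelCount;
}

/**
 * Derive a tolerance from diff ratios observed on known-good runs:
 * mean + k * population standard deviation. Returns 0 with no history.
 */
function learnTolerance(goodRunRatios: number[], k = 3): number {
  const n = goodRunRatios.length;
  if (n === 0) return 0;
  const mean = goodRunRatios.reduce((sum, r) => sum + r, 0) / n;
  const variance =
    goodRunRatios.reduce((sum, r) => sum + (r - mean) ** 2, 0) / n;
  return mean + k * Math.sqrt(variance);
}

/** A run is flagged only when its diff ratio exceeds the learned tolerance. */
function isRegression(ratio: number, tolerance: number): boolean {
  return ratio > tolerance;
}
```

The point of learning the tolerance rather than hard-coding it is that pages with animation or anti-aliasing noise get a looser threshold than static pages. In a Playwright setup the buffers would come from decoded screenshots, and the history would be stored per route.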

The stack

What it is built with

The libraries and runtimes I picked for this lab and why they earned their place.

OpenAI

Playwright

TypeScript

Computer Vision

What I learned

Learnings, in order of how much they surprised me

The things I would tell another engineer before they tried the same experiment.

Vision models help the agent understand page structure during test generation, especially on design-heavy pages

Self-healing selectors reduce test maintenance dramatically but introduce non-determinism, because a run can pass through a fallback match instead of the original selector. Pick your poison

Generated tests work best for happy paths. Edge cases still need human input and domain knowledge

See the AI agent orchestration blueprint for how to wire this into a wider agent setup
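The trade-off around self-healing selectors is easier to see in code. The sketch below is an illustration under assumptions, not the lab's implementation: each lookup strategy (DOM selector first, vision last) is passed in as a plain async function, and the result records which strategy matched. Strategy, Resolution, and resolveWithHealing are hypothetical names.

```typescript
// Hypothetical sketch: the types and function below are illustrative only.

type Strategy<T> = {
  name: string; // e.g. "css", "text", "vision"
  find: () => Promise<T | null>; // resolves to null when nothing matches
};

type Resolution<T> = {
  element: T;
  strategy: string; // which strategy produced the match
  healed: boolean; // true when the primary (first) strategy failed
};

/**
 * Try strategies in order and return the first match.
 * The `healed` flag makes fallback matches visible instead of silent.
 */
async function resolveWithHealing<T>(
  strategies: Strategy<T>[],
): Promise<Resolution<T>> {
  const tried: string[] = [];
  for (let i = 0; i < strategies.length; i++) {
    const strategy = strategies[i];
    tried.push(strategy.name);
    const element = await strategy.find();
    if (element !== null) {
      return { element, strategy: strategy.name, healed: i > 0 };
    }
  }
  throw new Error(`No strategy matched (tried: ${tried.join(", ")})`);
}
```

Surfacing healed in the test report is one way to contain the non-determinism: a run that passed only through a fallback can be marked for review rather than counted as a clean pass. With Playwright, the first strategy would typically wrap a locator lookup and the last one a call to whatever vision model is in use.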

Note: This is an experimental project. It is a learning exercise and technical exploration, not a production-ready solution. Patterns and code may change.
