AI/ML · beta

AI Code Review Assistant

A small prototype that turns a GitHub pull request into a contextual code review. I wanted to see how far you can push an LLM with a tight feedback loop, structured diffs, and a real linter running alongside. The result is a tool that catches the boring stuff (naming, dead branches, missing null checks) and surfaces the interesting stuff (design choices, hidden coupling) before a human even opens the PR. It is not a replacement for review; it is a sharper first pass.

What this is

A lab, not a product.

5 features · 4 learnings · 4 technologies

Capabilities

What it does

The features that actually got built and run in this prototype.

feature_01.ts
Diff-aware analysis that only reviews the lines that changed, with surrounding context windowed by AST
feature_02.ts
Security pattern detection that pairs with the API security playbook
feature_03.ts
Inline suggestions that map back to specific lines using GitHub PR review APIs
feature_04.ts
Style and best-practice nudges scoped to the project's own ESLint config rather than generic advice
feature_05.ts
Optional self-host mode so the model never sees code that should not leave your network
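
As a rough illustration of the diff-aware idea, the sketch below reads a single-file unified diff and works out which line numbers in the new file were added, then keeps only the findings that land on those lines. It is not the prototype's actual code: the names `changedLines`, `onlyChanged`, and `LineFinding` are hypothetical, and it assumes the patch text starts at a hunk header (as in the per-file `patch` field GitHub returns when listing pull request files). The AST-based context windowing is left out.

```typescript
// Hypothetical shape for a review finding tied to a line in the new file.
interface LineFinding {
  line: number;
  message: string;
}

// Collect the new-file line numbers of every added ("+") line in a
// single-file unified diff. Assumes the text contains only hunks,
// with no "diff --git" or "+++" file headers between them.
function changedLines(patch: string): number[] {
  const added: number[] = [];
  let newLine = 0;
  let inHunk = false;
  for (const line of patch.split("\n")) {
    // Hunk header, e.g. "@@ -10,6 +12,8 @@": capture the new-file start line.
    const header = /^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@/.exec(line);
    if (header) {
      newLine = Number(header[1]);
      inHunk = true;
      continue;
    }
    if (!inHunk) continue;
    if (line.startsWith("+")) {
      added.push(newLine);
      newLine++;
    } else if (line.startsWith("-") || line.startsWith("\\")) {
      // Removed lines and "\ No newline at end of file" markers
      // do not exist in the new file, so the counter stays put.
    } else {
      newLine++; // context line, present in both old and new file
    }
  }
  return added;
}

// Drop any finding that does not sit on a line the PR actually added.
function onlyChanged(findings: LineFinding[], patch: string): LineFinding[] {
  const changed = changedLines(patch);
  return findings.filter((f) => changed.indexOf(f.line) !== -1);
}
```

Findings that survive this filter carry a new-file line number, which is the kind of anchor an inline PR review comment needs.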

The stack

What it is built with

The libraries and runtimes I picked for this lab and why they earned their place.

What I learned

Learnings, in order of how much they surprised me

The things I would tell another engineer before they tried the same experiment.

01
Prompt engineering for code understanding is mostly context management, not clever prompts. See the shipping AI features playbook.
02
Chunking strategies matter more than model size for large files. A 1,000-line file split poorly gives worse reviews than a 200-line file with the right neighbours.
03
Combining static analysis (the boring linter) with LLM judgement (the interesting questions) is much better than either on its own.
04
Hallucinated bugs are real and expensive. The model needs a structured way to say 'I am not sure', otherwise reviewers learn to ignore it.
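
One way to picture learnings 03 and 04 together is the sketch below: linter and LLM findings share a single shape with an explicit confidence label, and anything the model marks as unsure (or fails to label properly) is routed to a human instead of being posted inline. This is an illustration under assumptions, not the prototype's implementation; the `Finding` type, the three confidence levels, the JSON reply format, and the function names are all hypothetical.

```typescript
type Confidence = "certain" | "likely" | "unsure";

// Hypothetical common shape for linter and model findings.
interface Finding {
  source: "linter" | "llm";
  line: number;
  message: string;
  confidence: Confidence;
}

const LEVELS: readonly string[] = ["certain", "likely", "unsure"];

// Parse the model's reply, assumed here to be a JSON array of
// { line, message, confidence } objects. Unparseable replies yield
// no findings; entries without a usable line or message are skipped;
// a missing or unknown confidence label is downgraded to "unsure".
function parseModelFindings(raw: string): Finding[] {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return [];
  }
  if (!Array.isArray(data)) return [];
  const findings: Finding[] = [];
  for (const item of data) {
    if (typeof item !== "object" || item === null) continue;
    const { line, message, confidence } = item as Record<string, unknown>;
    if (typeof line !== "number" || typeof message !== "string") continue;
    const level: Confidence =
      typeof confidence === "string" && LEVELS.indexOf(confidence) !== -1
        ? (confidence as Confidence)
        : "unsure";
    findings.push({ source: "llm", line, message, confidence: level });
  }
  return findings;
}

// Linter findings are deterministic, so they always go inline.
// Model findings go inline only when the model did not flag doubt.
function triage(findings: Finding[]): { inline: Finding[]; needsHuman: Finding[] } {
  const inline: Finding[] = [];
  const needsHuman: Finding[] = [];
  for (const f of findings) {
    if (f.source === "linter" || f.confidence !== "unsure") inline.push(f);
    else needsHuman.push(f);
  }
  return { inline, needsHuman };
}
```

The design choice worth noting is the default: an answer that does not follow the format is treated as doubt, not as confidence, so a sloppy model reply cannot quietly turn into an inline comment.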

Note: This is an experimental project in the beta stage. It is a learning exercise and technical exploration rather than a production-ready solution. Patterns and code may change.
