AI Integration · intermediate

Building an AI Chatbot with Streaming

Most chatbot demos look great until a real user asks something unexpected, the network blips, or the bill arrives. This playbook is the production version: streaming responses, persistent conversations, sensible context windows, retries, rate limits, and the small UX details that make a chat interface feel native instead of bolted on. Everything here has been used in shipped products.

90 min · 7 steps

Difficulty: intermediate

Technologies used: OpenAI, Vercel AI SDK, Next.js, React

The methodology

The phases, in order

Each phase below is something I actually run in a project. The descriptions are how I think about the work, not abstract definitions.

Phase 1 of 7

Provider Setup and Cost Estimation

I install the Vercel AI SDK and configure it with either OpenAI or Anthropic. Before writing any prompts I build a back-of-the-envelope cost model: average tokens per turn, expected conversations per day, and target margin. That single spreadsheet has saved more projects than any architectural decision. See this comparison for picking a provider.
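Here is a minimal sketch of that cost model as code, so it can live next to the app and be re-run when prices or usage change. The `CostInputs` fields and the per-million-token prices are placeholders assumed for illustration, not any provider's real price sheet. Note that input tokens per turn should include the resent conversation history, not just the new user message.

```typescript
// Hypothetical inputs for a back-of-the-envelope chatbot cost model.
// Prices are placeholders: substitute your provider's current price sheet.
interface CostInputs {
  avgInputTokensPerTurn: number;  // prompt tokens per turn, including resent history
  avgOutputTokensPerTurn: number; // completion tokens per turn
  turnsPerConversation: number;
  conversationsPerDay: number;
  inputPricePerMillion: number;   // USD per 1M input tokens (assumed)
  outputPricePerMillion: number;  // USD per 1M output tokens (assumed)
}

interface CostEstimate {
  costPerConversation: number; // USD
  costPerDay: number;          // USD
  costPer30Days: number;       // USD
}

function estimateCost(i: CostInputs): CostEstimate {
  const inputCostPerTurn = (i.avgInputTokensPerTurn / 1_000_000) * i.inputPricePerMillion;
  const outputCostPerTurn = (i.avgOutputTokensPerTurn / 1_000_000) * i.outputPricePerMillion;
  const costPerConversation = (inputCostPerTurn + outputCostPerTurn) * i.turnsPerConversation;
  const costPerDay = costPerConversation * i.conversationsPerDay;
  return { costPerConversation, costPerDay, costPer30Days: costPerDay * 30 };
}

// Price per conversation needed to hit a target gross margin in [0, 1):
// margin = (price - cost) / price  =>  price = cost / (1 - margin).
function priceForMargin(costPerConversation: number, targetMargin: number): number {
  if (targetMargin < 0 || targetMargin >= 1) throw new RangeError("targetMargin must be in [0, 1)");
  return costPerConversation / (1 - targetMargin);
}
```

Take illustrative inputs: 2,000 input and 400 output tokens per turn, 10 turns, 1,000 conversations a day, at $2.50 and $10 per million tokens. That works out to $0.09 per conversation and $90 per day. A 70% margin would need about $0.30 of revenue per conversation.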

Phase 2 of 7

Streaming API Route

The chat endpoint is a Next.js route handler that returns a streaming response. I keep the system prompt in a separate file, never inline, and version it like code. The route validates the message history schema with Zod, applies a token budget, and emits a server-sent event stream that the client can render token by token. Pairs with my AI integration service.
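Below is a dependency-free sketch of the two mechanical pieces of that route: validating the incoming history and framing tokens as server-sent events. Zod is what the text uses; `parseMessages` is a hand-rolled stand-in for an equivalent Zod schema, kept that way so the sketch runs without packages. The limits, the `{ token }` / `{ done }` payload shape and the helper names are assumptions. The Vercel AI SDK's own response helpers handle framing for you. This only shows the raw mechanism using the standard `ReadableStream` and `TextEncoder` available in Next.js route handlers and Node 18+.

```typescript
type Role = "user" | "assistant";
interface ChatMessage { role: Role; content: string; }

// Hypothetical limits; tune per product.
const MAX_MESSAGES = 50;
const MAX_CONTENT_CHARS = 8_000;

// Stand-in for a Zod schema: accepts only an array of { role, content } within limits.
// "system" is rejected on purpose: the server, not the client, supplies the system prompt.
function parseMessages(input: unknown): ChatMessage[] {
  if (!Array.isArray(input) || input.length === 0 || input.length > MAX_MESSAGES) {
    throw new Error(`messages must be a non-empty array of at most ${MAX_MESSAGES} items`);
  }
  return input.map((m: unknown, idx: number) => {
    if (typeof m !== "object" || m === null) throw new Error(`message ${idx} is not an object`);
    const { role, content } = m as Record<string, unknown>;
    if (role !== "user" && role !== "assistant") throw new Error(`message ${idx} has an invalid role`);
    if (typeof content !== "string" || content.length > MAX_CONTENT_CHARS) {
      throw new Error(`message ${idx} has invalid content`);
    }
    return { role: role as Role, content };
  });
}

// Frame one payload as a server-sent event: "data: <json>\n\n". Payload shape is assumed.
function sseEvent(payload: unknown): string {
  return `data: ${JSON.stringify(payload)}\n\n`;
}

// Convert any async iterable of text tokens into an SSE byte stream.
function toSseStream(tokens: AsyncIterable<string>): ReadableStream<Uint8Array> {
  const encoder = new TextEncoder();
  return new ReadableStream<Uint8Array>({
    async start(controller) {
      try {
        for await (const token of tokens) controller.enqueue(encoder.encode(sseEvent({ token })));
        controller.enqueue(encoder.encode(sseEvent({ done: true })));
      } catch {
        // Never leak provider internals to the client; log server-side instead.
        controller.enqueue(encoder.encode(sseEvent({ error: "stream_failed" })));
      } finally {
        controller.close();
      }
    },
  });
}
```

In a route handler this would be returned as `new Response(toSseStream(tokens), { headers: { "Content-Type": "text/event-stream" } })`, where `tokens` comes from whichever provider call you use.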

Phase 3 of 7

Chat UI with Streaming

The chat component renders messages in a virtualized list with a sticky composer at the bottom. Streaming tokens append in place, and a typing indicator disappears on the first token, so the UI never shows the indicator and the partial reply at the same time. Tailwind plus shadcn/ui handle the styling. The composer auto-grows, submits on Enter, and inserts a new line on Shift+Enter.
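One way to keep the indicator from becoming a second piece of state is to not store it at all. Instead, derive it from the message list: it is visible only while a request is streaming and the assistant placeholder is still empty. The reducer, its action names and `shouldSubmitOnKeyDown` are hypothetical helpers. They are written framework-agnostic so they can sit behind React's `useReducer`. The `isComposing` check, which stops Enter from submitting during IME composition, is an addition beyond the text.

```typescript
type Role = "user" | "assistant";
interface UiMessage { id: string; role: Role; content: string; }
interface ChatState { messages: UiMessage[]; status: "idle" | "streaming" | "error"; }

type ChatAction =
  | { type: "submit"; userId: string; assistantId: string; text: string }
  | { type: "token"; text: string }
  | { type: "done" }
  | { type: "error" };

function chatReducer(state: ChatState, action: ChatAction): ChatState {
  switch (action.type) {
    case "submit":
      // Append the user turn plus an empty assistant placeholder that tokens fill in place.
      return {
        status: "streaming",
        messages: [
          ...state.messages,
          { id: action.userId, role: "user", content: action.text },
          { id: action.assistantId, role: "assistant", content: "" },
        ],
      };
    case "token": {
      // Append the token to the last (assistant) message; earlier messages are untouched.
      const last = state.messages[state.messages.length - 1];
      if (!last || last.role !== "assistant") return state;
      return {
        ...state,
        messages: [...state.messages.slice(0, -1), { ...last, content: last.content + action.text }],
      };
    }
    case "done":
      return { ...state, status: "idle" };
    case "error":
      return { ...state, status: "error" };
  }
}

// Derived, not stored: true only until the first token arrives.
function showTypingIndicator(state: ChatState): boolean {
  const last = state.messages[state.messages.length - 1];
  return state.status === "streaming" && last !== undefined && last.role === "assistant" && last.content === "";
}

// Enter submits; Shift+Enter (or Enter during IME composition) falls through to a newline.
function shouldSubmitOnKeyDown(e: { key: string; shiftKey: boolean; isComposing: boolean }): boolean {
  return e.key === "Enter" && !e.shiftKey && !e.isComposing;
}
```

In a React textarea's `onKeyDown`, pass `e.nativeEvent` (the DOM `KeyboardEvent`, which carries `isComposing`). When it returns true, call `e.preventDefault()` before submitting.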

Phase 4 of 7

Context and Memory Management

I keep the system prompt fixed, then fill the message array up to a target token count, dropping the oldest user-and-assistant pairs first. For longer conversations I summarize the dropped section into a compact memory block. Without this step, every request resends the entire history, so per-request cost and latency grow linearly with conversation length, and nobody notices until the bill arrives.
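The trimming step can be sketched as follows. Token counts come from a rough four-characters-per-token heuristic (`estimateTokens`, including an assumed per-message overhead). That heuristic is for illustration only, and a real tokenizer for your model should replace it. The function keeps the system prompt and walks backward from the newest turn, keeping whole user-plus-assistant turns while they fit. It returns the dropped messages so a separate, hypothetical summarization call can compress them into the memory block.

```typescript
interface Msg { role: "system" | "user" | "assistant"; content: string; }

// Rough heuristic (~4 characters per token for English text) plus an assumed
// 4-token per-message overhead. Replace with your model's tokenizer.
function estimateTokens(m: Msg): number {
  return Math.ceil(m.content.length / 4) + 4;
}

interface Trimmed { kept: Msg[]; dropped: Msg[]; }

// Keep the system prompt, then fit as many recent turns as the budget allows,
// dropping the oldest user+assistant turns first.
function trimToBudget(system: Msg, history: Msg[], budget: number): Trimmed {
  let used = estimateTokens(system);

  // Group history into turns: a user message plus the assistant replies that follow it.
  const turns: Msg[][] = [];
  for (const m of history) {
    if (m.role === "user" || turns.length === 0) turns.push([m]);
    else turns[turns.length - 1].push(m);
  }

  // Walk from the newest turn backward, keeping turns while they fit.
  const keptTurns: Msg[][] = [];
  let i = turns.length - 1;
  for (; i >= 0; i--) {
    const cost = turns[i].reduce((sum, m) => sum + estimateTokens(m), 0);
    if (used + cost > budget) break;
    used += cost;
    keptTurns.unshift(turns[i]);
  }

  // Everything from the first turn that did not fit back to the start is dropped.
  const dropped = turns.slice(0, i + 1).flat();
  return { kept: [system, ...keptTurns.flat()], dropped };
}
```

If `dropped` is non-empty, its summary can be inserted as an extra system-role message right after the fixed prompt, with its tokens counted against the same budget. If even the newest turn does not fit, `kept` holds only the system prompt. The caller should reject or truncate that input rather than send an empty context.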

Phase 5 of 7

Error Handling and Retries

AI providers fail more often than people expect: rate limits, timeouts, partial streams, content-filter rejections. I wrap calls in a typed retry helper with exponential backoff, and surface a clean error message to the user rather than a stack trace. A failed stream is retried from the last completed assistant message, never resumed mid-token.
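Here is a minimal typed retry helper with exponential backoff and "full jitter" (a random delay up to the exponential cap). The `ProviderError` class and its `kind` values are hypothetical, so map your SDK's actual errors onto them. Treating content-filter rejections as non-retryable is an assumption, since resending the same prompt usually gets the same refusal. `sleep` and `random` are injectable so the helper can be tested without real delays.

```typescript
type ProviderErrorKind = "rate_limit" | "timeout" | "server" | "content_filter" | "bad_request";

// Hypothetical normalized error; adapt from your SDK's error types.
class ProviderError extends Error {
  readonly kind: ProviderErrorKind;
  readonly retryAfterMs?: number;
  constructor(kind: ProviderErrorKind, message: string, retryAfterMs?: number) {
    super(message);
    this.name = "ProviderError";
    this.kind = kind;
    this.retryAfterMs = retryAfterMs;
  }
}

interface RetryOptions {
  maxAttempts: number; // total attempts, including the first
  baseDelayMs: number;
  maxDelayMs: number;
  sleep?: (ms: number) => Promise<void>;
  random?: () => number; // defaults to Math.random
}

function isRetryable(err: unknown): boolean {
  return err instanceof ProviderError && ["rate_limit", "timeout", "server"].includes(err.kind);
}

// Full jitter: a random delay in [0, min(maxDelayMs, baseDelayMs * 2^attempt)).
function backoffDelay(attempt: number, opts: RetryOptions): number {
  const cap = Math.min(opts.maxDelayMs, opts.baseDelayMs * 2 ** attempt);
  return Math.floor((opts.random ?? Math.random)() * cap);
}

async function withRetry<T>(fn: () => Promise<T>, opts: RetryOptions): Promise<T> {
  const sleep = opts.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (!isRetryable(err) || attempt + 1 >= opts.maxAttempts) throw err;
      // Prefer a provider-supplied retry hint (e.g. from a Retry-After header) when present.
      const hinted = err instanceof ProviderError ? err.retryAfterMs : undefined;
      await sleep(hinted ?? backoffDelay(attempt, opts));
    }
  }
}

// Copy that is safe to show users; the raw error should be logged server-side.
function toUserMessage(err: unknown): string {
  if (err instanceof ProviderError && err.kind === "rate_limit") {
    return "We're handling a lot of requests right now. Please try again in a moment.";
  }
  if (err instanceof ProviderError && err.kind === "content_filter") return "I can't help with that request.";
  return "Something went wrong. Please try again.";
}
```

Jitter spreads retries from many clients apart, so they do not all hit the provider again at the same instant after a shared rate-limit event.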

Phase 6 of 7

Rate Limiting and Abuse Controls

Every chatbot needs rate limits before it goes public. I add per-user and per-IP limits via Redis, plus a token-cost budget per day so a single user cannot drain the API key. The patterns come straight from the API security playbook. I also add input length caps and a hard refusal for obvious abuse patterns.
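Below is a sketch of those checks as fixed one-minute windows plus a per-user daily spend counter. The text uses Redis, and here a small `CounterStore` interface stands in for it. With Redis, `incrBy` would be an `INCRBY` plus an `EXPIRE` on first write, ideally inside `MULTI`/`EXEC` so the pair stays atomic. An in-memory implementation keeps the example self-contained. The limit values, key formats and the `checkRequest` / `recordSpend` helpers are hypothetical. A sliding window would smooth bursts at window boundaries at the cost of more bookkeeping.

```typescript
interface CounterStore {
  // Add `amount` to the counter at `key`, set its TTL on first write, return the new total.
  // Note: Redis INCRBY requires integers; use INCRBYFLOAT for fractional cents.
  incrBy(key: string, amount: number, ttlSeconds: number): Promise<number>;
}

// In-memory stand-in for Redis: single process only, for tests and local dev.
class MemoryCounterStore implements CounterStore {
  private readonly counters = new Map<string, { value: number; expiresAt: number }>();
  private readonly now: () => number;
  constructor(now: () => number = Date.now) {
    this.now = now;
  }
  async incrBy(key: string, amount: number, ttlSeconds: number): Promise<number> {
    const t = this.now();
    const entry = this.counters.get(key);
    if (!entry || entry.expiresAt <= t) {
      this.counters.set(key, { value: amount, expiresAt: t + ttlSeconds * 1000 });
      return amount;
    }
    entry.value += amount;
    return entry.value;
  }
}

// Hypothetical limits.
const LIMITS = { perUserPerMinute: 20, perIpPerMinute: 60, dailyCostCentsPerUser: 200, maxInputChars: 4_000 };

type Verdict =
  | { ok: true }
  | { ok: false; reason: "input_too_long" | "user_rate" | "ip_rate" | "daily_budget" };

async function checkRequest(store: CounterStore, userId: string, ip: string, input: string, nowMs: number): Promise<Verdict> {
  if (input.length > LIMITS.maxInputChars) return { ok: false, reason: "input_too_long" };
  const minute = Math.floor(nowMs / 60_000); // fixed one-minute window
  const userCount = await store.incrBy(`rl:user:${userId}:${minute}`, 1, 60);
  if (userCount > LIMITS.perUserPerMinute) return { ok: false, reason: "user_rate" };
  const ipCount = await store.incrBy(`rl:ip:${ip}:${minute}`, 1, 60);
  if (ipCount > LIMITS.perIpPerMinute) return { ok: false, reason: "ip_rate" };
  const day = new Date(nowMs).toISOString().slice(0, 10); // UTC date, e.g. "2024-01-01"
  const spent = await store.incrBy(`budget:${userId}:${day}`, 0, 86_400); // read without changing
  if (spent >= LIMITS.dailyCostCentsPerUser) return { ok: false, reason: "daily_budget" };
  return { ok: true };
}

// Called after a response completes, with the cost actually incurred.
async function recordSpend(store: CounterStore, userId: string, costCents: number, nowMs: number): Promise<void> {
  const day = new Date(nowMs).toISOString().slice(0, 10);
  await store.incrBy(`budget:${userId}:${day}`, costCents, 86_400);
}
```

Recording spend after the response, rather than estimating it up front, means a user can overshoot the budget by at most one request.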

Phase 7 of 7

Persistence and Conversation History

I store conversations in Postgres with a clean schema: user_id, conversation_id, role, content, created_at, model, tokens_in, tokens_out, cost_cents. The last four columns (model, tokens_in, tokens_out, cost_cents) are what let you actually understand unit economics later. The UI loads recent conversations into a sidebar, and search queries a Postgres full-text index.
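Here is a sketch of the row shape and parameterized queries for that schema, under several assumptions: a single `messages` table with exactly those columns, `created_at` defaulting to `now()`, `cost_cents` as a `numeric` column (a single turn often costs a fraction of a cent), and a GIN index on `to_tsvector('english', content)`. The table and helper names are hypothetical. The `{ text, values }` objects match the query-config shape node-postgres (`pg`) accepts. `plainto_tsquery` turns free-form user input into a safe full-text query, and the `to_tsvector(...)` expression must match the index expression for Postgres to use the index.

```typescript
type Role = "system" | "user" | "assistant";

// One row per message, mirroring the columns listed above (created_at is set by the database).
interface MessageRow {
  userId: string;
  conversationId: string;
  role: Role;
  content: string;
  model: string;
  tokensIn: number;
  tokensOut: number;
  costCents: number;
}

interface QueryConfig { text: string; values: unknown[]; }

// Prices are USD per 1M tokens (placeholders); result is in cents, possibly fractional.
function costCents(tokensIn: number, tokensOut: number, inUsdPerM: number, outUsdPerM: number): number {
  return ((tokensIn * inUsdPerM + tokensOut * outUsdPerM) / 1_000_000) * 100;
}

function insertMessage(row: MessageRow): QueryConfig {
  return {
    text: `INSERT INTO messages
             (user_id, conversation_id, role, content, model, tokens_in, tokens_out, cost_cents)
           VALUES ($1, $2, $3, $4, $5, $6, $7, $8)`,
    values: [row.userId, row.conversationId, row.role, row.content, row.model, row.tokensIn, row.tokensOut, row.costCents],
  };
}

// Sidebar: a user's conversations, most recently active first.
function recentConversations(userId: string, limit = 20): QueryConfig {
  return {
    text: `SELECT conversation_id, max(created_at) AS last_at
             FROM messages
            WHERE user_id = $1
            GROUP BY conversation_id
            ORDER BY last_at DESC
            LIMIT $2`,
    values: [userId, limit],
  };
}

// Full-text search scoped to one user; expression matches the assumed GIN index.
function searchMessages(userId: string, query: string): QueryConfig {
  return {
    text: `SELECT conversation_id, content, created_at
             FROM messages
            WHERE user_id = $1
              AND to_tsvector('english', content) @@ plainto_tsquery('english', $2)
            ORDER BY created_at DESC
            LIMIT 50`,
    values: [userId, query],
  };
}
```

With a node-postgres `Pool`, each helper's result can be passed straight to `pool.query(...)`. Summing `cost_cents` grouped by `user_id` or `model` then gives per-user and per-model unit economics.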

Results

What You'll Achieve

Expected outcomes from implementing this playbook

Streaming AI responses that feel native to the product

Conversation persistence with cost and token tracking

Context-aware interactions that scale beyond a single turn

Production error handling, retries, and abuse controls

See production examples in my portfolio or contact me.

Use this playbook

Want me to run this with you?

The playbook is the public version. The private version is me running it for your team against a real deadline. If you have a project on the line, that is usually the faster path.

