AI made me faster. The design system is what made it reliable

Format: Process case study · Subject: How I work with AI, across four projects · Tools: Claude Code, Playwright, axe-core, OpenAPI

Over the last year AI has become the way I work, not a tool I occasionally reach for. The projects below span the whole spectrum: Meiro's customer-data platform, shipped with a thirteen-person team; a 43-screen platform prototype built solo; a full-stack React product with a deployed database; and a dashboard that runs my house. Different stacks, different stakes – but one practice: write the rules down, make a machine enforce them, and never let an agent grade its own homework.

A contract, not a prompt

The usual way to work with a coding agent is conversational: ask, look at the result, correct, repeat. It doesn't scale past a handful of screens, because every correction lives and dies in one chat session. So instead of prompting taste, I write it down. The flagship project – a complete platform prototype with 43 screens across a public site, a business admin, and an operator back office, delivered by me alone in weeks – carries an AGENTS.md: a design contract every agent session loads before touching code, written as rules a machine can follow and a reviewer can check.

These aren't style suggestions. They're the same kind of decisions a design system encodes in tokens and components – spacing, typography, color, geometry – expressed as constraints on the code an agent is about to write.

AGENTS.md – excerpt, translated

Spacing sits on a 4px scale – {2, 4, 8, 12, 16, 20, 24, 28, 32, 40, 48…}. When tuning, round up. Hairline dividers stay 1px.
A number never separates from its unit or currency at a line break – non-breaking spaces before units and as thousands separators, white-space: nowrap on standalone figures as a safety net.
Headings get text-wrap: balance, body copy gets text-wrap: pretty.
A button whose only visible content is an icon is a perfect square. Never a rectangle.
Dynamic values – prices, dates, counters – render in tabular-nums. Palette work happens in OKLCH, not hex.
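
Two of the rules above are mechanical enough to sketch in code. What follows is a minimal Python illustration, not the project's actual tooling: `snap_up` and `format_amount` are hypothetical helper names, the scale holds only the values the excerpt lists (the source elides the rest), and the units in the usage note are made up.

```python
from bisect import bisect_left

# Only the steps the excerpt lists. The scale continues in the source ("…"),
# so anything above 48 is deliberately left unhandled here.
SPACING_SCALE = [2, 4, 8, 12, 16, 20, 24, 28, 32, 40, 48]

NBSP = "\u00a0"  # non-breaking space


def snap_up(px: float) -> int:
    """Round a spacing value up to the nearest step on the scale."""
    i = bisect_left(SPACING_SCALE, px)
    if i == len(SPACING_SCALE):
        raise ValueError(f"{px}px is beyond the listed part of the scale")
    return SPACING_SCALE[i]


def format_amount(value: int, unit: str) -> str:
    """Format an integer so it cannot break away from its unit.

    Non-breaking spaces serve both as thousands separators and as the
    gap between the number and the unit.
    """
    digits = f"{value:,}".replace(",", NBSP)
    return f"{digits}{NBSP}{unit}"
```

With these helpers, `snap_up(13)` yields 16 and `format_amount(12500, "km")` yields the string 12 500 km with both spaces non-breaking. Hairline dividers are exempt from the scale and would bypass `snap_up` entirely.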

"Pixel-perfect" is a claim you have to earn

A contract is worthless without enforcement, and an agent will happily tell you a change is visually perfect without having looked at it. So the rules are backed by hard gates that run in CI on every change: content-hash cache-busted builds; CSS, JS and HTML linting; a Playwright health suite that fails on any broken asset or JavaScript error; smoke tests of the key interactions; and axe accessibility scans of every page in all three apps.

The strictest gate is visual: full-page screenshot diffs across three viewports and device-pixel ratios – 62 committed baselines, regenerated inside a Linux container so fonts render identically to CI. There is even a dedicated test that hairline dividers survive fractional DPRs on a 4K display at 200% scaling, because that is exactly the kind of thing an agent breaks without noticing.

An agent may only claim a change is pixel-perfect if the screenshot diff actually ran. Quality is enforced by the pipeline, not by trust.
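
The rule can be read as a small predicate over the diff results. The sketch below is one possible shape, assuming a hypothetical `DiffResult` record per page, viewport and DPR. The key detail is that "never ran" is modeled separately from "ran and found nothing".

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class DiffResult:
    """Outcome of one screenshot comparison (hypothetical record shape)."""
    page: str
    viewport: str
    dpr: float
    # None means the comparison never ran, which is not the same as 0.
    mismatched_pixels: Optional[int]


def may_claim_pixel_perfect(results: Iterable[DiffResult]) -> bool:
    """Allow the claim only if every diff ran and found no mismatch."""
    results = list(results)
    if not results:
        return False  # nothing ran, so nothing was verified
    return all(r.mismatched_pixels == 0 for r in results)
```

An empty result set and a skipped comparison both return False, so silence never counts as a pass.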

The outcome of that discipline is measurable: zero serious or critical accessibility violations across all 43 pages, and a visual baseline that catches regressions no human reviewer would spot at this scale.

QA gates – every change, every time

Build – content-hash cache-busting, export/local anti-drift check
Lint – stylelint, ESLint, html-validate
Health – 0 broken assets, 0 JS errors, all pages
Smoke – key interactions across the flows
Visual regression – 62 baselines, 3 viewports × DPR, Linux-rendered for CI font parity
Accessibility – axe scans, 0 serious/critical across 43 pages
Hairlines – dividers survive fractional DPR on 4K @ 200%
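
The accessibility threshold in that list maps directly onto axe-core's output: each entry in a result's `violations` array carries an `impact` of minor, moderate, serious or critical. A minimal sketch of the gate follows, assuming the scan results for each page have already been parsed from JSON into dictionaries. The function names and the sample pages in the usage note are hypothetical.

```python
from typing import Any, Dict, List

BLOCKING_IMPACTS = {"serious", "critical"}


def blocking_violations(axe_results: Dict[str, Any]) -> List[str]:
    """Return the rule ids of violations whose impact fails the gate."""
    return [
        v["id"]
        for v in axe_results.get("violations", [])
        if v.get("impact") in BLOCKING_IMPACTS
    ]


def accessibility_gate(pages: Dict[str, Dict[str, Any]]) -> Dict[str, List[str]]:
    """Map each failing page to its blocking rule ids; empty means pass."""
    failures = {}
    for url, results in pages.items():
        blocking = blocking_violations(results)
        if blocking:
            failures[url] = blocking
    return failures
```

A page with one serious `color-contrast` violation and one moderate `region` violation fails on the first only. The gate passes when the returned dictionary is empty.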

At Meiro, the truth moved into code

AI engineering changed what designing at Meiro means for me in a very concrete way: the source of truth moved from Figma into the codebase, and my week now splits roughly 50/50 between the two. Instead of drawing screens and handing them over, I code the UI directly in the production repository of Meiro's customer-data platform – a product a thirteen-person team ships continuously. My commit history there isn't redlines and annotations; it's typography, spacing, navigation and error states, merged.

Motion went through the same shift. I used to prototype animation in After Effects and then describe easing curves to engineers, hoping the feel would survive translation. Now I animate the product itself – on a branch, against the real components, with the real data – and what gets merged is the animation. No video handoff, no approximation: the prototype and the shipped thing are the same artifact.

How does a designer get away with committing to production? Because the rules aren't tribal knowledge – they're checks that run on every change. Lint catches a hardcoded color or spacing before anyone sees it, features start as written specs that agents tick off box by box, and whatever a session learns gets written back into AGENTS.md, so the next one – mine or a colleague's – starts smarter. My favorite piece: an agent that clicks through the live app as a persona, complains in first person about what annoys it, and backs it up with screenshots. It finds the kind of problems code review never will.
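
To make the lint idea concrete, here is a rough line-based sketch of a check for hardcoded colors and spacing. It is an illustration under stated assumptions, not the rule set used at Meiro: it assumes tokens are CSS custom properties, treats any hex literal outside a token definition as a violation, and only inspects margin, padding and gap declarations. A real rule would parse the stylesheet rather than match lines, since a pattern like this can misfire on hex-looking id selectors.

```python
import re
from typing import List, Tuple

HEX_COLOR = re.compile(r"#[0-9a-fA-F]{3,8}\b")
# Spacing declarations; token references such as var(--space-4)
# contain no "<digits>px" and therefore pass.
SPACING_DECL = re.compile(r"^\s*[\w-]*(margin|padding|gap)[\w-]*\s*:\s*(.*)$")
RAW_PX = re.compile(r"\b\d+(?:\.\d+)?px\b")


def find_hardcoded_values(css: str) -> List[Tuple[int, str]]:
    """Return (line number, message) for each hardcoded color or spacing."""
    findings = []
    for lineno, line in enumerate(css.splitlines(), start=1):
        if line.lstrip().startswith("--"):
            continue  # token definitions are where raw values belong
        for match in HEX_COLOR.finditer(line):
            findings.append((lineno, f"hardcoded color {match.group()}"))
        decl = SPACING_DECL.match(line)
        if decl:
            for match in RAW_PX.finditer(decl.group(2)):
                findings.append((lineno, f"hardcoded spacing {match.group()}"))
    return findings
```

Run over a stylesheet, the function returns an empty list when every color and spacing value comes from a token, which is the condition a CI step would assert.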

And when things went wrong, the team retro didn't blame whoever wrote the prompt. The written conclusion was the opposite: shared responsibility for AI-generated code, and more deterministic guardrails – because refactoring old code without strengthening the rules "will only recreate the same mess."

Release retrospective – excerpts

Keep the team norm of shared responsibility for AI-generated code and production bugs, instead of blaming whoever prompted it.
The codebase contains too many patterns for the same UI problems – which AI then keeps copying.
Start adding deterministic guardrails where possible: lint rules that enforce the intended patterns instead of relying only on prompts.

The prototype is the contract with engineering, too

A prototype at this fidelity changes what a handoff means. Instead of pictures and a wish, engineering gets a working spec: with AI I authored the OpenAPI 3.1 contract, the target data model mapped from the legacy database, and a threat model covering sixteen concrete risks. A Prism mock server runs the API from the spec, so the prototype is exercised against realistic responses before a single line of backend exists – the prototype doubles as acceptance criteria.
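
What "acceptance criteria" means here can be shown with a toy contract check. The sketch below is an assumption-heavy illustration: `ORDER_SCHEMA` is an invented schema in the shape of an OpenAPI 3.1 component, not part of the real contract, and the checker covers only `required`, `type` and `enum`. A full JSON Schema validator would do this job in practice.

```python
from typing import Any, Dict, List

# Hypothetical schema, shaped like an OpenAPI 3.1 component.
ORDER_SCHEMA: Dict[str, Any] = {
    "type": "object",
    "required": ["id", "status", "total"],
    "properties": {
        "id": {"type": "string"},
        "status": {"type": "string", "enum": ["draft", "paid", "cancelled"]},
        "total": {"type": "number"},
    },
}

JSON_TYPES = {
    "string": str,
    "number": (int, float),
    "boolean": bool,
    "object": dict,
    "array": list,
}


def _matches_type(value: Any, type_name: str) -> bool:
    expected = JSON_TYPES.get(type_name)
    if expected is None:
        return True  # types this sketch does not model are not checked
    if isinstance(value, bool) and type_name != "boolean":
        return False  # bool is an int subclass in Python
    return isinstance(value, expected)


def contract_errors(payload: Dict[str, Any], schema: Dict[str, Any]) -> List[str]:
    """List the ways a response body deviates from the schema."""
    errors = [
        f"missing required field '{name}'"
        for name in schema.get("required", [])
        if name not in payload
    ]
    for name, rules in schema.get("properties", {}).items():
        if name not in payload:
            continue
        value = payload[name]
        if not _matches_type(value, rules.get("type", "")):
            errors.append(f"field '{name}' is not of type {rules['type']}")
        elif "enum" in rules and value not in rules["enum"]:
            errors.append(f"field '{name}' is not one of {rules['enum']}")
    return errors
```

The same function can be pointed at a mock response today and at the real backend's response later. An empty list from both is the acceptance criterion.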

The same thinking applies to continuity. A progress log and structured handoff protocols mean any collaborator – the backend developer, a future teammate, or the next AI session – resumes with full context instead of archaeology.

The workflow is a design system, too

Recurring decisions get packaged the way a design system packages components. When a convention proves itself – the square icon-button rule, OKLCH color math with contrast checking, a set of UI-polish heuristics for optical alignment – it becomes a versioned agent skill, vendored in the repository. Ten of them now ship with the flagship project, and every future session applies them automatically instead of relearning them.
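
As an example of what "OKLCH color math with contrast checking" can involve, the sketch below converts OKLCH to linear sRGB using the published OKLab matrices and computes a contrast ratio. Two assumptions are mine rather than the source's: that the contrast metric is the WCAG 2.x ratio, and that out-of-gamut colors are simply clamped. The skill itself is not shown in the source and may work differently.

```python
import math
from typing import Tuple

Color = Tuple[float, float, float]  # (L in 0..1, chroma, hue in degrees)


def oklch_to_linear_srgb(l: float, c: float, h_deg: float) -> Color:
    """Convert OKLCH to linear-light sRGB via OKLab."""
    h = math.radians(h_deg)
    a, b = c * math.cos(h), c * math.sin(h)

    # OKLab -> non-linear LMS, then cube to get linear LMS.
    l_ = (l + 0.3963377774 * a + 0.2158037573 * b) ** 3
    m_ = (l - 0.1055613458 * a - 0.0638541728 * b) ** 3
    s_ = (l - 0.0894841775 * a - 1.2914855480 * b) ** 3

    r = 4.0767416621 * l_ - 3.3077115913 * m_ + 0.2309699292 * s_
    g = -1.2684380046 * l_ + 2.6097574011 * m_ - 0.3413193965 * s_
    bl = -0.0041960863 * l_ - 0.7034186147 * m_ + 1.7076147010 * s_
    # Clamp out-of-gamut results; a production tool would gamut-map instead.
    return tuple(min(1.0, max(0.0, v)) for v in (r, g, bl))


def relative_luminance(rgb_linear: Color) -> float:
    """WCAG relative luminance from linear sRGB components."""
    r, g, b = rgb_linear
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg: Color, bg: Color) -> float:
    """WCAG 2.x contrast ratio between two OKLCH colors."""
    l1 = relative_luminance(oklch_to_linear_srgb(*fg))
    l2 = relative_luminance(oklch_to_linear_srgb(*bg))
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)
```

A palette check would compare each text and background pair against a threshold. WCAG 2.x level AA asks for at least 4.5:1 for normal-size text.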

To prove the loop isn't tied to static prototypes, I ran the full stack solo on a separate product: React with TypeScript in strict mode, Radix UI, Tailwind with an OKLCH token palette, CI running tests against Postgres 16, and a deployed backend. Its AGENTS.md carries my favorite rule of the whole practice – the smoke gate needs a running API, and if you didn't start it, "don't pretend the gate passed." Honesty, written into the contract.
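
One way to make that rule hard to violate is to give a gate three outcomes instead of two. The sketch below is a minimal illustration with hypothetical names. The project's real gate is not shown in the source.

```python
import socket
from enum import Enum
from typing import Callable


class GateStatus(Enum):
    PASSED = "passed"
    FAILED = "failed"
    NOT_RUN = "not run"


def api_is_reachable(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections at host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def run_smoke_gate(
    api_up: Callable[[], bool],
    smoke_tests: Callable[[], bool],
) -> GateStatus:
    """Run the smoke tests only against a live API; never assume a pass."""
    if not api_up():
        return GateStatus.NOT_RUN  # reported as-is, never upgraded to PASSED
    return GateStatus.PASSED if smoke_tests() else GateStatus.FAILED
```

A caller might pass `lambda: api_is_reachable("localhost", 8000)` as the probe, where the host and port are placeholders. Because NOT_RUN is its own value, a report cannot collapse "I didn't start the API" into a green check.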

The curiosity doesn't switch off after work, either. The dashboard on my wall at home is AI-built too – a small Python server over the Home Assistant API, controlling the heat pump, ventilation, photovoltaics and CO₂ sensors. Not portfolio material on its own, but that's the point: when agents are cheap to direct well, the cost of scratching an itch drops to almost nothing.

The through-line

Codified rules, tokens, and automated verification are exactly what turn agentic tooling from a demo into a practice. AI didn't replace the design system – it's the reason the design system had to get sharper. Everything a good system already does for a team of humans, it now does for a team of one human and many agents.

Most of these products are client and team work I can't show publicly, which is fitting in a way – this case study is about the part that transfers: the process. If you'd like the numbers behind it or a walkthrough of the setup, write to me.
