Building SuomiSanat With AI Agents

#ai #react #webdev #productivity

For a while, I had wanted to see how far AI coding agents could take a small but real product instead of a one-screen demo.

SuomiSanat was a good test case. It is a Finnish YKI level 3 vocabulary trainer built with React, TypeScript, Tailwind, a versioned JSON word dataset, offline-first progress tracking, optional Supabase sync, and Playwright coverage. The public repo only ships a small self-authored sample dataset, but the app structure is the same one I would use for a larger private list.

The source repo is public at github.com/supillai/suomisanat-app.

I used Codex as the main coding agent, but the lessons here are mostly about the workflow, not one specific tool.

On this branch, the project grew from the first commit on March 1, 2026 through a month of follow-up changes ending on April 6, 2026. The history is useful because it shows the actual build pattern. It was not one giant prompt. It was a sequence of small instructions, follow-up fixes, refactors, and verification passes.

That is the main reason I think this repo is worth writing about. It looks much closer to normal software work than to an AI demo.

Why This Project Was a Good Test

I did not want to test agents on something that only needed static markup or one happy-path API call. I wanted enough moving parts to expose where the approach breaks down.

SuomiSanat has a few characteristics that make it a better test than a toy project:

  • the app has real client-side state
  • the vocabulary data needs validation
  • the product has to work well on mobile screens
  • the app still needs to function offline
  • cloud sync is optional, but if enabled it must behave predictably

That combination puts pressure on architecture, state handling, testing, and release discipline.

How The Repo Actually Evolved

The repo history settled into a pattern that I would reuse:

  1. Build a narrow working slice.
  2. Add the next useful feature.
  3. Watch where the code starts to bend.
  4. Refactor before the shape gets too wrong.
  5. Convert fragile rules into code, tests, and scripts.

The dates make the progression pretty clear:

  • March 1, 2026: initial vocabulary trainer and early study flow refinements
  • March 3, 2026: Supabase auth and first cloud sync pass
  • March 7, 2026: merge behavior, debounced writes, and first conflict handling
  • March 14, 2026: major refactor out of a swollen App.tsx
  • March 15-17, 2026: data validation, offline/PWA work, and repeated mobile fixes
  • March 22-23, 2026: sync reliability fixes and RPC-based delta batching
  • March 24-27, 2026: dataset versioning from words.v2.json through words.v4.json
  • March 28-April 6, 2026: public-release hardening, privacy placeholder cleanup, and smaller test and docs fixes

That is what AI-assisted development looked like here. Fast progress, but not frictionless progress.

What Worked Well

1. Starting with a working vertical slice

The first commit shipped a working app, not an architecture diagram. That matters with agents.

Once the app existed, it became easy to ask for concrete changes:

  • improve the study flow
  • restructure the word list
  • add sync
  • fix mobile layouts
  • validate the dataset

Agents seem to do much better when the target is concrete and already visible. In practice, “make this screen do one more real thing” worked better than “design the right system up front.”

2. Refactoring when the pressure became obvious

src/App.tsx is probably the clearest signal in the repo.

It started at 407 lines in the first commit. Before the March 14, 2026 refactor, it had grown to 1,904 lines. The current file is 156 lines.

That arc taught me something simple. AI agents are very good at continuing in the direction the code already points. If a file is the easiest place to put the next feature, they will keep putting the next feature there until you stop them.

The fix was not a smarter prompt. The fix was a structural change: feature folders, hooks, smaller screen components, and utilities.

3. Turning rules into code

One of the better changes was the dataset workflow.

The dataset flow now includes:

  • a versioned asset in public/data/words.v4.json
  • a parser in src/data/word-data.ts
  • a guard test in src/data/word-data.test.ts
  • a merge script in scripts/build-word-dataset.mjs

That script preserves IDs from the base dataset, renumbers later additions sequentially, normalizes Finnish text for duplicate detection, and writes deterministic output. That is much better than keeping the rules in prompt text or in my head.
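As a rough sketch of what that kind of merge logic looks like (the real script lives in scripts/build-word-dataset.mjs; the types, field names, and normalization details here are illustrative, not copied from the repo):

```typescript
// Hypothetical sketch: preserve IDs from the base dataset, renumber
// additions sequentially, and normalize Finnish text so near-duplicate
// entries are detected before they reach the shipped JSON.

type Word = { id: number; fi: string; en: string };

// Normalize so "Talo " and "talo" count as the same entry.
const normalizeFi = (s: string): string =>
  s.trim().toLowerCase().normalize("NFC");

function mergeDatasets(base: Word[], additions: Word[]): Word[] {
  const seen = new Set(base.map((w) => normalizeFi(w.fi)));
  // Base IDs are preserved; new entries continue from the highest base ID.
  let nextId = Math.max(0, ...base.map((w) => w.id)) + 1;
  const merged = [...base];
  for (const w of additions) {
    const key = normalizeFi(w.fi);
    if (seen.has(key)) continue; // duplicate of an existing word: skip
    seen.add(key);
    merged.push({ ...w, id: nextId++ }); // renumber additions sequentially
  }
  return merged;
}
```

Because the output order and ID assignment depend only on the inputs, the script stays deterministic, which keeps dataset diffs reviewable.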

This is one of the main lessons I would generalize: if a rule matters, make it executable.

4. Keeping cloud sync optional

Cloud sync landed early, but the app still works in local-only mode when Supabase is not configured.

AI-generated code tends to take the shortest route to a satisfying demo, and that often means turning optional infrastructure into a hard dependency. For a real product, I think the better approach is to preserve a strong local-only mode and let the integration layer stay progressive.

In this repo, that kept the core product useful even while sync details were still moving.
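The shape of that pattern is simple enough to sketch. This is an illustrative version, not the repo's actual code: the factory returns null when configuration is absent, and every call site guards on that instead of assuming sync exists.

```typescript
// Illustrative pattern: treat the sync backend as progressive enhancement.
// With no configuration, the app keeps working in local-only mode instead
// of failing at startup.

type SyncClient = { push: (state: unknown) => Promise<void> };

function createSyncClient(url?: string, anonKey?: string): SyncClient | null {
  // No config -> no client. Callers must handle null and skip cloud writes.
  if (!url || !anonKey) return null;
  // The real app would call createClient(url, anonKey) from
  // @supabase/supabase-js here; stubbed to keep the sketch self-contained.
  return { push: async () => {} };
}

// Call sites then guard instead of assuming sync exists:
//   const sync = createSyncClient(env.SUPABASE_URL, env.SUPABASE_ANON_KEY);
//   if (sync) await sync.push(progress);
```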

5. Building a verification loop

The app eventually gained:

  • linting
  • typechecking
  • unit tests
  • Playwright coverage
  • an npm run verify gate

That sounds ordinary, but it is the part that makes agent speed safe enough to use. If you remove that loop, you can move faster for a while, but you mostly accumulate bugs faster too.
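A typical shape for such a gate is a package.json script that chains the individual checks, so one command answers "is this change safe to ship." The exact commands in the repo may differ; this is a representative sketch:

```json
{
  "scripts": {
    "lint": "eslint .",
    "typecheck": "tsc --noEmit",
    "test": "vitest run",
    "e2e": "playwright test",
    "verify": "npm run lint && npm run typecheck && npm run test && npm run e2e"
  }
}
```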

What Took More Effort Than Expected

1. Sync was not a one-and-done feature

The current branch has 19 commit messages that mention sync. That alone tells the story.

After the first sync implementation landed on March 3, 2026, there were still follow-up changes for:

  • local and cloud merge behavior
  • debounced writes
  • page-hide flushes
  • conflict semantics
  • stale conflict cleanup
  • dropped write prevention
  • keepalive fallback
  • hydration timing
  • session refresh edge cases
  • RPC-based delta batches

This is not really an AI-specific problem. Distributed state is just hard. Agents can help build it, but they do not remove race conditions or make conflict handling obvious.
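Two of the patterns above, debounced writes and a page-hide flush, fit together naturally: the debounce keeps chatty UI state from hammering the backend, and the flush makes sure the last pending write survives a tab close. A minimal sketch, with names and wiring that are illustrative rather than taken from the repo:

```typescript
// Hedged sketch: debounce cloud writes so only the latest state is sent,
// but expose a flush() so pending state can be pushed immediately on
// page hide instead of being lost with the timer.

function makeDebouncedSaver(save: (state: string) => void, delayMs = 2000) {
  let pending: string | null = null;
  let timer: ReturnType<typeof setTimeout> | null = null;

  const flush = () => {
    if (timer) { clearTimeout(timer); timer = null; }
    if (pending !== null) { save(pending); pending = null; }
  };

  const schedule = (state: string) => {
    pending = state; // later schedules overwrite earlier ones
    if (timer) clearTimeout(timer);
    timer = setTimeout(flush, delayMs);
  };

  return { schedule, flush };
}

// In the app, flush would be wired to visibility changes, roughly:
//   document.addEventListener("visibilitychange", () => {
//     if (document.visibilityState === "hidden") saver.flush();
//   });
```

Even this small sketch hints at why sync took 19 commits: the hard parts (what "save" does on conflict, what happens if the flush request is dropped) live outside the debounce logic.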

2. Mobile needed repeated passes

The branch also has 15 commit messages that mention mobile or iPhone.

Again, that is a useful reality check. A layout that looks fine on desktop can still fail on short viewports, mobile browsers, or touch-heavy interactions. In this project I ended up revisiting panel density, touch targets, scroll behavior, card interaction, bottom-nav visibility, and short-screen layout more than once.

If mobile matters, it needs to be part of the loop early. It is not a finishing pass.

3. Working code can still hide a design hotspot

The current useCloudSync hook is well over 500 lines long.

It works, but it is also where a lot of complexity pooled. That is another pattern I now watch more closely in AI-assisted work. A feature can be functionally correct and still be asking for another refactor.

The lesson is not “large files are bad.” The lesson is that once complexity starts collecting in one place, the next few features usually become more expensive.

Practices I Would Reuse

If I were starting another app with AI agents, I would keep the operating model simple:

  • start with a working slice, not a speculative architecture
  • ask for one meaningful change at a time with a clear success condition
  • treat repeated fixes as design feedback, not bad luck
  • move important rules into scripts, validators, tests, and assertions
  • keep external services optional where the product can survive without them
  • refactor when the shape gets wrong, not only when it becomes painful
  • add repo-level documentation so future agent sessions inherit context instead of guessing

One small but useful addition in this repo was writing down the workflow in Agents.md and docs/CODEBASE_NOTES.md. That reduced the amount of repeated explaining needed in later sessions.

Final Thoughts

My main takeaway is not that AI agents can build an app on their own. It is that they reduce the cost of iteration.

That is a real advantage, but it only pays off if someone still owns the engineering decisions:

  • architecture
  • invariants
  • edge cases
  • mobile constraints
  • release quality

SuomiSanat became much better once the important rules stopped living in prompts and started living in the codebase itself. That is the part I trust now.

AI agents are useful accelerators. They still need direction, constraints, and verification.