Engineering | Amin's Blog

How synthetic data let us skip the dataset entirely, and still ship 93% real-world accuracy on-device.

The problem nobody warns you about

Haven is a flickering light therapy headset. For the therapy to mean anything, the device has to sit correctly on the patient’s face: level, centered, and over the eyes. So before a session starts, the app looks through the phone’s camera and answers one deceptively simple question: is the headset being worn correctly, and if not, what’s wrong? Tilted left? Slipped down the nose? Too high? Off to one side? Not on at all?

This is a textbook keypoint-detection problem. The textbook solution is where it gets expensive.
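To make the keypoint framing concrete, here is a minimal sketch of how a fit verdict could be derived once keypoints are available. It is not Haven's actual pipeline: the function `classify_fit`, the `Fit` labels, the choice of four keypoints (two eyes, two headset corners), and the thresholds are all assumptions made for illustration, and the sketch collapses "tilted left" and "tilted right" into one label.

```python
import math
from enum import Enum
from typing import Optional, Tuple

# (x, y) in image pixels; y grows downward, as in most image coordinate systems.
Point = Tuple[float, float]


class Fit(Enum):
    """Hypothetical fit labels, loosely mirroring the questions the app asks."""
    OK = "ok"
    TILTED = "tilted"
    TOO_LOW = "too_low"
    TOO_HIGH = "too_high"
    OFF_CENTER = "off_center"
    NOT_WORN = "not_worn"


def classify_fit(
    left_eye: Point,
    right_eye: Point,
    device_left: Optional[Point],
    device_right: Optional[Point],
    max_tilt_deg: float = 8.0,   # assumed tolerance, not a measured value
    max_offset: float = 0.25,    # assumed tolerance, in inter-eye distances
) -> Fit:
    """Turn four keypoints into a fit label using simple geometry."""
    # No headset keypoints detected: treat the device as not worn.
    if device_left is None or device_right is None:
        return Fit.NOT_WORN

    eye_dx = right_eye[0] - left_eye[0]
    eye_dy = right_eye[1] - left_eye[1]
    eye_dist = math.hypot(eye_dx, eye_dy)
    if eye_dist == 0:
        raise ValueError("eye keypoints coincide")

    # Tilt: angle of the headset's edge relative to the line between the eyes.
    eye_angle = math.atan2(eye_dy, eye_dx)
    dev_angle = math.atan2(device_right[1] - device_left[1],
                           device_right[0] - device_left[0])
    if abs(math.degrees(dev_angle - eye_angle)) > max_tilt_deg:
        return Fit.TILTED

    # Offset of the headset midpoint from the eye midpoint,
    # normalized by inter-eye distance so it is scale-independent.
    eye_mid = ((left_eye[0] + right_eye[0]) / 2, (left_eye[1] + right_eye[1]) / 2)
    dev_mid = ((device_left[0] + device_right[0]) / 2,
               (device_left[1] + device_right[1]) / 2)
    dx = (dev_mid[0] - eye_mid[0]) / eye_dist
    dy = (dev_mid[1] - eye_mid[1]) / eye_dist

    if dy > max_offset:
        return Fit.TOO_LOW       # headset sits below the eyes in the image
    if dy < -max_offset:
        return Fit.TOO_HIGH
    if abs(dx) > max_offset:
        return Fit.OFF_CENTER
    return Fit.OK
```

The geometry is the easy part. The hard part, and the subject of the rest of this post, is getting a model that produces those keypoints reliably in the first place.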

Two expensive doors

To train a model to recognize good vs. bad fit, you normally need examples — lots of them. That meant one of two things:

Door A: collect a real dataset. Recruit a diverse set of people. Get them to wear the device in different rooms, under different lighting, at every wrong angle we care about. Photograph each one. Then label thousands of frames by hand. For a medical device, add consent, privacy handling, and the sheer calendar time of coordinating humans. That is weeks of work and real money before a single model trains.

Door B: ask an LLM at runtime. Skip training entirely: send each camera frame to a vision-capable model and let it describe the fit. Tempting, until you count the costs: a recurring per-image bill that never stops, a hard dependency on network connectivity (our app is offline-first), and latency far too high for a live camera preview. Then comes the dealbreaker: streaming a patient’s face to a third-party server. For a HIPAA-bound medical product, that last one ends the conversation.

Neither door was good. So we built a third.

I trained a headset fit-detector on zero real photos … and it works on real photos.
