{"id":598,"date":"2026-07-01T11:52:46","date_gmt":"2026-07-01T11:52:46","guid":{"rendered":"https:\/\/blog.aminalam.info\/?p=598"},"modified":"2026-07-01T14:29:45","modified_gmt":"2026-07-01T14:29:45","slug":"i-trained-a-headset-fit-detector-on-zero-real-photos-and-it-works","status":"publish","type":"post","link":"https:\/\/blog.aminalam.info\/?p=598","title":{"rendered":"I trained a headset fit-detector on zero real photos &#8230; and it works on real photos."},"content":{"rendered":"\n<p class=\"wp-block-paragraph\"><em>How synthetic data let us skip the dataset entirely, and still ship 93% real-world accuracy on-device.<\/em><\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img data-recalc-dims=\"1\" fetchpriority=\"high\" decoding=\"async\" width=\"474\" height=\"379\" src=\"https:\/\/i0.wp.com\/blog.aminalam.info\/wp-content\/uploads\/2026\/07\/qa_test.png?resize=474%2C379&#038;ssl=1\" alt=\"\" class=\"wp-image-605\" srcset=\"https:\/\/i0.wp.com\/blog.aminalam.info\/wp-content\/uploads\/2026\/07\/qa_test.png?w=1280&amp;ssl=1 1280w, https:\/\/i0.wp.com\/blog.aminalam.info\/wp-content\/uploads\/2026\/07\/qa_test.png?resize=300%2C240&amp;ssl=1 300w, https:\/\/i0.wp.com\/blog.aminalam.info\/wp-content\/uploads\/2026\/07\/qa_test.png?resize=768%2C614&amp;ssl=1 768w, https:\/\/i0.wp.com\/blog.aminalam.info\/wp-content\/uploads\/2026\/07\/qa_test.png?w=948&amp;ssl=1 948w\" sizes=\"(max-width: 474px) 100vw, 474px\" \/><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\"><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">The problem nobody warns you about<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Haven is a flickering light therapy headset. For the therapy to mean anything, the device has to sit correctly on the patient&#8217;s face \u2014 level, centered, over the eyes. 
So before a session starts, the app looks through the phone&#8217;s camera and answers one deceptively simple question: <em>is this being worn correctly, and if not, what&#8217;s wrong?<\/em> Tilted left? Slipped down the nose? Too high? Off to one side? Not on at all?<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">This is a textbook keypoint-detection problem. The textbook solution is where it gets expensive.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Two expensive doors<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">To train a model to recognize good vs. bad fit, you normally need examples \u2014 lots of them. That meant one of two things:<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Door A \u2014 collect a real dataset.<\/strong> Recruit a diverse set of people. Get them to wear the device, in different rooms, under different lighting, at every wrong angle we care about. Photograph it. Then label thousands of frames by hand. For a medical device, add consent, privacy handling, and the sheer calendar time of coordinating humans. Weeks of work and real money before a single model trains.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Door B \u2014 ask an LLM at runtime.<\/strong> Skip training entirely: send each camera frame to a vision model and let it describe the fit. Tempting, until you count the costs \u2014 a recurring per-image bill that never stops, a hard dependency on network connectivity (our app is offline-first), latency far too high for a live camera preview, and \u2014 the dealbreaker \u2014 streaming a patient&#8217;s face to a third-party server. For a HIPAA-bound medical product, that last one ends the conversation.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Neither door was good. So we built a third.<\/p>\n\n\n\n<!--more-->\n\n\n\n<h2 class=\"wp-block-heading\">Door C \u2014 render the world instead of photographing it<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">I generated the <strong>entire<\/strong> training set synthetically. 
I already had the headset&#8217;s exact 3D geometry (it&#8217;s the thing we manufacture), so we did the one thing you can&#8217;t do with real photos: we placed the goggles on virtual faces <em>ourselves<\/em>, at precisely the poses we wanted to detect \u2014 and got perfect labels for free, because we defined them.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The pipeline, end to end:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>~200 head meshes<\/strong> from MakeHuman, each with dozens of randomized facial morphs, skin tones, hairstyles, eyebrows, and eye colors \u2192 <strong>640 unique faces<\/strong> across the training split.<\/li>\n\n\n\n<li>The <strong>HAVEN headset STL<\/strong>, placed on each face at the fit variants we care about: <code>correct<\/code>, <code>tilted left\/right<\/code>, <code>slipped down<\/code>, <code>too high<\/code>, <code>off-center left\/right<\/code>, plus negatives (<code>not worn<\/code>, <code>no goggles<\/code>).<\/li>\n\n\n\n<li><strong>46 real HDRI lighting environments<\/strong> \u2014 biased toward the places therapy actually happens (hospital rooms, living rooms, bedrooms) but salted with streets, gardens, and studios so the model never leans on the background.<\/li>\n\n\n\n<li>Rendered in <strong>Blender Cycles<\/strong> at selfie-style focal lengths and camera distances, with lighting brightness jitter and slight head pose variation.<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">The result is ~2,000 photorealistic-enough training images, every one labeled to the pixel, produced for the cost of GPU time.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">The model<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">The detector is deliberately tiny: a <strong>MobileNetV3-Small backbone with a 6-keypoint heatmap head \u2014 about 2.1M parameters.<\/strong> It predicts six points on the front face of the headset frame. 
Those keypoints feed a small, <strong>purely geometric fit-checker<\/strong> \u2014 no second neural net, just deterministic rules on angles and positions (roll \u2264 5\u00b0, eye centered in the frame opening, left\/right symmetry within tolerance). That means the &#8220;why&#8221; behind every verdict is inspectable and tunable, which matters a lot in a medical context.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">On the synthetic validation set the keypoints land within ~2 pixels (PCK@0.05 \u2248 1.0). The model exports to an <strong>8.6 MB TFLite file that runs at ~10 FPS on-device<\/strong> \u2014 no server, fully offline, nothing about the patient ever leaves the phone.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img data-recalc-dims=\"1\" decoding=\"async\" width=\"474\" height=\"264\" src=\"https:\/\/i0.wp.com\/blog.aminalam.info\/wp-content\/uploads\/2026\/07\/Screencast-from-2026-07-01-13-52-49-2.gif?resize=474%2C264&#038;ssl=1\" alt=\"\" class=\"wp-image-608\"\/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">The part that was supposed to fail<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Everyone who&#8217;s done this will tell you the same thing: models trained on synthetic data fall apart on real images. The &#8220;domain gap&#8221; is real. My defense was aggressive <strong>domain randomization<\/strong> \u2014 never let the model rely on any single cue. Vary the faces, the skin, the hair, the light, the room, the camera, the angle, until &#8220;synthetic&#8221; stops being a feature the model can learn.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Then I tested it on the only thing that counts: <strong>real photos of real people wearing the real device.<\/strong> It agreed with human fit judgment <strong>93% of the time<\/strong>, and it&#8217;s now running in production.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">It isn&#8217;t magic. 
Harsh backlighting and a few unusual hairstyles still trip it, and closing that last gap is ongoing work. But 93% real-world accuracy from a model that never saw a real photo \u2014 for a fraction of the cost and zero privacy exposure \u2014 is a trade I&#8217;ll make every time.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">What I&#8217;d tell my past self<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>You often don&#8217;t need real data \u2014 you need real <em>variation<\/em>.<\/strong> Synthetic data gives you total control over that variation, and perfect labels as a side effect.<\/li>\n\n\n\n<li><strong>The cheapest path and the most private path can be the same path.<\/strong> No collection campaign, no per-frame API bill, no patient images on someone else&#8217;s server.<\/li>\n\n\n\n<li><strong>Keep the learned part small and the decision part legible.<\/strong> A tiny keypoint model plus transparent geometric rules beats a black box you can&#8217;t explain \u2014 especially in medical software.<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\"><em>Built with Blender, MakeHuman, PyTorch Lightning, and TFLite. The headset is Haven, by Syntropic Medical.<\/em><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><\/p>\n","protected":false},"excerpt":{"rendered":"<p>How synthetic data let us skip the dataset entirely, and still ship 93% real-world accuracy on-device. The problem nobody warns you about Haven is a flickering light therapy headset. For the therapy to mean anything, the device has to sit correctly on the patient&#8217;s face \u2014 level, centered, over the eyes. 
So before a session &hellip; <a href=\"https:\/\/blog.aminalam.info\/?p=598\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">I trained a headset fit-detector on zero real photos &#8230; and it works on real photos.<\/span> <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"advanced_seo_description":"","jetpack_seo_html_title":"","jetpack_seo_noindex":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[12,14,10],"tags":[],"class_list":["post-598","post","type-post","status-publish","format-standard","hentry","category-blog","category-engineering","category-science"],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/blog.aminalam.info\/index.php?rest_route=\/wp\/v2\/posts\/598","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.aminalam.info\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.aminalam.info\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.aminalam.info\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.aminalam.info\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=598"}],"version-history":[{"count":8,"href":"https:\/\/blog.aminalam.info\/index.php?rest_route=\/wp\/v2\/posts\/598\/revisions"}],"predecessor-version":[{"id":613,"href":"https:\/\/blog.aminalam.info\/index.php?rest_route=\/wp\/v2\/posts\/598\/revisions\/613"}],"wp:attachment":[{"href":"https:\/\/blog.aminalam.info\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=598"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.aminalam.info\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=598"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.aminalam.info\/inde
x.php?rest_route=%2Fwp%2Fv2%2Ftags&post=598"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}