Designing the voice and intelligence of Flo's AI health guide

Context

Flo Health is the #1 OB-GYN-recommended app for period and cycle tracking, supporting 75 million monthly active users. The goal was to build a conversational AI platform that could deliver hyper-personalized health insights, helping people who menstruate understand their bodies using their own logged data.

As Senior Content Designer on the AI Foundations team, I worked at the intersection of content, medical safety, and machine learning to shape how the model thinks, responds, and sounds.

Challenge

Our LLM had been built with safety as the primary focus. That was essential, but it had produced a model that was often more cautious than helpful. Specifically, rejection rates ranged from 2% to 30% depending on topic; 43% of responses contained prominent medical disclaimers, sometimes without useful accompanying information; and 11% of responses offered no actionable next steps. Users were left without guidance on questions about their own health.

The central tension: how do we make the model genuinely useful without compromising the 90.95% medical accuracy we'd already achieved?

Case study 1: Building a content usefulness framework

Recognizing that we had solid metrics for safety but no way to measure usefulness, I collaborated with an ML engineer to design a comprehensive evaluation framework — the Content Usefulness Metric.

While the ML engineer conceived the core metric categories, my key contribution was articulating each criterion with enough specificity and distinctness to prevent overlap and enable precise automated judging. I also created the "good" and "bad" examples needed to calibrate the future AI judges.

I defined five metrics: Rejection, Misfire, Completeness, Relevance, and Delivery. Each has a precise definition and criteria that could be applied consistently across 200 manually labeled test cases and, eventually, automated.
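As a rough illustration of how labels on the five metrics could be aggregated across a set of manually labeled test cases, here is a minimal sketch. The `LabeledCase` structure, the pass/fail labeling scheme, and the `pass_rates` helper are all hypothetical; the actual metric definitions and scoring scale are not described here, so none are assumed.

```python
from dataclasses import dataclass

# The five metric names come from the case study; everything else is illustrative.
METRICS = ("rejection", "misfire", "completeness", "relevance", "delivery")


@dataclass
class LabeledCase:
    """One manually labeled test case (hypothetical structure)."""
    case_id: str
    labels: dict[str, bool]  # metric name -> True if the response passes


def pass_rates(cases: list[LabeledCase]) -> dict[str, float]:
    """Return the share of cases that pass each metric."""
    if not cases:
        raise ValueError("at least one labeled case is required")
    rates = {}
    for metric in METRICS:
        passed = sum(1 for case in cases if case.labels[metric])
        rates[metric] = passed / len(cases)
    return rates


# Two made-up cases: the second one fails only on delivery.
all_pass = {metric: True for metric in METRICS}
cases = [
    LabeledCase("case-001", dict(all_pass)),
    LabeledCase("case-002", {**all_pass, "delivery": False}),
]
rates = pass_rates(cases)
```

Keeping each metric as a separate label, rather than one combined score, mirrors the goal of criteria that are distinct enough not to overlap.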

Before building the automated system, I also drove the content improvements themselves — fine-tuning examples, rewriting medical scenarios in a more useful tone, and working closely with the Senior Medical Advisor to ensure every change remained medically and legally compliant.

Impact

The usefulness metric became the foundation for Flo's company-wide AI Judges Platform, a single source of truth for all AI evaluation across legal, medical, and tone of voice scoring. Work that started as a prompt-level experiment became infrastructure that scales across the entire organization.

Case study 2: Defining the "doctor friend" tone of voice

Flo's brand had an established tone of voice (TOV), "doctor friend", but that concept was too vague to translate directly into model instructions. The ML engineer's approach was to define the extremes: not too friendly, not too clinical. I wanted something more concrete and scalable.

My approach: identify the gap between the tone we had and the tone we wanted, then design toward it systematically.

I analyzed existing model responses to map specific problems: responses that prioritized disclaimers over the user's actual concern, that lacked empathy, or that ended conversations rather than continuing them. Then I used AI (Gemini 2.0 Flash) to generate responses in the voices of celebrity pillars that represented Flo's TOV attributes (Tuned-In, Spirited, Expert, Candid) and evaluated the outputs against our desired tone.

The voice that best captured our ideal was Dr. Jen Gunter, an OB-GYN and author who embodied the Expert pillar: candid, knowledgeable, and genuinely engaged with the user's question.

I used this voice as the reference for rewriting the product TOV rules, focusing on two dimensions: general tone (how the model sounds) and conversational flow (how it engages). The rewritten rules were then injected into the system prompt and examples and tested on 50 side-by-side comparisons.
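For illustration, a side-by-side comparison like this can be summarized by tallying which response a reviewer preferred in each pair. The sketch below is hypothetical: the `"baseline"`, `"candidate"`, and `"tie"` labels and the `summarize_preferences` helper are assumptions, not the process or tooling actually used, and the sample judgments are made up.

```python
from collections import Counter

VALID_LABELS = {"baseline", "candidate", "tie"}


def summarize_preferences(judgments: list[str]) -> dict:
    """Tally side-by-side judgments and compute the candidate's win rate.

    The win rate ignores ties: candidate wins / (candidate wins + baseline wins).
    """
    counts = Counter(judgments)
    unknown = set(counts) - VALID_LABELS
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    decided = counts["baseline"] + counts["candidate"]
    win_rate = counts["candidate"] / decided if decided else None
    return {"counts": dict(counts), "candidate_win_rate": win_rate}


# Made-up judgments for four comparison pairs.
judgments = ["candidate", "candidate", "baseline", "tie"]
summary = summarize_preferences(judgments)
```

Reporting ties separately from the win rate keeps an inconclusive pair from being counted as a loss for either prompt version.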

Impact

The result was a concrete, testable, scalable tone definition that can be integrated into system prompts across other Flo AI teams, preventing each team from reinventing the brand voice and ensuring consistency at scale.

Role

Senior Content Designer | Generative AI | Flo Health

Collaborators

ML Engineer, Senior Medical Advisor, Senior Product Designer
