Synthetic Patient Data Beyond Synthea
Synthea generates population-level data. Healthcare AI agents need scenario-specific patients with clinical coherence. Here's what's missing and how to fill the gap.
TL;DR
- Synthea generates population-level synthetic data well. Healthcare AI agents need scenario-specific patients with clinical coherence, vendor-shaped data, and full resource graphs.
- The Synthea paper (Walonoski et al., JAMIA 2018) remains the most cited synthetic health data reference, and ONC adopted Synthea for patient-centered outcomes research (ONC technical guide).
- Over 90% of US hospitals run FHIR-capable EHRs (ONC), each with vendor-specific data shapes Synthea does not reproduce.
- 80% of healthcare AI projects never scale past the pilot phase (Health Tech Digital, 2024). Unrealistic test data is a recurring root cause.
"We have to get this right. We have to solve digital health."
Grahame Grieve, FHIR Product Director, HL7
What Synthea does well
Synthea is the standard tool for generating synthetic patient data in healthcare. It simulates patient life histories from birth to death, producing FHIR R4 bundles with encounters, conditions, medications, observations, immunizations, and procedures that follow realistic clinical pathways.
For population health analytics and basic integration testing, Synthea is excellent:
- Statistically realistic demographics. Synthea uses census data to generate patients with realistic age, gender, race, and geographic distributions. If you need 10,000 patients that look like the US population, Synthea delivers.
- Longitudinal clinical histories. Each patient has a timeline of events: childhood immunizations, a diabetes diagnosis at 45, metformin started at 46, an A1c observation every 6 months. The events follow clinical modules that model disease progression.
- Valid FHIR bundles. The output conforms to FHIR R4 and US Core profiles. Resources have correct references, coded values use standard terminologies (SNOMED CT, LOINC, RxNorm), and the bundles can be loaded into a standard FHIR R4 server.
- Configurable population parameters. You can generate patients for a specific state, age range, or condition prevalence. You can seed the random generator for reproducibility.
If you are testing a FHIR integration that reads patient data, Synthea is a reasonable starting point. But if you are testing an AI agent that needs to act on patient data, as outlined in our healthcare AI testing framework, Synthea's limitations become apparent quickly.
Where Synthea falls short for AI agents
Scenario specificity
AI agents do not process random patients. They handle specific scenarios: "patient needs prior auth for an MRI due to a herniated disc," or "patient is being discharged and needs medication reconciliation," or "patient has a denied claim that needs to be appealed."
Synthea generates patients probabilistically. You might generate 1,000 patients and find 3 that happen to have a herniated disc. But those 3 patients will not have the specific combination of conditions, imaging history, failed conservative treatment, and insurance configuration that makes a prior auth scenario realistic.
You cannot tell Synthea: "Generate a patient who had back pain for 6 months, tried physical therapy, got an MRI that showed a herniated disc at L4-L5, was referred to a neurosurgeon, and needs prior auth for a lumbar fusion under a UnitedHealthcare PPO plan." Synthea does not work that way.
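The closest Synthea gets is generate-then-filter. Here is a minimal Python sketch. It assumes Synthea's default FHIR R4 output (one JSON bundle per patient in output/fhir/), and the function names are hypothetical. The filter can only surface combinations the simulator happened to produce. It cannot add the failed PT course or the UnitedHealthcare PPO plan.

```python
import json
from pathlib import Path
from typing import Iterable


def condition_codes(bundle: dict) -> set[str]:
    """Collect every Condition code that appears in a FHIR R4 Bundle."""
    codes = set()
    for entry in bundle.get("entry", []):
        resource = entry.get("resource", {})
        if resource.get("resourceType") != "Condition":
            continue
        for coding in resource.get("code", {}).get("coding", []):
            if "code" in coding:
                codes.add(coding["code"])
    return codes


def matching_patients(bundles: Iterable[dict], required: set[str]) -> list[dict]:
    """Keep only bundles whose patient has *all* of the required condition codes."""
    return [b for b in bundles if required <= condition_codes(b)]


def load_bundles(output_dir: str) -> list[dict]:
    """Load every JSON bundle in a directory (assumes Synthea's one-file-per-patient layout)."""
    return [json.loads(p.read_text()) for p in Path(output_dir).glob("*.json")]
```

In practice the required set is usually tiny. Each extra code you demand shrinks the match rate multiplicatively, which is why the surviving handful rarely fits a real workflow.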
Clinical coherence for edge cases
Synthea's clinical modules model common disease pathways well. But the cases that break AI agents are the edge cases: the patient with three comorbidities that create a drug interaction, the patient whose insurance changed mid-treatment, the patient with a rare condition that does not fit standard clinical pathways.
These edge cases require clinical coherence. The conditions, medications, labs, and encounters need to tell a story that makes medical sense. Randomly combining conditions and medications produces data that is syntactically valid but clinically nonsensical. An agent trained on nonsensical data learns nonsensical behaviors, and an agent tested on it passes checks that say little about production.
Vendor-specific data shapes
Epic, Cerner (Oracle Health), Athena, and other EHR vendors return FHIR data differently. Epic wraps practitioner references in contained resources. Cerner uses extensions for clinical status. Athena represents medications differently than Epic.
Synthea generates generic FHIR data. It does not produce the vendor-specific quirks that cause agents to fail in production. Your agent might handle Synthea data perfectly and choke on the first Epic bundle it encounters because Epic's Condition resources include extensions your parser has never seen.
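On the consuming side, a parser that survives both shapes has to resolve two kinds of reference: local references into contained resources, and plain relative references. The minimal sketch below covers only practitioner names. Both input shapes are generic FHIR patterns and are used here as assumptions, not as captured vendor payloads. One is a contained Practitioner referenced as "#id". The other is a "Practitioner/<id>" reference that carries a display string.

```python
from typing import Optional


def _name_text(practitioner: dict) -> Optional[str]:
    """Prefer HumanName.text; otherwise join given and family names."""
    for name in practitioner.get("name", []):
        if name.get("text"):
            return name["text"]
        parts = name.get("given", []) + [name.get("family", "")]
        joined = " ".join(p for p in parts if p)
        if joined:
            return joined
    return None


def practitioner_name(resource: dict, ref: dict) -> Optional[str]:
    """Resolve a display name for a practitioner Reference on `resource`."""
    target = ref.get("reference", "")
    if target.startswith("#"):
        # Local reference: look it up among the resource's contained resources.
        for contained in resource.get("contained", []):
            if contained.get("resourceType") == "Practitioner" and contained.get("id") == target[1:]:
                return _name_text(contained) or ref.get("display")
    # Relative reference (or unresolved local one): fall back to the display string.
    return ref.get("display")
```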
Insurance and coverage data
Synthea generates basic coverage resources, but they are sparse. Real prior auth workflows require specific plan identifiers, group numbers, subscriber IDs, and coverage details. The payer's identity determines which IVR tree to navigate, which portal to use, and which documentation requirements apply.
Most AI agent workflows are payer-specific. Testing them requires patients with specific insurance configurations, not generic coverage resources with placeholder identifiers. This is also a major reason existing FHIR sandboxes fall short for agent development.
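Here is a sketch of a Coverage resource that carries the identifiers payer workflows key on, plus a check that flags the sparse shape. It follows the FHIR R4 Coverage structure: subscriberId, beneficiary, payor, and class entries typed with the coverage-class code system. The function names are hypothetical. The plan ID and group number in the example call are invented placeholders, not real UnitedHealthcare values.

```python
from typing import Optional

COVERAGE_CLASS = "http://terminology.hl7.org/CodeSystem/coverage-class"


def _coverage_class(code: str, value: str, name: Optional[str] = None) -> dict:
    entry = {"type": {"coding": [{"system": COVERAGE_CLASS, "code": code}]}, "value": value}
    if name:
        entry["name"] = name
    return entry


def build_coverage(patient_id: str, payer: str, plan_id: str, plan_name: str,
                   group_number: str, member_id: str) -> dict:
    """Minimal FHIR R4 Coverage with member, group, and plan identifiers filled in."""
    return {
        "resourceType": "Coverage",
        "status": "active",
        "subscriberId": member_id,
        "beneficiary": {"reference": f"Patient/{patient_id}"},
        "payor": [{"display": payer}],
        "class": [
            _coverage_class("group", group_number),
            _coverage_class("plan", plan_id, plan_name),
        ],
    }


def missing_payer_details(coverage: dict) -> list[str]:
    """List the payer-routing details a Coverage lacks (subscriber ID, group, plan)."""
    missing = [] if coverage.get("subscriberId") else ["subscriberId"]
    present = {
        coding.get("code")
        for cls in coverage.get("class", [])
        if cls.get("value")
        for coding in cls.get("type", {}).get("coding", [])
    }
    missing += [code for code in ("group", "plan") if code not in present]
    return missing


# Hypothetical example: plan and group identifiers are placeholders.
example = build_coverage("patient-1", "UnitedHealthcare", "PLAN-0001",
                         "Choice Plus PPO", "GROUP-0001", "UHC123456789")
```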
Missing resource types
Synthea focuses on clinical resources. It does not generate many of the resource types that agents interact with: DocumentReferences with actual attached documents, ServiceRequests for ordered procedures, ClaimResponses with denial reasons, CommunicationRequests for follow-up actions. These are the resources that prior auth and revenue cycle agents read and write most frequently.
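A quick audit can find both gaps in a bundle: agent-facing resource types that are absent, and DocumentReferences with no attached content (neither inline data nor a URL). This is a minimal sketch. The required-type set mirrors this paragraph and should be adjusted per workflow.

```python
AGENT_RESOURCE_TYPES = frozenset(
    {"DocumentReference", "ServiceRequest", "ClaimResponse", "CommunicationRequest"}
)


def _resources(bundle: dict) -> list[dict]:
    return [e.get("resource", {}) for e in bundle.get("entry", [])]


def missing_resource_types(bundle: dict, required=AGENT_RESOURCE_TYPES) -> set[str]:
    """Resource types the workflow needs that the bundle does not contain."""
    present = {r.get("resourceType") for r in _resources(bundle)}
    return set(required) - present


def empty_documents(bundle: dict) -> list[str]:
    """IDs of DocumentReferences whose attachments carry neither data nor a url."""
    empty = []
    for r in _resources(bundle):
        if r.get("resourceType") != "DocumentReference":
            continue
        attachments = [c.get("attachment", {}) for c in r.get("content", [])]
        if not any(a.get("data") or a.get("url") for a in attachments):
            empty.append(r.get("id", "<no id>"))
    return empty
```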
What AI agents actually need
Healthcare AI agents need synthetic data that is purpose-built for their specific workflows. Here is what that looks like in practice.
Scenario-driven generation
Instead of generating a population and hoping the right patients exist, you start with the scenario and generate patients to fit it.
A scenario definition might look like this:
```yaml
scenario: prior-auth-lumbar-fusion
patient:
  age: 52
  conditions:
    - code: "M51.16"   # ICD-10-CM: lumbar intervertebral disc disorder with radiculopathy
      onset: 6 months ago
    - code: "M54.50"   # ICD-10-CM: low back pain, unspecified
      onset: 8 months ago
  procedures:
    - code: "97110"    # CPT: therapeutic exercise (physical therapy)
      date: 4 months ago
      sessions: 12
  imaging:
    - code: "72148"    # CPT: MRI lumbar spine without contrast
      date: 2 months ago
      findings: "Herniated disc at L4-L5 with neural foraminal stenosis"
  medications:
    - code: "197696"   # RxNorm: naproxen
      start: 6 months ago
  coverage:
    payer: UnitedHealthcare
    plan: Choice Plus PPO
    memberId: "UHC123456789"
```

From this definition, the system generates a complete FHIR bundle with all referenced resources, correct coding, proper references, and clinical coherence. The patient's Observations, DiagnosticReports, and Encounters all tell the same story.
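The core of the generation step is turning relative offsets like "6 months ago" into concrete, mutually consistent dates. Here is a minimal sketch. It assumes the YAML scenario has already been parsed into a Python dict (for example with PyYAML) and covers only Conditions. The function names and the day-of-month clamping are illustrative choices and do not describe any particular product's generator.

```python
import re
from datetime import date

ICD10_CM = "http://hl7.org/fhir/sid/icd-10-cm"
CLINICAL_STATUS = "http://terminology.hl7.org/CodeSystem/condition-clinical"


def months_ago(phrase: str, today: date) -> date:
    """Resolve an 'N months ago' offset against a fixed reference date.

    The day is clamped to 28 so the result is valid in every month.
    """
    match = re.fullmatch(r"(\d+) months? ago", phrase.strip())
    if match is None:
        raise ValueError(f"unsupported offset: {phrase!r}")
    total = today.year * 12 + (today.month - 1) - int(match.group(1))
    return date(total // 12, total % 12 + 1, min(today.day, 28))


def conditions_from_scenario(scenario: dict, patient_id: str, today: date) -> list[dict]:
    """Emit one FHIR R4 Condition per scenario condition, with concrete onset dates."""
    resources = []
    for i, cond in enumerate(scenario["patient"]["conditions"], start=1):
        resources.append({
            "resourceType": "Condition",
            "id": f"cond-{i}",
            "clinicalStatus": {"coding": [{"system": CLINICAL_STATUS, "code": "active"}]},
            "code": {"coding": [{"system": ICD10_CM, "code": str(cond["code"])}]},
            "subject": {"reference": f"Patient/{patient_id}"},
            "onsetDateTime": months_ago(cond["onset"], today).isoformat(),
        })
    return resources
```

Passing today explicitly, instead of reading the clock, keeps the generated bundle reproducible. It plays the same role as Synthea's random seed. The same offset resolver would drive procedure, imaging, and medication dates.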
Controlled variation
For population testing, you need variations on a scenario, not random patients. If you are testing a prior auth agent, you want:
- 50 patients who should be approved (clear clinical indication, documentation complete)
- 50 patients who should be denied (documentation gaps, alternative treatments not tried)
- 50 patients who should be pended (documentation partially complete, need additional info)
- 50 patients with edge cases (coverage changes, multiple payers, rare conditions)
Each group shares the same basic scenario but varies in specific ways that test different agent behaviors.
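One way to build labeled groups is to clone a base scenario and apply one targeted modification per outcome group. The sketch below is minimal. The modifier functions (approved, denied, pended) and the ±10-year age jitter are hypothetical illustrations. The expected_outcome label records test intent, not a real payer decision. An edge-case group would be one more modifier.

```python
import copy
import random


def approved(scenario: dict, rng: random.Random) -> None:
    """Leave the clinical story intact: indication and documentation complete."""


def denied(scenario: dict, rng: random.Random) -> None:
    """Remove conservative treatment, a common documentation gap."""
    scenario["patient"]["procedures"] = []


def pended(scenario: dict, rng: random.Random) -> None:
    """Keep the imaging order but drop its findings (report not yet attached)."""
    scenario["patient"]["imaging"][0]["findings"] = None


def make_variants(base: dict, modifiers: dict, per_group: int, seed: int = 0) -> list[dict]:
    """Clone `base` into labeled groups, each altered by its own modifier."""
    rng = random.Random(seed)  # seeded so variant sets are reproducible
    variants = []
    for outcome, modify in modifiers.items():
        for i in range(per_group):
            scenario = copy.deepcopy(base)  # never mutate the base scenario
            scenario["patient"]["age"] = base["patient"]["age"] + rng.randint(-10, 10)
            modify(scenario, rng)
            scenario["expected_outcome"] = outcome
            scenario["scenario"] = f"{base['scenario']}-{outcome}-{i:03d}"
            variants.append(scenario)
    return variants
```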
Vendor-specific output
The same patient should be expressible in multiple vendor data shapes:
- Epic-style: Contained Practitioner resources, Epic-specific extensions, and references that pair reference and display fields in Epic's format.
- Cerner-style: Inline practitioner references, Oracle Health extensions, and their specific approach to medication representation.
- Generic US Core: Clean, spec-compliant data without vendor quirks.
This lets you test your agent against each vendor's data format without maintaining separate patient datasets.
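This sketch renders one practitioner link in different shapes from a single source record. The mapping of styles to vendors follows the descriptions above and is an assumption. Real Epic and Oracle Health payloads differ in many more ways (extensions, identifiers, code systems) that this does not attempt to reproduce.

```python
import copy


def _display(practitioner: dict) -> str:
    name = practitioner["name"][0]
    return " ".join(name.get("given", []) + [name["family"]])


def render(resource: dict, field: str, practitioner: dict, style: str) -> dict:
    """Return a copy of `resource` whose `field` points at `practitioner` in the given style."""
    out = copy.deepcopy(resource)
    if style == "contained":    # assumed Epic-like: practitioner embedded, local '#id' reference
        out.setdefault("contained", []).append(copy.deepcopy(practitioner))
        out[field] = {"reference": f"#{practitioner['id']}", "display": _display(practitioner)}
    elif style == "inline":     # assumed Cerner-like: relative reference plus display
        out[field] = {"reference": f"Practitioner/{practitioner['id']}",
                      "display": _display(practitioner)}
    elif style == "us-core":    # plain relative reference, no vendor extras
        out[field] = {"reference": f"Practitioner/{practitioner['id']}"}
    else:
        raise ValueError(f"unknown style: {style}")
    return out
```

Because every style derives from the same Practitioner record, a failure that shows up in only one shape points at the parser, not at the data.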
Complete resource graphs
A synthetic patient for agent testing needs more than Conditions and Observations. It needs:
- DocumentReferences with realistic clinical documents (operative notes, pathology reports, letters of medical necessity)
- ServiceRequests for ordered procedures with the correct codes and supporting information
- Coverage resources with real-looking plan identifiers and coverage details
- Claims and ClaimResponses for testing denial/appeal workflows
- QuestionnaireResponses with prior auth questionnaire data for testing Documentation Templates and Rules (DTR) workflows
The resource graph needs to be complete enough that an agent can execute its full workflow without hitting a missing reference.
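A cheap way to check graph completeness is to walk every reference in a bundle. Each one should resolve to an entry (by fullUrl or ResourceType/id) or to a contained resource. The minimal sketch below deliberately treats any reference that points outside the bundle as missing, including absolute server URLs, because the goal is a self-contained test fixture.

```python
from typing import Iterator


def iter_references(node) -> Iterator[str]:
    """Yield every Reference.reference string nested anywhere in a resource."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "reference" and isinstance(value, str):
                yield value
            else:
                yield from iter_references(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_references(item)


def unresolved_references(bundle: dict) -> list[str]:
    """References that resolve to neither a bundle entry nor a contained resource."""
    known = set()
    for entry in bundle.get("entry", []):
        resource = entry.get("resource", {})
        if "fullUrl" in entry:
            known.add(entry["fullUrl"])
        if "resourceType" in resource and "id" in resource:
            known.add(f"{resource['resourceType']}/{resource['id']}")
    missing = []
    for entry in bundle.get("entry", []):
        resource = entry.get("resource", {})
        local = {f"#{c.get('id')}" for c in resource.get("contained", [])}
        for ref in iter_references(resource):
            if ref not in known and ref not in local:
                missing.append(ref)
    return missing
```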
Temporal consistency
Clinical data tells a story over time. The back pain started 8 months ago. Conservative treatment was tried for 4 months. The MRI was ordered after conservative treatment failed. The surgical referral came after the MRI findings.
Randomly generated data does not maintain this temporal consistency. An observation dated before the condition onset, or a medication started after the procedure it was supposed to treat, will confuse agents that rely on temporal reasoning.
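Temporal rules like these can be encoded as ordered pairs of events and checked mechanically. The sketch below is minimal. The event labels and rule list are hypothetical, taken from the lumbar fusion story. Events on the same date are treated as consistent.

```python
from datetime import date


def temporal_violations(events: dict[str, date], rules: list[tuple[str, str]]) -> list[str]:
    """Check that, for each (earlier, later) rule, `earlier` does not fall after `later`."""
    problems = []
    for earlier, later in rules:
        absent = [label for label in (earlier, later) if label not in events]
        if absent:
            # A rule that cannot be evaluated usually means an incomplete story.
            problems.extend(f"missing event: {label}" for label in absent)
            continue
        if events[earlier] > events[later]:  # same-day events are allowed
            problems.append(
                f"{earlier} ({events[earlier]}) is after {later} ({events[later]})"
            )
    return problems


# Hypothetical rules for the lumbar fusion scenario.
LUMBAR_RULES = [
    ("back_pain_onset", "pt_start"),
    ("pt_start", "mri"),
    ("mri", "surgical_referral"),
]
```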
Building scenario-specific synthetic data
The practical approach to building scenario-specific synthetic data combines structured templates with LLM-assisted generation:
- Define the scenario with the key clinical elements, coverage requirements, and expected outcomes.
- Generate the FHIR bundle using templates that ensure structural validity and correct references.
- Enrich with LLM assistance for free-text fields like clinical notes, radiology reports, and letters of medical necessity. The LLM generates text that is clinically coherent with the structured data.
- Validate against profiles to ensure US Core and any payer-specific profile conformance.
- Generate variations by modifying specific elements while maintaining clinical coherence.
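Clinical coherence is easiest to lose in the enrichment step. Two guardrails help: pin the prompt to the structured facts, and check the output before validation. The minimal sketch below shows both. The LLM call itself is left out. Substring matching is a deliberately crude stand-in for a real consistency check, and the function names are hypothetical.

```python
def grounded_prompt(document_type: str, facts: dict[str, str]) -> str:
    """Build a prompt that restricts the model to the structured facts."""
    fact_lines = "\n".join(f"- {label}: {value}" for label, value in facts.items())
    return (
        f"Write a {document_type}. Use only the facts below; do not introduce "
        "diagnoses, spinal levels, dates, or medications that are not listed.\n"
        f"{fact_lines}"
    )


def missing_facts(generated_text: str, required_terms: list[str]) -> list[str]:
    """Return required terms the generated text fails to mention (case-insensitive)."""
    lowered = generated_text.lower()
    return [term for term in required_terms if term.lower() not in lowered]
```

A generated radiology report that comes back without "L4-L5" would be regenerated rather than written into the bundle.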
This is not about replacing Synthea. Synthea is the right tool for population-level data generation. This is about adding a layer on top for scenario-specific, agent-oriented test data that Synthea cannot produce. And because this data is entirely synthetic, it eliminates the HIPAA compliance burden that comes with using real patient records for testing.
Key Takeaways
- Synthea handles breadth. Scenario-specific generation handles depth. Agent testing needs both.
- Probabilistic patient generation cannot reliably produce specific clinical configurations. Scenario definitions can.
- Vendor-shaped output (Epic, Cerner, Athena, generic US Core) catches parser failures before production.
- Complete resource graphs, including DocumentReference, ServiceRequest, Coverage, and Claim, are required for prior auth and revenue cycle agents.
- Temporal consistency across events matters for agents that reason over patient timelines.
- LLM-assisted generation enriches free-text fields (notes, reports, letters) while templates maintain structural validity.
FAQ
Can I get scenario-specific patients from Synthea?
Not directly. Synthea works from probabilistic modules. You can filter generated populations, but you cannot dictate specific combinations of conditions, procedures, coverage, and documentation. Scenario-driven tools build patients to match a defined clinical story.
Do I need vendor-shaped data if my target is a single EHR?
Yes. Even one EHR produces varied data shapes across different departments, integrations, and configurations. Testing against one shape hides assumptions that break on the next install.
What resource types matter most beyond the clinical basics?
For prior auth and revenue cycle agents: DocumentReference with real document bodies, ServiceRequest with correct codes, Coverage with specific plan identifiers, Claim and ClaimResponse for denial and appeal workflows, and QuestionnaireResponse for DTR flows.
How do LLMs fit into synthetic data generation?
LLMs generate the free-text pieces that templates cannot: clinical notes, radiology reports, letters of medical necessity. Templates maintain structural validity and coding. The combination produces data that is both syntactically correct and clinically readable.
Getting started
If you are building healthcare AI agents and need scenario-specific synthetic patients for testing, book a demo to see how Verial generates clinically coherent test data for your specific workflows.