Synthea Alternatives for Healthcare AI Testing

TL;DR

Synthea is the best free, population-level synthetic data generator, but it falls short for AI agent testing. Agents need scenario-specific patients, vendor-shaped FHIR, and multi-interface data.
The Synthea paper in JAMIA is one of the most cited synthetic data references in healthcare informatics (Walonoski et al., JAMIA 2018). ONC has adopted Synthea for patient-centered outcomes research (ONC).
Over 90% of US hospitals run FHIR-capable EHRs (ONC). Each vendor shapes FHIR data differently, and Synthea does not reproduce those shapes.
CMS synthetic data, MITRE test decks, Tonic.ai, MDClone, Gretel.ai, and scenario-driven generators each solve a different piece of the problem. Most practical test strategies combine two or three of them.

This article focuses on the alternatives landscape and when to use each option. For a deeper look at what agent testing requires beyond Synthea, see Synthetic Patient Data: Beyond Synthea.

"FHIR is opening up new avenues of interoperability, such as exchange with patients through apps, and with organizations in other parts of the healthcare value chain such as health insurers, life insurers and life sciences companies."

Micky Tripathi, former National Coordinator for Health IT, ONC

Where Synthea falls short for AI agent testing

No scenario control

Synthea generates patients probabilistically. You cannot say "generate a 55-year-old diabetic woman who needs prior authorization for a continuous glucose monitor under Aetna." You can generate 5,000 patients and filter for ones that happen to have diabetes, but they will not have the specific clinical history, insurance configuration, and documentation state that your test scenario requires.
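The filter-and-hope approach looks roughly like the sketch below. It assumes Synthea's FHIR R4 exporter has written one transaction Bundle per patient as JSON, typically under `output/fhir/`. The function names and the reference date are illustrative. Even when a patient matches, nothing in the filter guarantees that the match also carries the coverage and documentation state the scenario needs.

```python
import json
from datetime import date
from pathlib import Path

# SNOMED CT concept for type 2 diabetes mellitus
T2DM_SNOMED = "44054006"


def age_on(birth_date: str, ref: date) -> int:
    """Completed years between a FHIR date (YYYY-MM-DD) and a reference date."""
    born = date.fromisoformat(birth_date)
    return ref.year - born.year - ((ref.month, ref.day) < (born.month, born.day))


def load_bundles(directory: str):
    """Yield per-patient bundles from a Synthea FHIR output folder."""
    for path in sorted(Path(directory).glob("*.json")):
        bundle = json.loads(path.read_text(encoding="utf-8"))
        # Skip files with no Patient entry (e.g. hospital/practitioner bundles)
        if any(e["resource"]["resourceType"] == "Patient"
               for e in bundle.get("entry", [])):
            yield bundle


def matching_patients(bundles, gender, min_age, max_age, snomed_code, ref):
    """Yield ids of patients matching gender, an age range, and a condition code."""
    for bundle in bundles:
        resources = [e["resource"] for e in bundle.get("entry", [])]
        patient = next((r for r in resources if r["resourceType"] == "Patient"), None)
        if patient is None or patient.get("gender") != gender:
            continue
        if not min_age <= age_on(patient["birthDate"], ref) <= max_age:
            continue
        codes = {
            coding.get("code")
            for r in resources if r["resourceType"] == "Condition"
            for coding in r.get("code", {}).get("coding", [])
        }
        if snomed_code in codes:
            yield patient["id"]
```

Usage might look like `list(matching_patients(load_bundles("output/fhir"), "female", 55, 55, T2DM_SNOMED, date.today()))`. The result can easily be empty, and any hits still lack the payer configuration the scenario requires.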

For AI agent testing, you need patients built for specific workflows, not random patients you hope will match. Your agent might handle prior auth for knee replacements, medication reconciliation at discharge, or claim denial appeals. Each scenario has specific data requirements that probabilistic generation cannot reliably produce.
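Scenario-first generation inverts the filter approach. The test states its requirements, and the generator builds a patient to satisfy them. Below is a minimal sketch that emits plain FHIR R4 JSON. The `Scenario` spec and the `build_bundle` helper are hypothetical and do not represent any product's API.

```python
import uuid
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Scenario:
    """Hypothetical spec listing the facts a test case depends on."""
    age: int
    gender: str
    conditions: list = field(default_factory=list)  # (SNOMED code, display) pairs
    payer_name: str = ""
    as_of: date = date(2025, 1, 1)  # must not be Feb 29 (see replace() below)


def _entry(resource: dict) -> dict:
    """Wrap a resource as a POST entry in a FHIR transaction Bundle."""
    return {"fullUrl": f"urn:uuid:{uuid.uuid4()}", "resource": resource,
            "request": {"method": "POST", "url": resource["resourceType"]}}


def build_bundle(s: Scenario) -> dict:
    """Build a transaction Bundle whose clinical facts are fixed by the spec."""
    # Birthday falls on the as_of date, so the patient is exactly s.age then.
    birth = s.as_of.replace(year=s.as_of.year - s.age)
    patient = _entry({"resourceType": "Patient", "gender": s.gender,
                      "birthDate": birth.isoformat()})
    pid = patient["fullUrl"]
    entries = [patient]
    for code, display in s.conditions:
        entries.append(_entry({
            "resourceType": "Condition", "subject": {"reference": pid},
            "code": {"coding": [{"system": "http://snomed.info/sct",
                                 "code": code, "display": display}]}}))
    if s.payer_name:
        entries.append(_entry({
            "resourceType": "Coverage", "status": "active",
            "beneficiary": {"reference": pid},
            "payor": [{"display": s.payer_name}]}))
    return {"resourceType": "Bundle", "type": "transaction", "entry": entries}
```

For the example in the text, the call would be `build_bundle(Scenario(55, "female", [("44054006", "Diabetes mellitus type 2")], payer_name="Aetna"))`. The output is deterministic in its clinical content, so the test can assert on exactly what the agent should find.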

No vendor-specific data shapes

Epic, Cerner, Athena, and other EHR vendors return FHIR data differently. Epic wraps practitioners in contained resources. Cerner uses proprietary extensions for clinical status fields. Athena structures medication resources differently than either.

Synthea generates clean, generic FHIR. This is a feature for interoperability testing, but a gap for agent testing. Your agent will encounter vendor-specific quirks in production, and Synthea data does not prepare it for those. See our FHIR R4 testing guide for specific examples of how vendor data shapes differ from spec-compliant FHIR.
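One concrete consequence involves the contained-resource pattern described above. In FHIR, a practitioner reference can point at a resource contained inside the referencing resource (a local reference such as `#pract1`) or at a separate resource (such as `Practitioner/123`). A parser written only against clean, generic FHIR may handle just the second form. Below is a minimal sketch of a resolver that handles both. The function names and the `bundle_index` lookup keyed by `Type/id` are assumptions for illustration.

```python
def resolve_reference(resource: dict, ref: str, bundle_index: dict):
    """Resolve a FHIR reference string found inside `resource`.

    '#id' points into resource["contained"]; anything else is looked up in
    bundle_index, keyed by 'Type/id' (an assumption of this sketch; real
    data may also use absolute URLs or urn:uuid references).
    """
    if ref.startswith("#"):
        local_id = ref[1:]
        for inner in resource.get("contained", []):
            if inner.get("id") == local_id:
                return inner
        return None
    return bundle_index.get(ref)


def requester_name(med_request: dict, bundle_index: dict) -> str:
    """Return the requester's display name, whichever reference form is used."""
    ref = med_request.get("requester", {}).get("reference")
    if ref is None:
        return ""
    pract = resolve_reference(med_request, ref, bundle_index)
    if pract is None:
        return ""
    name = (pract.get("name") or [{}])[0]
    if name.get("text"):
        return name["text"]
    return " ".join(name.get("given", []) + [name.get("family", "")]).strip()
```

A test suite would then call `requester_name` on both shapes of the same order and expect the same answer.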

Clinical coherence gaps in edge cases

Synthea's clinical modules handle common disease pathways well. Where it struggles is edge cases: patients with multiple interacting comorbidities, complex medication regimens with potential interactions, or rare conditions that fall outside the standard modules.

These edge cases are exactly what breaks AI agents. The patient with diabetes, CKD stage 3, and a new cancer diagnosis has medication constraints that interact in non-obvious ways. Synthea's modules operate independently, so the diabetes module does not know about the CKD module's impact on medication choices.
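Cross-module interactions like this can at least be checked after generation. The sketch below flags two incoherent combinations. The first is metformin with an eGFR below 30 mL/min/1.73 m², where US labeling contraindicates metformin. The second is a CKD stage 3 diagnosis whose eGFR falls outside the 30 to 59 range that defines the stage. The simplified patient dictionary (lowercase condition and medication names) and the rule format are hypothetical and illustrative.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class CoherenceRule:
    """Hypothetical rule: a named predicate that flags an incoherent patient."""
    name: str
    violated: Callable[[dict], bool]


def on_med(patient: dict, med: str) -> bool:
    """True if the medication list contains `med` (case-insensitive)."""
    return med in {m.lower() for m in patient.get("medications", [])}


RULES = [
    CoherenceRule(
        "metformin with eGFR < 30",
        lambda p: on_med(p, "metformin") and p.get("egfr", 999) < 30,
    ),
    CoherenceRule(
        "CKD stage 3 with eGFR outside 30-59",
        lambda p: "ckd stage 3" in p.get("conditions", [])
        and not 30 <= p.get("egfr", 45) < 60,
    ),
]


def coherence_violations(patient: dict, rules=RULES) -> list:
    """Return the names of all rules the generated patient violates."""
    return [r.name for r in rules if r.violated(patient)]
```

Checks like these catch incoherent patients after the fact. They do not make an independent-module generator produce coherent ones.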

No multi-interface data

Healthcare AI agents do not interact with FHIR alone. A prior auth workflow might involve reading patient data from FHIR, calling a payer IVR system, submitting documents through a fax gateway, and navigating a payer portal. Synthea generates FHIR bundles. It does not generate the IVR call trees, fax templates, or portal configurations that your agent also needs to interact with.

If your agent spans multiple healthcare interfaces, Synthea covers only one piece of the puzzle.
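Multi-interface testing means the same facts have to appear consistently in every channel the agent touches. The minimal sketch below assumes a hypothetical canonical patient record and projects it into three forms: a FHIR Patient resource, a fax cover-sheet body, and the keypad digits an IVR might expect. The field names, fax layout, and IVR entry format are invented for illustration and do not reflect any payer's actual system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CanonicalPatient:
    """Hypothetical single source of truth for one test patient."""
    given: str
    family: str
    birth_date: str  # YYYY-MM-DD
    member_id: str   # payer subscriber ID as printed on the card


def to_fhir(p: CanonicalPatient) -> dict:
    """FHIR R4 Patient carrying the same name and birth date."""
    return {"resourceType": "Patient",
            "name": [{"given": [p.given], "family": p.family}],
            "birthDate": p.birth_date}


def to_fax_cover(p: CanonicalPatient, procedure: str) -> str:
    """Fax gateways carry free text, so the same facts appear as prose."""
    return (f"PRIOR AUTHORIZATION REQUEST\n"
            f"Patient: {p.family}, {p.given}\n"
            f"DOB: {p.birth_date}\n"
            f"Member ID: {p.member_id}\n"
            f"Requested service: {procedure}\n")


def to_ivr_digits(p: CanonicalPatient) -> str:
    """Keypad input: member ID digits, then DOB as MMDDYYYY, each '#'-terminated."""
    digits = "".join(c for c in p.member_id if c.isdigit())
    year, month, day = p.birth_date.split("-")
    return f"{digits}#{month}{day}{year}#"
```

Deriving every channel from one record means a mismatch in the agent's behavior points at the agent, not at inconsistent test fixtures.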

Sparse insurance and coverage data

Synthea generates basic Coverage resources, but they lack the detail that payer-specific workflows require. Real prior auth processes depend on specific plan identifiers, group numbers, subscriber IDs, and payer-specific policy rules. Synthea's coverage data is placeholder-level, not workflow-level.
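For contrast, a Coverage resource that carries the fields a prior auth workflow keys on might look like the sketch below. `subscriberId`, `payor`, `relationship`, and `class` (using the `group` and `plan` codes from the HL7 coverage-class code system) are standard FHIR R4 elements. Every value is a made-up placeholder, and the `missing_workflow_fields` helper is hypothetical.

```python
COVERAGE_CLASS = "http://terminology.hl7.org/CodeSystem/coverage-class"
RELATIONSHIP = "http://terminology.hl7.org/CodeSystem/subscriber-relationship"


def class_entry(code: str, value: str, name: str) -> dict:
    """One Coverage.class element, e.g. the group or plan identifier."""
    return {"type": {"coding": [{"system": COVERAGE_CLASS, "code": code}]},
            "value": value, "name": name}


coverage = {
    "resourceType": "Coverage",
    "status": "active",
    "subscriberId": "W000000000",  # placeholder
    "beneficiary": {"reference": "Patient/example"},
    "relationship": {"coding": [{"system": RELATIONSHIP, "code": "self"}]},
    "payor": [{"display": "Example Health Plan"}],  # placeholder
    "class": [
        class_entry("group", "GRP-0001", "Example Employer Group"),
        class_entry("plan", "PLN-PPO-01", "Example PPO"),
    ],
}


def missing_workflow_fields(cov: dict) -> list:
    """List prior-auth-relevant fields absent from a Coverage resource."""
    missing = [f for f in ("subscriberId", "payor") if not cov.get(f)]
    class_codes = {c.get("code") for k in cov.get("class", [])
                   for c in k.get("type", {}).get("coding", [])}
    missing += [f"class:{c}" for c in ("group", "plan") if c not in class_codes]
    return missing
```

Running a check like `missing_workflow_fields` over generated data is a quick way to see whether coverage is placeholder-level or workflow-level.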

The alternatives landscape

CMS Synthetic Data

The Centers for Medicare & Medicaid Services (CMS) provides synthetic Medicare claims data (the DE-SynPUF dataset and newer versions). This data covers beneficiary demographics, inpatient claims, outpatient claims, prescription drug events, and carrier claims.

Best for: Teams working on Medicare-specific workflows, claims processing, or population health analytics. The data is structured like real CMS claims, which is valuable if your agent interacts with Medicare.

Limitations: Medicare-only, so no commercial payer data. Claims format, not FHIR (requires transformation). No clinical detail beyond what appears on claims. Older datasets may not reflect current coding practices.
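For demographics, the transformation step is usually simple. The minimal sketch below maps rows of a DE-SynPUF Beneficiary Summary file to FHIR R4 Patient resources. It assumes the following column names, which should be checked against the CMS codebook for the release in use:

- `DESYNPUF_ID`
- `BENE_BIRTH_DT` and `BENE_DEATH_DT` (dates as YYYYMMDD)
- `BENE_SEX_IDENT_CD` (1 = male, 2 = female)

The identifier system URI is a placeholder.

```python
import csv
import io

# Assumed DE-SynPUF sex codes mapped to FHIR administrative gender
SEX_TO_FHIR = {"1": "male", "2": "female"}


def synpuf_date(raw):
    """Convert YYYYMMDD to FHIR's YYYY-MM-DD; blank or malformed means absent."""
    raw = (raw or "").strip()
    return f"{raw[:4]}-{raw[4:6]}-{raw[6:8]}" if len(raw) == 8 else None


def beneficiary_to_patient(row: dict) -> dict:
    """Map one beneficiary summary row to a FHIR R4 Patient resource."""
    patient = {
        "resourceType": "Patient",
        "id": row["DESYNPUF_ID"],
        "identifier": [{"system": "urn:example:desynpuf-id",  # placeholder
                        "value": row["DESYNPUF_ID"]}],
        "gender": SEX_TO_FHIR.get(row.get("BENE_SEX_IDENT_CD", ""), "unknown"),
        "birthDate": synpuf_date(row.get("BENE_BIRTH_DT")),
    }
    death = synpuf_date(row.get("BENE_DEATH_DT"))
    if death:
        patient["deceasedDateTime"] = death
    return patient


def patients_from_csv(text: str) -> list:
    """Parse CSV text with a header row into Patient resources."""
    return [beneficiary_to_patient(r) for r in csv.DictReader(io.StringIO(text))]
```

Claim lines would need a similar mapping into resources such as Claim or ExplanationOfBenefit, and that mapping involves considerably more judgment than demographics.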

MITRE Test Decks

Beyond Synthea, MITRE provides curated test datasets for specific use cases. These include test patients for Inferno (US Core conformance testing), Da Vinci implementation guides, and other FHIR implementation guide testing.

Best for: Teams implementing specific FHIR implementation guides who need patients that exercise particular profile requirements.

Limitations: Small datasets designed for conformance testing, not load testing or scenario coverage. The patients are crafted to validate specific profile elements, not to simulate realistic clinical scenarios.

Tonic.ai

Tonic.ai specializes in generating synthetic data from real data sources. It connects to your production database, learns the schema and statistical distributions, and generates synthetic data that preserves the structure and relationships of the original data while removing PHI.

Best for: Teams that have access to production clinical data and need de-identified synthetic versions for development and testing. Tonic maintains referential integrity and statistical properties.
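Referential integrity is what keeps data like this usable. If a patient identifier is replaced, every table that references it must receive the same replacement. The sketch below shows one generic way to achieve that: keyed HMAC pseudonymization using Python's standard library. This is not Tonic's method. Pseudonymizing keys alone also does not de-identify the rest of a record. The helper names are hypothetical.

```python
import hashlib
import hmac


def pseudonym(key: bytes, value: str, prefix: str = "P") -> str:
    """Deterministic surrogate: the same key and input always give the same output."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{prefix}{digest[:16]}"


def pseudonymize(rows: list, key_fields: set, key: bytes) -> list:
    """Replace the listed ID columns in every row with keyed surrogates."""
    return [
        {k: pseudonym(key, v) if k in key_fields else v for k, v in row.items()}
        for row in rows
    ]
```

Because the mapping is deterministic under one secret key, a patients table and an encounters table pseudonymized separately still join on the surrogate ID.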

Limitations: Requires access to real data to generate synthetic data, which creates a chicken-and-egg problem for early-stage companies. Focused on database-level synthesis, not FHIR-native output. Pricing is enterprise-oriented.

MDClone

MDClone takes a similar approach to Tonic but is healthcare-specific. It creates synthetic patient populations from real health system data, preserving clinical distributions and correlations while ensuring no individual patient can be re-identified.

Best for: Health systems and research organizations that want to share data externally without HIPAA risk. The synthetic data maintains clinical correlations that purely generative approaches miss.

Limitations: Requires a health system data partnership. Not available as a standalone tool for AI agent developers. Output is typically tabular, not FHIR-native. Focused on research and analytics, not agent testing workflows.

Gretel.ai

Gretel provides a general-purpose synthetic data platform that supports healthcare data types. It uses generative models to create synthetic datasets that preserve statistical properties of source data.

Best for: Teams that need synthetic versions of structured healthcare datasets (claims, encounters, labs) and have source data to train from.

Limitations: General-purpose, not healthcare-specialized. Does not generate FHIR bundles natively. Requires source data for training. Less control over individual patient scenarios than template-based approaches.

Verial

Verial generates scenario-driven synthetic patients with clinically coherent FHIR bundles, vendor-specific data shapes, and multi-interface test data. You define the clinical scenario, and Verial generates patients that match it.

Best for: Teams building AI agents that need scenario-specific test patients with particular conditions, medications, coverage, and clinical documentation. Teams that need the same patient expressed in different vendor data shapes or across multiple interfaces (FHIR, voice, fax, portal).

Limitations: Not designed for population-level data generation. If you need 100,000 patients for analytics, Synthea is still the better tool. Verial is purpose-built for agent testing, not general synthetic data needs.

Comparison table

Feature	Synthea	CMS Synthetic	MITRE Decks	Tonic.ai	MDClone	Verial
Scenario control	None	None	Limited	Derived	Derived	Full
FHIR R4 output	Yes	No (claims)	Yes	No (DB)	No (tabular)	Yes
Vendor-specific shapes	No	N/A	No	No	No	Yes
Clinical coherence	Modules	Claims-based	Curated	Source-derived	Source-derived	Scenario-driven
Multi-interface data	FHIR only	Claims only	FHIR only	Source schema	Tabular	FHIR, voice, fax, portal
Requires source data	No	No	No	Yes	Yes	No
Population scale	Excellent	Large	Small	Large	Large	Moderate
Cost	Free	Free	Free	Enterprise	Enterprise	Paid
Open source	Yes	Yes	Yes	No	No	No

When to use which

Use Synthea when you need population-level data for analytics, load testing, or basic FHIR integration development. Synthea is the right starting point for most healthcare data projects, and there is no reason not to use it for initial development.

Use CMS synthetic data when you are building Medicare-specific workflows or need realistic claims data structure. It is the best freely available source for claims-format data.

Use MITRE test decks when you are implementing a specific FHIR implementation guide and need patients that exercise its profile requirements.

Use Tonic.ai or MDClone when you have access to real production data and need synthetic versions that preserve its statistical properties. These tools are strongest when you need synthetic data that matches your specific patient population, not a generic population.

Use Verial when you are building AI agents that need to be tested against specific clinical scenarios with HIPAA-compliant synthetic data. If your agent handles prior auth, claims processing, or clinical documentation, it needs patients purpose-built for those workflows, not random patients from a population generator.

Combining approaches

These tools are not mutually exclusive. A practical testing strategy might look like this:

Synthea for baseline testing. Generate a large patient population to validate basic FHIR parsing, search, and data handling.
Scenario-specific patients for agent testing. Use Verial or hand-crafted patients for the specific clinical workflows your agent handles.
Vendor-shaped data for deployment testing. Test against Epic-shaped, Cerner-shaped, and generic US Core data to catch vendor-specific parsing issues.
Production-derived synthetic data for validation. If you have access, use Tonic or MDClone to generate data that matches your actual patient population distribution.
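The vendor-shaped step can be as simple as asserting that parsing yields the same answer for every shape of one logical record. The minimal sketch below uses two hypothetical variants of the same MedicationRequest. One uses the inline `medicationCodeableConcept`. The other uses `medicationReference` pointing at a contained Medication. Both are valid FHIR R4 forms of `medication[x]`. `medication_text` stands in for whatever parsing the agent actually does.

```python
def medication_text(req: dict) -> str:
    """Return a medication name from either form of MedicationRequest.medication[x]."""
    concept = req.get("medicationCodeableConcept")
    if concept is None:
        ref = req.get("medicationReference", {}).get("reference", "")
        if ref.startswith("#"):
            # Local reference: look inside the contained Medication resources
            concept = next((c.get("code", {}) for c in req.get("contained", [])
                            if c.get("id") == ref[1:]), None)
    if not concept:
        return ""
    return concept.get("text") or next(
        (c.get("display", "") for c in concept.get("coding", [])), "")


# Two hypothetical shapes of the same logical order
INLINE = {"resourceType": "MedicationRequest",
          "medicationCodeableConcept": {"text": "metformin 500 mg oral tablet"}}
CONTAINED = {"resourceType": "MedicationRequest",
             "contained": [{"resourceType": "Medication", "id": "med1",
                            "code": {"coding": [{"display": "metformin 500 mg oral tablet"}]}}],
             "medicationReference": {"reference": "#med1"}}


def check_shapes(variants: dict) -> dict:
    """Map each variant name to the extracted value; callers assert they agree."""
    return {name: medication_text(res) for name, res in variants.items()}
```

Adding a new vendor shape then means adding one more entry to the variants dictionary and checking that all extracted values agree.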

The key insight is that different testing phases require different data characteristics. Population data tests breadth. Scenario data tests depth. Vendor-shaped data tests resilience. Using only one approach leaves gaps that surface in production.

Key Takeaways

Synthea is the right tool for population-level synthetic data. It is the wrong tool for scenario-specific agent testing.
CMS synthetic data fits Medicare claims workflows. MITRE test decks fit profile conformance. Each solves a narrow slice.
Tonic.ai and MDClone derive synthetic data from real data. Useful if you have production access, not if you don't.
Gretel.ai is general-purpose. It works, but it does not produce FHIR natively.
Scenario-driven generators produce the specific patient configurations agent testing requires, including vendor-shaped variants.
A practical strategy combines Synthea for breadth with scenario-driven generation for the specific workflows your agent handles.
