Synthea Alternatives for Healthcare AI Testing
Synthea generates population-level synthetic data. If you need scenario-specific patients, vendor-shaped FHIR bundles, or multi-interface test data, here are your options.
TL;DR
- Synthea is the best free, population-level synthetic data generator, but it falls short for AI agent testing. Agents need scenario-specific patients, vendor-shaped FHIR, and multi-interface data.
- The Synthea paper in JAMIA is one of the most cited synthetic data references in healthcare informatics (Walonoski et al., JAMIA 2018). ONC has adopted Synthea for patient-centered outcomes research (ONC).
- Over 90% of US hospitals run FHIR-capable EHRs (ONC). Each vendor shapes FHIR data differently, and Synthea does not reproduce those shapes.
- CMS synthetic data, MITRE test decks, Tonic.ai, MDClone, Gretel.ai, and scenario-driven generators each solve a different piece. Most real test strategies combine two or three.
This article focuses on the alternatives landscape and when to use each option. For a deeper look at what agent testing requires beyond Synthea, see Synthetic Patient Data: Beyond Synthea.
"FHIR is opening up new avenues of interoperability, such as exchange with patients through apps, and with organizations in other parts of the healthcare value chain such as health insurers, life insurers and life sciences companies."
Micky Tripathi, former National Coordinator for Health IT, ONC
Where Synthea falls short for AI agent testing
No scenario control
Synthea generates patients probabilistically. You cannot say "generate a 55-year-old diabetic woman who needs prior authorization for a continuous glucose monitor under Aetna." You can generate 5,000 patients and filter for ones that happen to have diabetes, but they will not have the specific clinical history, insurance configuration, and documentation state that your test scenario requires.
For AI agent testing, you need patients built for specific workflows, not random patients you hope will match. Your agent handles prior auth for knee replacements, or medication reconciliation at discharge, or claim denial appeals. Each scenario has specific data requirements that probabilistic generation cannot reliably produce.
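The "generate and filter" workaround looks roughly like this. It is a minimal sketch that assumes Synthea's default FHIR R4 output (one JSON Bundle per file) and assumes SNOMED CT `44054006` is the code the diabetes module writes. Check both against your Synthea version. The 50-60 age window is an illustrative stand-in for "55-year-old".

```python
import json
from datetime import date
from pathlib import Path
from typing import List

# Assumption: Synthea's diabetes module records type 2 diabetes as
# SNOMED CT 44054006. Confirm against the module JSON in your version.
DIABETES_T2_SNOMED = "44054006"


def age_on(birth_date: str, today: date) -> int:
    """Whole years between an ISO birthDate (YYYY-MM-DD) and `today`."""
    born = date.fromisoformat(birth_date)
    had_birthday = (today.month, today.day) >= (born.month, born.day)
    return today.year - born.year - (0 if had_birthday else 1)


def matches_scenario(bundle: dict, today: date) -> bool:
    """Woman aged 50-60 with a type 2 diabetes Condition anywhere in the bundle."""
    resources = [entry.get("resource", {}) for entry in bundle.get("entry", [])]
    patients = [r for r in resources if r.get("resourceType") == "Patient"]
    # Non-patient bundles (e.g. shared hospital or practitioner files) have no Patient.
    if not patients:
        return False
    patient = patients[0]
    if patient.get("gender") != "female" or "birthDate" not in patient:
        return False
    if not 50 <= age_on(patient["birthDate"], today) <= 60:
        return False
    return any(
        coding.get("code") == DIABETES_T2_SNOMED
        for r in resources
        if r.get("resourceType") == "Condition"
        for coding in r.get("code", {}).get("coding", [])
    )


def filter_synthea_output(fhir_dir: str, today: date) -> List[str]:
    """Return file names of bundles in `fhir_dir` that match the scenario."""
    hits = []
    for path in sorted(Path(fhir_dir).glob("*.json")):
        bundle = json.loads(path.read_text())
        if matches_scenario(bundle, today):
            hits.append(path.name)
    return hits
```

Even a hit only satisfies demographics and one diagnosis. The filter has no way to ask for a pending CGM order, an Aetna Coverage, or a particular documentation state, because none of those are inputs to the generation run.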
No vendor-specific data shapes
Epic, Cerner, Athena, and other EHR vendors return FHIR data differently. Epic wraps practitioners in contained resources. Cerner uses proprietary extensions for clinical status fields. Athena structures medication resources differently than either.
Synthea generates clean, generic FHIR. This is a feature for interoperability testing, but a gap for agent testing. Your agent will encounter vendor-specific quirks in production, and Synthea data does not prepare it for those. See our FHIR R4 testing guide for specific examples of how vendor data shapes differ from spec-compliant FHIR.
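Contained practitioners show how small the difference can be and how easily it breaks an agent. In FHIR, a reference that starts with `#` points into the referring resource's own `contained` array, not to a standalone resource. Code that only handles `Practitioner/123` or bundle `fullUrl` references will silently drop those practitioners. The sketch below handles all three forms. The sample shapes in the test are illustrative and are not captured from a real Epic response.

```python
from typing import Dict, List, Optional


def index_bundle(bundle: dict) -> Dict[str, dict]:
    """Map both 'Type/id' and entry fullUrl to each resource in a Bundle."""
    index: Dict[str, dict] = {}
    for entry in bundle.get("entry", []):
        resource = entry.get("resource", {})
        if "resourceType" in resource and "id" in resource:
            index[f"{resource['resourceType']}/{resource['id']}"] = resource
        if "fullUrl" in entry:
            index[entry["fullUrl"]] = resource
    return index


def resolve(reference: str, referrer: dict, index: Dict[str, dict]) -> Optional[dict]:
    """Resolve a reference found on `referrer`.

    '#id' references point into referrer['contained'] (FHIR contained
    resources); anything else is looked up in the bundle index.
    """
    if reference.startswith("#"):
        wanted = reference[1:]
        for contained in referrer.get("contained", []):
            if contained.get("id") == wanted:
                return contained
        return None
    return index.get(reference)


def encounter_practitioners(encounter: dict, index: Dict[str, dict]) -> List[dict]:
    """Collect Practitioner resources referenced by Encounter.participant."""
    found = []
    for participant in encounter.get("participant", []):
        ref = participant.get("individual", {}).get("reference")
        if ref:
            target = resolve(ref, encounter, index)
            if target and target.get("resourceType") == "Practitioner":
                found.append(target)
    return found
```

Testing this path with generic FHIR alone never exercises the `#` branch, which is exactly the gap described above.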
Clinical coherence gaps in edge cases
Synthea's clinical modules handle common disease pathways well. Where it struggles is edge cases: patients with multiple interacting comorbidities, complex medication regimens with potential interactions, or rare conditions that fall outside the standard modules.
These edge cases are exactly what breaks AI agents. The patient with diabetes, CKD stage 3, and a new cancer diagnosis has medication constraints that interact in non-obvious ways. Synthea's modules operate independently, so the diabetes module does not know about the CKD module's impact on medication choices.
No multi-interface data
Healthcare AI agents do not interact with FHIR alone. A prior auth workflow might involve reading patient data from FHIR, calling a payer IVR system, submitting documents through a fax gateway, and navigating a payer portal. Synthea generates FHIR bundles. It does not generate the IVR call trees, fax templates, or portal configurations that your agent also needs to interact with.
If your agent spans multiple healthcare interfaces, Synthea covers one piece of the puzzle.
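One way to picture the missing pieces is a single patient rendered into every interface the agent touches, with a check that all renderings agree. Everything in this sketch is hypothetical: the class names, the fields, and the assumption that an IVR expects date of birth keyed as MMDDYYYY. It shows the consistency requirement, not any particular tool's format.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ScenarioPatient:
    """Canonical identity that every interface fixture must agree on."""
    member_id: str
    birth_date: str  # ISO YYYY-MM-DD
    family_name: str


@dataclass
class MultiInterfaceFixture:
    """Hypothetical bundle of per-interface test data for one patient."""
    patient: ScenarioPatient
    fhir_patient: dict                      # FHIR R4 Patient resource
    ivr_expected_inputs: Dict[str, str]     # prompt key -> keypad digits
    fax_cover_sheet: str                    # rendered cover-sheet text
    portal_account: Dict[str, str] = field(default_factory=dict)


def ivr_dob(iso_date: str) -> str:
    """Assumed IVR convention: date of birth keyed as MMDDYYYY."""
    year, month, day = iso_date.split("-")
    return f"{month}{day}{year}"


def inconsistencies(fx: MultiInterfaceFixture) -> List[str]:
    """List places where an interface disagrees with the canonical patient."""
    p = fx.patient
    problems = []
    fhir_ids = {ident.get("value") for ident in fx.fhir_patient.get("identifier", [])}
    if p.member_id not in fhir_ids:
        problems.append("fhir: member id not in Patient.identifier")
    if fx.fhir_patient.get("birthDate") != p.birth_date:
        problems.append("fhir: birthDate differs")
    if fx.ivr_expected_inputs.get("member_id") != p.member_id:
        problems.append("ivr: member id differs")
    if fx.ivr_expected_inputs.get("dob") != ivr_dob(p.birth_date):
        problems.append("ivr: dob differs")
    if p.member_id not in fx.fax_cover_sheet:
        problems.append("fax: member id missing from cover sheet")
    if fx.portal_account.get("member_id") != p.member_id:
        problems.append("portal: member id differs")
    return problems
```

A FHIR-only generator fills in `fhir_patient` and nothing else. The other three fields still have to come from somewhere, and they have to match.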
Sparse insurance and coverage data
Synthea generates basic Coverage resources, but they lack the detail that payer-specific workflows require. Real prior auth processes depend on specific plan identifiers, group numbers, subscriber IDs, and payer-specific policy rules. Synthea's coverage data is placeholder-level, not workflow-level.
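A quick way to see the gap in your own test data is to check each Coverage resource for the fields a payer workflow keys on. This sketch treats payor, `subscriberId`, and group and plan entries in `Coverage.class` (coded with the HL7 `coverage-class` code system) as the assumed minimum. Payer-specific policy rules do not live on Coverage at all, so no check of this resource can cover them.

```python
from typing import List, Optional

# HL7 terminology system for Coverage.class.type codes (group, plan, ...).
COVERAGE_CLASS_SYSTEM = "http://terminology.hl7.org/CodeSystem/coverage-class"


def class_value(coverage: dict, code: str) -> Optional[str]:
    """Return Coverage.class.value for the given class type code, if present."""
    for cls in coverage.get("class", []):
        for coding in cls.get("type", {}).get("coding", []):
            if coding.get("system") == COVERAGE_CLASS_SYSTEM and coding.get("code") == code:
                return cls.get("value")
    return None


def missing_workflow_fields(coverage: dict) -> List[str]:
    """Assumed minimum for a payer workflow: payor, subscriber ID, group, plan."""
    missing = []
    if not coverage.get("payor"):
        missing.append("payor")
    if not coverage.get("subscriberId"):
        missing.append("subscriberId")
    for code in ("group", "plan"):
        if not class_value(coverage, code):
            missing.append(f"class:{code}")
    return missing
```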
The alternatives landscape
CMS Synthetic Data
The Centers for Medicare & Medicaid Services (CMS) publishes synthetic Medicare claims data, most notably the 2008-2010 Data Entrepreneurs' Synthetic Public Use File (DE-SynPUF), along with newer releases. This data covers beneficiary demographics, inpatient claims, outpatient claims, prescription drug events, and carrier claims.
Best for: Teams working on Medicare-specific workflows, claims processing, or population health analytics. The data is structured like real CMS claims, which is valuable if your agent interacts with Medicare.
Limitations: Medicare-only, so no commercial payer data. Claims format, not FHIR (requires transformation). No clinical detail beyond what appears on claims. Older datasets may not reflect current coding practices.
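The transformation step is usually a column-by-column mapping. Below is a minimal sketch for the beneficiary summary file only. It assumes the DE-SynPUF column names `DESYNPUF_ID`, `BENE_BIRTH_DT`, `BENE_DEATH_DT`, and `BENE_SEX_IDENT_CD` (1 = male, 2 = female) with dates as YYYYMMDD, so confirm them against the CMS data dictionary. The identifier system URI is a hypothetical placeholder.

```python
import csv
import io
from typing import Dict, List

SEX_CODES = {"1": "male", "2": "female"}
ID_SYSTEM = "urn:example:de-synpuf-id"  # hypothetical identifier system


def yyyymmdd_to_iso(value: str) -> str:
    """Convert a claims-style YYYYMMDD date to FHIR's YYYY-MM-DD."""
    return f"{value[0:4]}-{value[4:6]}-{value[6:8]}"


def beneficiary_to_patient(row: Dict[str, str]) -> dict:
    """Map one beneficiary summary row to a minimal FHIR R4 Patient."""
    patient = {
        "resourceType": "Patient",
        "identifier": [{"system": ID_SYSTEM, "value": row["DESYNPUF_ID"]}],
        "gender": SEX_CODES.get(row.get("BENE_SEX_IDENT_CD", ""), "unknown"),
        "birthDate": yyyymmdd_to_iso(row["BENE_BIRTH_DT"]),
    }
    death = (row.get("BENE_DEATH_DT") or "").strip()
    if death:
        patient["deceasedDateTime"] = yyyymmdd_to_iso(death)
    return patient


def patients_from_csv(text: str) -> List[dict]:
    """Parse beneficiary CSV text and map every row."""
    return [beneficiary_to_patient(row) for row in csv.DictReader(io.StringIO(text))]
```

The claim files (inpatient, outpatient, carrier, prescription drug events) need their own mappings, typically onto ExplanationOfBenefit. That is where most of the transformation effort goes.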
MITRE Test Decks
Beyond Synthea, MITRE provides curated test datasets for specific use cases. These include test patients for Inferno (US Core conformance testing), for Da Vinci implementation guides, and for testing against other FHIR implementation guides.
Best for: Teams implementing specific FHIR implementation guides who need patients that exercise particular profile requirements.
Limitations: Small datasets designed for conformance testing, not load testing or scenario coverage. The patients are crafted to validate specific profile elements, not to simulate realistic clinical scenarios.
Tonic.ai
Tonic.ai specializes in generating synthetic data from real data sources. It connects to your production database, learns the schema and statistical distributions, and generates synthetic data that preserves the structure and relationships of the original data while removing PHI.
Best for: Teams that have access to production clinical data and need de-identified synthetic versions for development and testing. Tonic maintains referential integrity and statistical properties.
Limitations: Requires access to real data to generate synthetic data, which creates a chicken-and-egg problem for early-stage companies. Focused on database-level synthesis, not FHIR-native output. Pricing is enterprise-oriented.
MDClone
MDClone takes a similar approach to Tonic but is healthcare-specific. It creates synthetic patient populations from real health system data, preserving clinical distributions and correlations while ensuring no individual patient can be re-identified.
Best for: Health systems and research organizations that want to share data externally without HIPAA risk. The synthetic data maintains clinical correlations that purely generative approaches miss.
Limitations: Requires a health system data partnership. Not available as a standalone tool for AI agent developers. Output is typically tabular, not FHIR-native. Focused on research and analytics, not agent testing workflows.
Gretel.ai
Gretel provides a general-purpose synthetic data platform that supports healthcare data types. It uses generative models to create synthetic datasets that preserve statistical properties of source data.
Best for: Teams that need synthetic versions of structured healthcare datasets (claims, encounters, labs) and have source data to train from.
Limitations: General-purpose, not healthcare-specialized. Does not generate FHIR bundles natively. Requires source data for training. Less control over individual patient scenarios than template-based approaches.
Verial
Verial generates scenario-driven synthetic patients with clinically coherent FHIR bundles, vendor-specific data shapes, and multi-interface test data. You define the clinical scenario, and Verial generates patients that match it.
Best for: Teams building AI agents that need scenario-specific test patients with particular conditions, medications, coverage, and clinical documentation. Teams that need the same patient expressed in different vendor data shapes or across multiple interfaces (FHIR, voice, fax, portal).
Limitations: Not designed for population-level data generation. If you need 100,000 patients for analytics, Synthea is still the better tool. Verial is purpose-built for agent testing, not general synthetic data needs.
Comparison table
| Feature | Synthea | CMS Synthetic | MITRE Decks | Tonic.ai | MDClone | Verial |
|---|---|---|---|---|---|---|
| Scenario control | None | None | Limited | Derived | Derived | Full |
| FHIR R4 output | Yes | No (claims) | Yes | No (DB) | No (tabular) | Yes |
| Vendor-specific shapes | No | N/A | No | No | No | Yes |
| Clinical coherence | Modules | Claims-based | Curated | Source-derived | Source-derived | Scenario-driven |
| Multi-interface data | FHIR only | Claims only | FHIR only | Source schema | Tabular | FHIR, voice, fax, portal |
| Requires source data | No | No | No | Yes | Yes | No |
| Population scale | Excellent | Large | Small | Large | Large | Moderate |
| Cost | Free | Free | Free | Enterprise | Enterprise | Paid |
| Open source | Yes | Yes | Yes | No | No | No |
When to use which
Use Synthea when you need population-level data for analytics, load testing, or basic FHIR integration development. Synthea is the right starting point for most healthcare data projects, and there is no reason not to use it for initial development.
Use CMS synthetic data when you are building Medicare-specific workflows or need realistic claims data structure. It is the best freely available source for claims-format data.
Use MITRE test decks when you are implementing a specific FHIR implementation guide and need patients that exercise its profile requirements.
Use Tonic.ai or MDClone when you have access to real production data and need synthetic versions that preserve its statistical properties. These tools are strongest when you need synthetic data that matches your specific patient population, not a generic population.
Use Verial when you are building AI agents that need to be tested against specific clinical scenarios with HIPAA-compliant synthetic data. If your agent handles prior auth, claims processing, or clinical documentation, it needs patients purpose-built for those workflows, not random patients from a population generator.
Combining approaches
These tools are not mutually exclusive. A practical testing strategy might look like this:
- Synthea for baseline testing. Generate a large patient population to validate basic FHIR parsing, search, and data handling.
- Scenario-specific patients for agent testing. Use Verial or hand-crafted patients for the specific clinical workflows your agent handles.
- Vendor-shaped data for deployment testing. Test against Epic-shaped, Cerner-shaped, and generic US Core data to catch vendor-specific parsing issues.
- Production-derived synthetic data for validation. If you have access, use Tonic or MDClone to generate data that matches your actual patient population distribution.
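The scenario and vendor-shape layers multiply. Three workflows tested against three data shapes is already nine cases. Here is a minimal sketch of building that matrix explicitly, so no combination is skipped by accident. The scenario and shape labels are hypothetical placeholders.

```python
from itertools import product
from typing import Dict, List

# Hypothetical labels; substitute the workflows and data sources you actually test.
SCENARIOS = ["prior_auth_knee_replacement", "discharge_med_reconciliation", "claim_denial_appeal"]
DATA_SHAPES = ["us_core_generic", "epic_shaped", "cerner_shaped"]


def build_matrix(scenarios: List[str], shapes: List[str]) -> List[Dict[str, str]]:
    """One test case per (scenario, data shape) pair, with a stable case id."""
    return [
        {"scenario": scenario, "shape": shape, "case_id": f"{scenario}__{shape}"}
        for scenario, shape in product(scenarios, shapes)
    ]
```

With pytest, the same list can drive `pytest.mark.parametrize`, with the case ids passed as `ids=` so failures name both the workflow and the data shape.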
The key insight is that different testing phases require different data characteristics. Population data tests breadth. Scenario data tests depth. Vendor-shaped data tests resilience. Using only one approach leaves gaps that surface in production.
Key Takeaways
- Synthea is the right tool for population-level synthetic data. It is the wrong tool for scenario-specific agent testing.
- CMS synthetic data fits Medicare claims workflows. MITRE test decks fit profile conformance. Each solves a narrow slice.
- Tonic.ai and MDClone derive synthetic data from real data. Useful if you have production access, not if you don't.
- Gretel.ai is general-purpose. It works, but it does not produce FHIR natively.
- Scenario-driven generators produce the specific patient configurations agent testing requires, including vendor-shaped variants.
- A practical strategy combines Synthea for breadth with scenario-driven generation for the specific workflows your agent handles.
FAQ
Is Synthea still the best free option?
For population-level data, yes. Synthea is free, open source, clinically coherent, and produces valid FHIR R4. The JAMIA paper remains one of the most cited synthetic data references. For agent testing, it is a starting point, not an ending point.
Can I use CMS synthetic data with a FHIR pipeline?
Not directly. CMS synthetic data is claims-format, not FHIR. You have to transform it. If your agent works on claims rather than clinical data, it may be a better fit than Synthea.
How do Tonic.ai and MDClone differ?
Tonic is a general-purpose data synthesis platform with healthcare support. MDClone is healthcare-specific, works from health system data, and emphasizes preserving clinical correlations. Both require access to real source data.
When should I use a scenario-driven generator instead of Synthea?
When you need specific clinical configurations (specific conditions, payer plans, documentation states) for a workflow. Synthea's probabilistic generation cannot reliably produce those. Scenario-driven tools build the exact patient the test requires.
A complete framework for testing healthcare AI agents needs these layers working together.