SFTP Claims Testing with Synthetic 837 and 835 Files
Revenue cycle AI agents process claims via SFTP. How to test 837 submissions and 835 remittance parsing with synthetic X12 data.
TL;DR
- SFTP still carries most US medical claims. Test revenue cycle agents with synthetic 837 submissions and 835 remittances covering denials, partial pays, and adjustments.
- CAQH's 2024 Index reports that 89% of claim submissions are now electronic, but the remaining manual and batch flows still drive billions in administrative cost.
- X12 reports that more than 7 billion healthcare transactions per year run through the X12 standards, with 837 and 835 as the workhorses.
- Synthetic test data must include CARC/RARC denial codes, PLB segments, HL hierarchy edge cases, and realistic acknowledgment flows (999, 277CA) to catch the bugs that matter.
The SFTP claims workflow
Despite decades of modernization, most US healthcare claims still move as batch files over SFTP. A typical flow:
- A provider or billing system generates an 837 (the claim) in X12 EDI format
- The file is uploaded to a clearinghouse or payer SFTP endpoint
- The clearinghouse validates and forwards to the payer
- The payer adjudicates
- The payer generates an 835 (the remittance advice) with payment and adjustment detail
- The 835 is deposited in the provider's SFTP inbox
AI agents in revenue cycle participate in this workflow. Some generate 837s from clinical documentation. Others parse 835s to post payments, identify denials, and flag underpayments. Many do both. Testing these agents requires synthetic data and an SFTP endpoint that simulates clearinghouse behavior.
"Full automation of medical industry transactions could save $20 billion annually. The technology exists. Adoption does not."
-- CAQH, 2024 Index Report
X12 file structure basics
X12 EDI predates JSON and XML. It is a delimiter-based format in which each element's meaning comes from its position within its segment:
ISA*00*          *00*          *ZZ*SENDER_ID      *ZZ*RECEIVER_ID    *260323*1430*^*00501*000000001*0*P*:~
GS*HC*SENDER_ID*RECEIVER_ID*20260323*1430*1*X*005010X222A1~
ST*837*0001*005010X222A1~
BHT*0019*00*12345*20260323*1430*CH~
NM1*41*2*VERIAL MEDICAL GROUP*****46*123456789~
PER*IC*JANE SMITH*TE*3125551234~
NM1*40*2*CLEARINGHOUSE INC*****46*987654321~
HL*1**20*1~
NM1*85*2*VERIAL MEDICAL GROUP*****XX*1234567893~
N3*100 MAIN ST~
N4*CHICAGO*IL*606011234~
REF*EI*123456789~
HL*2*1*22*0~
SBR*P*18*GROUP123******CI~
NM1*IL*1*JOHNSON*ROBERT*A***MI*MEM987654~
N3*200 OAK AVE~
N4*CHICAGO*IL*60602~
DMG*D8*19651204*M~
NM1*PR*2*BLUE CROSS BLUE SHIELD*****PI*BCBS001~
CLM*CLAIM001*250***11:B:1*Y*A*Y*Y~
HI*ABK:R0602~
LX*1~
SV1*HC:99213*250*UN*1***1~
DTP*472*D8*20260320~
SE*23*0001~
GE*1*1~
IEA*1*000000001~
Structure hierarchy
ISA/IEA (Interchange). Outermost envelope. Sender/receiver IDs, date/time, control numbers. Every ISA element is fixed-width, so the ISA segment is always exactly 106 characters, including its segment terminator.
GS/GE (Functional Group). Groups related transactions. For claims, the functional identifier is HC.
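ST/SE (Transaction Set). Individual transactions. ST01 is 837 for both professional (837P) and institutional (837I) claims; the implementation guide reference in GS08 and ST03 tells them apart (005010X222A1 for 837P, 005010X223A2 for 837I).
Segments. Inside the transaction, segments follow the implementation guide (005010X222A1 for 837P), which uses HL hierarchical levels to organize billing provider, subscriber, patient, and claim data.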
Delimiters
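X12 5010 defines its delimiters inside the ISA segment itself, so each trading partner can choose them:
- Element separator: * (ISA position 4)
- Repetition separator: ^ (ISA11, position 83; new in 5010)
- Sub-element separator: : (ISA16, position 105)
- Segment terminator: ~ (ISA position 106)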
Your parser should read the delimiters from ISA rather than hardcoding them.
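Because the ISA is fixed-width, the delimiters can be read by character offset. This is a minimal sketch that assumes the file starts with a complete 106-character 5010 ISA segment. `read_delimiters` is a hypothetical helper name, not a library API.

```python
def read_delimiters(interchange: str) -> dict[str, str]:
    """Read X12 delimiters from the fixed-width ISA segment.

    Assumes the payload starts (after optional leading whitespace) with a
    complete 106-character 5010 ISA segment.
    """
    data = interchange.lstrip()
    if not data.startswith("ISA") or len(data) < 106:
        raise ValueError("input does not start with a complete ISA segment")
    return {
        "element": data[3],      # ISA position 4
        "repetition": data[82],  # ISA11, position 83 (5010 only)
        "component": data[104],  # ISA16, position 105
        "segment": data[105],    # position 106, right after ISA16
    }
```

The same function works unchanged when a trading partner uses `|` as the element separator or a newline as the segment terminator. That flexibility is the point of reading the delimiters instead of hardcoding them.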
Common parsing challenges
Segment terminators and line breaks
Some systems place each segment on a new line. Others concatenate everything. Your parser should split on the segment terminator, not newlines.
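Here is a tokenizer along those lines. It is a sketch that assumes the delimiters have already been read from the ISA; the defaults match the sample file, and `split_segments` is a hypothetical name. It splits only on the segment terminator, then strips any CR/LF that follows each terminator. As a result, one-line files and one-segment-per-line files produce identical output.

```python
def split_segments(data: str, element_sep: str = "*",
                   segment_term: str = "~") -> list[list[str]]:
    """Split an X12 payload into segments, each a list of elements.

    Composite elements (e.g. "HC:99213") are left intact; split them on the
    sub-element separator where a segment needs it.
    """
    segments = []
    for raw in data.split(segment_term):
        raw = raw.strip("\r\n")  # drop line breaks only; spaces can be data
        if raw:                  # skip the empty tail after the last terminator
            segments.append(raw.split(element_sep))
    return segments
```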
Loop structures
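X12 uses hierarchical loops. In an 837P, the 2000A loop contains the billing provider, 2000B the subscriber, 2000C the patient (when the patient is not the subscriber), and 2300 the claim. The HL loops have no explicit begin or end markers. HL01 gives each level an ID, HL02 points to its parent, and HL03 says what the level is (20 billing provider, 22 subscriber, 23 patient). Claim (2300) and service line (2400) loops start at their CLM and LX segments. Your parser must track loop state from these values.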
This is where most parsing bugs live. Multiple claims for multiple patients create nested loops:
HL*1**20*1 (Billing Provider - 2000A)
HL*2*1*22*0 (Subscriber - 2000B)
CLM*... (Claim 1 for subscriber 2)
CLM*... (Claim 2 for subscriber 2)
HL*3*1*22*0 (Subscriber - 2000B, second patient)
CLM*... (Claim 1 for subscriber 3)
Get the HL hierarchy wrong and you attach claims to the wrong subscriber or patient.
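The HL bookkeeping amounts to a small parent-pointer tree. This sketch assumes segments arrive already split into element lists and looks only at HL and CLM. `attribute_claims` is an illustrative name, not part of any library.

```python
def attribute_claims(segments: list[list[str]]) -> dict[str, dict[str, str]]:
    """Map each CLM01 claim ID to the HL IDs that own it, keyed by HL03 level.

    HL01 is the level's ID, HL02 its parent's ID (empty for the billing
    provider), HL03 the level code (20 billing provider, 22 subscriber,
    23 patient).
    """
    parents: dict[str, tuple[str, str]] = {}  # HL01 -> (HL02, HL03)
    current = None
    claims: dict[str, dict[str, str]] = {}
    for seg in segments:
        if seg[0] == "HL":
            hl_id, parent, level = seg[1], seg[2], seg[3]
            if hl_id in parents:
                raise ValueError(f"duplicate HL ID {hl_id}")
            if parent and parent not in parents:
                raise ValueError(f"HL {hl_id} references unknown parent {parent}")
            parents[hl_id] = (parent, level)
            current = hl_id
        elif seg[0] == "CLM":
            if current is None:
                raise ValueError(f"claim {seg[1]} appears before any HL")
            owners = {}
            node = current
            while node:  # climb parent links up to the billing provider
                parent, level = parents[node]
                owners[level] = node
                node = parent
            claims[seg[1]] = owners
    return claims
```

Each claim is owned by the most recent HL, and the owner chain is rebuilt from HL02 rather than from segment order. A separate patient level (HL03 = 23) therefore resolves to the right subscriber as well.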
Adjustment reason codes
The 835 uses CAS (Claim Adjustment Segment) segments:
CAS*CO*45*50.00~
CAS*PR*1*20.00~
CAS*OA*23*10.00~
- CO (Contractual Obligation): provider write-off per contract
- PR (Patient Responsibility): patient owes (deductible, copay, coinsurance)
- OA (Other Adjustment): miscellaneous
- PI (Payer Initiated Reduction): payer-initiated, not contracted
CARC 45 is "Charge exceeds fee schedule." CARC 1 is "Deductible amount." There are hundreds of CARC and RARC codes your agent must interpret correctly.
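After the group code, one CAS segment can carry up to six reason/amount/quantity triplets. A parser that reads only the first reason code will silently drop adjustments. The sketch below expands a segment into (group, reason, amount) tuples and uses `Decimal` rather than float for money. `parse_cas` is a hypothetical helper.

```python
from decimal import Decimal

GROUP_CODES = {"CO", "PR", "OA", "PI"}


def parse_cas(segment: list[str]) -> list[tuple[str, str, Decimal]]:
    """Expand one CAS segment (already split on the element separator).

    CAS01 is the group code; CAS02-CAS19 hold up to six
    reason/amount/quantity triplets. Quantities are ignored here.
    """
    if segment[0] != "CAS" or segment[1] not in GROUP_CODES:
        raise ValueError(f"unexpected CAS segment: {segment}")
    if len(segment) > 20:
        raise ValueError("CAS segment has more than six triplets")
    adjustments = []
    for i in range(2, len(segment), 3):
        reason = segment[i]
        amount = segment[i + 1] if i + 1 < len(segment) else ""
        if not reason:
            continue  # unused triplet
        if not amount:
            raise ValueError(f"reason {reason} has no amount")
        adjustments.append((segment[1], reason, Decimal(amount)))
    return adjustments
```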
For agents doing denial management or underpayment detection, correct code interpretation is the core functionality. This is also where prior auth agent failure modes intersect with claims. A CO-197 denial (authorization absent) traces back to an authorization that was never obtained or never sent on the claim.
Building synthetic 837 files
Cover these categories:
Clean claims
- 837P with single service line, valid CPT, valid ICD-10, complete patient/provider data
- 837I with revenue codes, service dates, correct type of bill
- Multiple service lines on a single claim
- Multiple claims for different patients in one file
Claims with common errors
- Missing or invalid NPI
- Invalid CPT code or CPT/diagnosis mismatch
- Service date in the future
- Duplicate claim
- Missing subscriber information
- Invalid member ID format for the target payer
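For the invalid-NPI cases, it helps to generate NPIs that fail only the check digit, so the test exercises real validation rather than a length check. CMS defines the NPI check digit as a Luhn digit computed over the first nine digits with the prefix 80840 prepended. That prefix contributes a constant 24 to the sum. The sketch below checks it. `npi_is_valid` is a hypothetical name.

```python
def npi_is_valid(npi: str) -> bool:
    """Check an NPI's format and its Luhn check digit (80840 prefix)."""
    if len(npi) != 10 or not (npi.isascii() and npi.isdigit()):
        return False
    total = 24  # contribution of the 80840 prefix
    for i, ch in enumerate(reversed(npi[:9])):
        d = int(ch)
        if i % 2 == 0:  # double every other digit, starting from the rightmost
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return (10 - total % 10) % 10 == int(npi[9])
```

1234567893 passes this check. Changing its last digit yields a realistic negative case: right length, right format, wrong check digit.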
Claims requiring special handling
- Claims with modifiers (25, 59, 76) affecting reimbursement
- Claims with prior auth numbers (REF segment, qualifier G1)
- Coordination of benefits (secondary payer)
- Out-of-network services
- NDC for J-codes (LIN and CTP segments in loop 2410)
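A template-driven generator can stay small. This sketch renders one 837P transaction set from hypothetical dataclasses. Provider, payer, and address values are placeholders modeled on the sample 837 above, and the ISA/GS envelope is left to the caller. The generator computes SE01 instead of hardcoding it, because a stale segment count is an easy bug to ship in hand-edited fixtures.

```python
from dataclasses import dataclass, field
from decimal import Decimal


@dataclass
class ServiceLine:
    cpt: str
    charge: Decimal
    units: int = 1
    service_date: str = "20260320"  # CCYYMMDD, placeholder


@dataclass
class SyntheticClaim:
    claim_id: str
    member_id: str
    last_name: str
    first_name: str
    icd10: str  # without the decimal point, e.g. "R0602"
    lines: list[ServiceLine] = field(default_factory=list)
    prior_auth: str = ""  # hypothetical field: emits REF*G1 when set


def build_837p(claim: SyntheticClaim, control: str = "0001") -> str:
    """Render one 837P transaction set (ST..SE) from a claim template."""
    total = sum((line.charge for line in claim.lines), Decimal("0"))
    segs = [
        f"ST*837*{control}*005010X222A1",
        f"BHT*0019*00*{claim.claim_id}*20260323*1430*CH",
        "NM1*41*2*VERIAL MEDICAL GROUP*****46*123456789",
        "PER*IC*JANE SMITH*TE*3125551234",
        "NM1*40*2*CLEARINGHOUSE INC*****46*987654321",
        "HL*1**20*1",
        "NM1*85*2*VERIAL MEDICAL GROUP*****XX*1234567893",
        "N3*100 MAIN ST",
        "N4*CHICAGO*IL*606011234",
        "REF*EI*123456789",
        "HL*2*1*22*0",
        "SBR*P*18*GROUP123******CI",
        f"NM1*IL*1*{claim.last_name}*{claim.first_name}****MI*{claim.member_id}",
        "N3*200 OAK AVE",
        "N4*CHICAGO*IL*60602",
        "DMG*D8*19651204*M",
        "NM1*PR*2*BLUE CROSS BLUE SHIELD*****PI*BCBS001",
        f"CLM*{claim.claim_id}*{total}***11:B:1*Y*A*Y*Y",
    ]
    if claim.prior_auth:
        segs.append(f"REF*G1*{claim.prior_auth}")  # 2300 REF precedes HI
    segs.append(f"HI*ABK:{claim.icd10}")
    for n, line in enumerate(claim.lines, start=1):
        segs.append(f"LX*{n}")
        segs.append(f"SV1*HC:{line.cpt}*{line.charge}*UN*{line.units}***1")
        segs.append(f"DTP*472*D8*{line.service_date}")
    segs.append(f"SE*{len(segs) + 1}*{control}")  # SE01 counts ST through SE
    return "~".join(segs) + "~"
```

Variants for the categories above come from changing one field at a time. Examples include an invalid CPT, a future `service_date`, or a `prior_auth` value that emits REF*G1.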
Building synthetic 835 files
Paid claims
- Full payment at expected rate
- Payment with CO-45 contractual adjustment
- Payment with PR-1 deductible, PR-2 coinsurance, PR-3 copay
- Multiple service lines with different payment amounts
Denied claims
- CO-197 missing prior authorization
- CO-96 non-covered service
- OA-18 duplicate claim
- CO-29 timely filing exceeded
- CO-4 denial requiring corrected claim
- Partial denial (some lines paid, others denied)
Adjustments and recoupments
- Overpayment recoupment (negative payment amount)
- Adjusted claim replacing previous payment
- Interest paid on a late claim payment
- PLB (Provider Level Balance) bulk adjustments
Zero-pay remittances
- Forwarded to secondary payer (zero payment, OA-23)
- Zero pay with appeal rights
- Informational remittance for capitated services
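Scenario files stay internally consistent if the generator derives the paid amount from the adjustments instead of taking both as inputs. This is a hedged sketch of a claim-level CLP/CAS block. `build_clp` is hypothetical. It places every CAS at claim level and omits the SVC service loops, patient NM1, and BPR/TRN header. It also uses CI as an example claim filing indicator in CLP06.

```python
from decimal import Decimal


def build_clp(claim_id: str, status: str, billed: Decimal,
              adjustments: list[tuple[str, str, Decimal]],
              payer_claim_id: str = "PAYER0001") -> list[str]:
    """Render CLP plus CAS segments; paid = billed - sum(adjustments).

    status is CLP02 (for example 1 = processed as primary, 4 = denied).
    adjustments are (group, reason, amount) tuples.
    """
    paid = billed - sum((amt for _, _, amt in adjustments), Decimal("0"))
    patient_resp = sum((amt for grp, _, amt in adjustments if grp == "PR"), Decimal("0"))
    segs = [f"CLP*{claim_id}*{status}*{billed}*{paid}*{patient_resp}*CI*{payer_claim_id}"]
    by_group: dict[str, list[tuple[str, Decimal]]] = {}
    for group, reason, amount in adjustments:
        by_group.setdefault(group, []).append((reason, amount))
    for group, items in by_group.items():
        for start in range(0, len(items), 6):  # at most six triplets per CAS
            elements = [group]
            for reason, amount in items[start:start + 6]:
                elements += [reason, str(amount), ""]  # quantity left empty
            while elements[-1] == "":
                elements.pop()  # X12 does not allow trailing empty elements
            segs.append("CAS*" + "*".join(elements))
    return segs
```

A CO-197 denial, for example, is `build_clp("CLAIM002", "4", Decimal("250"), [("CO", "197", Decimal("250"))])`, which yields a zero payment.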
SFTP endpoint simulation
Your test environment needs an SFTP server that mimics clearinghouse behavior.
Directory structure. Clearinghouses typically expect specific directories, for example:
/outbound/837/ # Agent uploads 837 files
/inbound/835/ # Agent retrieves 835 files
/inbound/999/ # Acknowledgments
/inbound/277/ # Claim status responses
File naming conventions. Clearinghouses often require specific patterns (e.g., 837P_SENDERID_YYYYMMDD_SEQUENCE.edi). Your simulation should validate names and reject bad ones.
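A name validator for the simulator can return reasons rather than a bare boolean, so tests can assert on why a file was rejected. This sketch assumes the example convention above (837P or 837I, an uppercase alphanumeric sender ID, a CCYYMMDD date, and a numeric sequence). Real clearinghouse rules vary, and `validate_filename` is a hypothetical name.

```python
import re
from datetime import datetime

# Hypothetical convention modeled on 837P_SENDERID_YYYYMMDD_SEQUENCE.edi
NAME_RE = re.compile(r"837[PI]_(?P<sender>[A-Z0-9]+)_(?P<date>\d{8})_(?P<seq>\d+)\.edi")


def validate_filename(name: str, expected_sender: str) -> list[str]:
    """Return rejection reasons; an empty list means the name is accepted."""
    m = NAME_RE.fullmatch(name)
    if not m:
        return ["name does not match 837X_SENDERID_YYYYMMDD_SEQUENCE.edi"]
    errors = []
    if m["sender"] != expected_sender:
        errors.append(f"unexpected sender {m['sender']}")
    try:
        datetime.strptime(m["date"], "%Y%m%d")
    except ValueError:
        errors.append(f"invalid date {m['date']}")
    return errors
```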
Acknowledgment generation. When your agent uploads an 837, the simulation generates a 999 confirming receipt. For syntax errors, the 999 rejects the transaction set and pinpoints the failing segment and element (IK3 and IK4 segments).
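A minimal 999 generator for the simulator might look like the sketch below. It is not an IG-complete 999. The ISA/GS envelope and IK4 element-level errors are omitted, `build_999` is a hypothetical name, and the caller supplies the IK304 error codes.

```python
from typing import Optional


def build_999(gs_control: str, st_control: str,
              segment_errors: Optional[list[tuple[str, int, str]]] = None,
              control: str = "0001") -> str:
    """Build a 999 body (ST..SE) acknowledging one 837P transaction set.

    segment_errors holds (segment ID, position in transaction set, IK304
    code) tuples; any entry rejects the transaction set.
    """
    segment_errors = segment_errors or []
    rejected = bool(segment_errors)
    segs = [
        f"ST*999*{control}*005010X231A1",
        f"AK1*HC*{gs_control}*005010X222A1",
        f"AK2*837*{st_control}*005010X222A1",
    ]
    for seg_id, position, code in segment_errors:
        segs.append(f"IK3*{seg_id}*{position}**{code}")  # IK303 loop ID omitted
    segs.append("IK5*R*5" if rejected else "IK5*A")  # 5 = one or more segments in error
    segs.append(f"AK9*{'R' if rejected else 'A'}*1*1*{0 if rejected else 1}")
    segs.append(f"SE*{len(segs) + 1}*{control}")
    return "~".join(segs) + "~"
```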
Processing delay simulation. Real claims take days. Support configurable delays between 837 upload and 835 availability.
Concurrent access. Your agent should handle multiple files, process in order, and avoid reprocessing.
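On the agent side, "process in order and avoid reprocessing" can be tested against a local directory standing in for the SFTP inbox. This sketch assumes file names sort chronologically (as with date-and-sequence naming) and keys its ledger on the SHA-256 of file contents. That way, a redelivered copy under a new name is also skipped. `process_inbox` and `handle` are hypothetical, and a real agent would persist `seen` across restarts.

```python
import hashlib
from pathlib import Path
from typing import Callable


def process_inbox(inbox: Path, seen: set[str],
                  handle: Callable[[str, bytes], None]) -> list[str]:
    """Process unseen files in name order; return the names handled this pass."""
    processed = []
    for path in sorted(p for p in inbox.iterdir() if p.is_file()):
        data = path.read_bytes()
        digest = hashlib.sha256(data).hexdigest()
        if digest in seen:
            continue  # same content already handled, possibly under another name
        handle(path.name, data)
        seen.add(digest)  # record only after the handler succeeds
        processed.append(path.name)
    return processed
```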
Testing scenarios
End-to-end claim lifecycle
- Agent generates an 837 from clinical data
- Agent uploads to the outbound directory
- Simulation generates a 999 (accepted)
- After configured delay, simulation generates an 835
- Agent retrieves the 835
- Agent parses and posts payment/adjustment data
- Verify posted amounts match the 835
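The final verification step can be a plain diff between what the 835 says and what the agent posted. This sketch assumes both sides have been reduced to a mapping from CLP01 claim ID to a `Decimal` payment amount. How the agent exposes its postings is an assumption, and `compare_postings` is a hypothetical name. The same check also catches the missing or extra claims that volume tests look for.

```python
from decimal import Decimal


def compare_postings(expected: dict[str, Decimal],
                     posted: dict[str, Decimal]) -> list[str]:
    """Return discrepancies between 835 CLP04 amounts and agent postings."""
    problems = []
    for claim_id in sorted(expected.keys() | posted.keys()):
        if claim_id not in posted:
            problems.append(f"{claim_id}: in 835 but not posted")
        elif claim_id not in expected:
            problems.append(f"{claim_id}: posted but not in 835")
        elif posted[claim_id] != expected[claim_id]:
            problems.append(f"{claim_id}: posted {posted[claim_id]}, 835 says {expected[claim_id]}")
    return problems
```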
Error handling
- 837 with syntax errors, verify agent handles the 999 rejection
- 837 that gets a partial 835, verify agent processes both paid and denied lines
- SFTP timeout during upload, verify agent retries
- 835 with unexpected adjustment codes, verify agent logs them for review
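The timeout scenario needs a defined retry behavior to assert against. This sketch implements bounded exponential backoff. `upload` is a hypothetical zero-argument callable wrapping the agent's SFTP put, and the sketch assumes that client reports timeouts as `TimeoutError`. Injecting `sleep` keeps tests fast.

```python
import time
from typing import Callable


def upload_with_retry(upload: Callable[[], None], attempts: int = 4,
                      base_delay: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> int:
    """Retry upload on TimeoutError; return the attempt number that succeeded."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            upload()
            return attempt
        except TimeoutError:
            if attempt == attempts:
                raise  # out of attempts: surface the last timeout
            sleep(base_delay * 2 ** (attempt - 1))  # 1s, 2s, 4s, ... with defaults
    raise RuntimeError("unreachable")
```

In the scenario, the simulated server can drop the first one or two uploads. The test then asserts on the returned attempt count and on exactly one file landing in /outbound/837/.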
Volume testing
- 100 claims in a single 837
- 50 individual 837 files in rapid succession
- 835 with 500+ claim payment records
- Verify no data loss
Regression testing
Keep a library of real 835 files (PHI removed) that caused past issues. Run on every build. Common cases:
- Unusual segment ordering
- Multiple CAS segments per service line
- Adjustment totals that differ from the sum of individual adjustments because of rounding
- PLB segments for bulk adjustments
Key Takeaways
- SFTP remains the dominant claims transport. Agents must generate valid 837s and parse real-world 835s.
- Most parsing bugs live in HL hierarchy tracking and CAS adjustment interpretation.
- Test libraries should cover clean claims, common errors, special handling, paid, denied, and zero-pay 835 cases.
- SFTP simulation must include realistic 999 acknowledgments, file naming validation, and configurable processing delays.
- CARC and RARC codes drive revenue outcomes. Misinterpreting CO-197, CO-45, or OA-23 costs money.
- The X12 implementation guides (005010X222A1 for 837P, 005010X221A1 for 835) are authoritative. Build your validator against them, not against example files.
- CAQH's 2024 Index estimates $20B in annual savings available from automating medical admin.
FAQ
Why is SFTP still the dominant transport for claims?
Trading partner agreements, existing clearinghouse contracts, and decades of integrated billing systems make switching costly. Newer FHIR-based flows exist, such as the prior authorization APIs associated with CMS-0057-F and the Da Vinci PAS guide. They cover prior authorization, not the full claim lifecycle.
What is the difference between 999 and 277CA acknowledgments?
A 999 is an Implementation Acknowledgment reporting whether each transaction set in the file is syntactically valid against its implementation guide. A 277CA is a Claim Acknowledgment from the clearinghouse or payer reporting whether individual claims passed front-end validation. A file can get a clean 999 but still have claims rejected at the 277CA level.
How do I handle CAS segments that do not balance?
CAS totals must equal the difference between billed and paid amounts. When they do not (rounding, payer bug), log the discrepancy and flag for review. Do not silently force the math. You need the audit trail.
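A balance check that reports instead of correcting can be this small. This sketch assumes `cas_amounts` already holds every CAS amount for the claim, from both the claim level and its service lines, as `Decimal` values. `balance_issue` is a hypothetical name.

```python
from decimal import Decimal
from typing import Optional


def balance_issue(claim_id: str, billed: Decimal, paid: Decimal,
                  cas_amounts: list[Decimal]) -> Optional[str]:
    """Return a review message when CLP03 - CLP04 != sum(CAS), else None."""
    cas_total = sum(cas_amounts, Decimal("0"))
    gap = billed - paid - cas_total
    if gap == 0:
        return None
    # Report the gap; never adjust an amount to make the claim balance.
    return f"{claim_id}: out of balance by {gap} (billed {billed}, paid {paid}, CAS total {cas_total})"
```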
Should my agent post payments from 835s automatically?
For clean paid claims, yes. For denials, partial pays, or any CAS reason code outside a trusted allowlist, route to a human. Agents that auto-post everything cost more in corrections than they save in throughput.
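The routing rule fits in a few lines. The allowlist below is a hypothetical example (CO-45 and PR-1/2/3) of pairs a team might decide are safe to post unreviewed, not a recommendation. The status check uses CLP02, where 1 means processed as primary. `route_claim` is an illustrative name.

```python
from decimal import Decimal

# Hypothetical allowlist of (group, reason) pairs trusted for auto-posting.
AUTO_POST_ALLOWLIST = {("CO", "45"), ("PR", "1"), ("PR", "2"), ("PR", "3")}


def route_claim(status: str, paid: Decimal,
                adjustments: list[tuple[str, str, Decimal]]) -> str:
    """Return "auto_post" or "human_review" for one 835 claim."""
    if status != "1" or paid <= 0:
        return "human_review"  # denials, zero pays, and non-primary processing
    for group, reason, _amount in adjustments:
        if (group, reason) not in AUTO_POST_ALLOWLIST:
            return "human_review"  # unfamiliar or denial-type adjustment
    return "auto_post"
```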
Putting it all together
A complete claims testing setup includes:
- Synthetic data generator producing valid 837 and 835 files from configurable templates
- SFTP server with the right directory structure and acknowledgment generation
- Scenario runner orchestrating the claim lifecycle from submission to remittance
- Evaluation engine comparing agent output against expected results
- Regression suite with edge-case files
Claims are where revenue cycle AI agents handle real money. An agent that misparses a CAS segment can cost a practice thousands per month in missed revenue. Thorough testing is foundational. Claims testing is one part of a healthcare AI testing strategy that should also cover FHIR integration and HL7v2 feeds.