SFTP Claims Testing with Synthetic 837 and 835 Files

TL;DR

SFTP still carries most US medical claims. Test revenue cycle agents with synthetic 837 submissions and 835 remittances covering denials, partial pays, and adjustments.
CAQH's 2024 Index reports that 89% of claim submissions are now electronic, but the remaining manual and batch flows still drive billions in administrative cost.
X12 reports that more than 7 billion healthcare transactions per year run through the X12 standards, with 837 and 835 as the workhorses.
Synthetic test data must include CARC/RARC denial codes, PLB segments, HL hierarchy edge cases, and realistic acknowledgment flows (999, 277CA) to catch the bugs that matter.

The SFTP claims workflow

Despite decades of modernization, most US healthcare claims flow through SFTP. The process:

A provider or billing system generates an 837 (the claim) in X12 EDI format
The file is uploaded to a clearinghouse or payer SFTP endpoint
The clearinghouse validates and forwards to the payer
The payer adjudicates
The payer generates an 835 (the remittance advice) with payment and adjustment detail
The 835 is deposited in the provider's SFTP inbox

AI agents in revenue cycle participate in this workflow. Some generate 837s from clinical documentation. Others parse 835s to post payments, identify denials, and flag underpayments. Many do both. Testing these agents requires synthetic data and an SFTP endpoint that simulates clearinghouse behavior.

"Full automation of medical industry transactions could save $20 billion annually. The technology exists. Adoption does not."

-- CAQH, 2024 Index Report

X12 file structure basics

X12 EDI predates XML and JSON. It is a delimiter-based format in which each element is identified by its position within a segment:

ISA*00*          *00*          *ZZ*SENDER_ID      *ZZ*RECEIVER_ID    *260323*1430*^*00501*000000001*0*P*:~
GS*HC*SENDER_ID*RECEIVER_ID*20260323*1430*1*X*005010X222A1~
ST*837*0001*005010X222A1~
BHT*0019*00*12345*20260323*1430*CH~
NM1*41*2*VERIAL MEDICAL GROUP*****46*123456789~
PER*IC*JANE SMITH*TE*3125551234~
NM1*40*2*CLEARINGHOUSE INC*****46*987654321~
HL*1**20*1~
NM1*85*2*VERIAL MEDICAL GROUP*****XX*1234567890~
N3*100 MAIN ST~
N4*CHICAGO*IL*60601~
REF*EI*123456789~
HL*2*1*22*0~
SBR*P*18*GROUP123******CI~
NM1*IL*1*JOHNSON*ROBERT*A***MI*MEM987654~
DMG*D8*19651204*M~
NM1*PR*2*BLUE CROSS BLUE SHIELD*****PI*BCBS001~
CLM*CLAIM001*250***11:B:1*Y*A*Y*Y~
HI*ABK:R0602~
SV1*HC:99213*250*UN*1***1~
SE*19*0001~
GE*1*1~
IEA*1*000000001~

Structure hierarchy

ISA/IEA (Interchange). Outermost envelope. Sender/receiver IDs, date/time, control numbers. The ISA segment is fixed-width: always exactly 106 characters, including the segment terminator.

GS/GE (Functional Group). Groups related transactions. For claims, the functional identifier is HC.

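ST/SE (Transaction Set). Individual transactions. ST01 is 837 for every claim type; the implementation guide version in GS08 and ST03 distinguishes professional claims (837P, 005010X222A1) from institutional claims (837I, 005010X223A2). SE01 carries the count of segments from ST through SE inclusive.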

Segments within the transaction. Follow the X12 005010X222A1 implementation guide and use HL hierarchical levels to organize billing provider, subscriber, patient, and claim data.

Delimiters

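X12 delimiters are configurable and defined in the ISA. The three that matter most for parsing are listed below, with positions given as 1-based character offsets into the ISA segment. Version 005010 also defines a repetition separator in ISA11 (^ in the example above).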

Element separator: * (ISA position 4)
Sub-element separator: : (ISA position 105)
Segment terminator: ~ (ISA position 106)

Your parser should read the delimiters from ISA rather than hardcoding them.
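
A minimal Python sketch of that advice, assuming the interchange begins with an unmodified fixed-width ISA segment. The names Delimiters and read_delimiters are illustrative and not taken from any library. The sample ISA deliberately uses non-default delimiters to show that nothing is hardcoded.

```python
from typing import NamedTuple


class Delimiters(NamedTuple):
    element: str
    repetition: str
    component: str
    segment: str


def read_delimiters(raw: str) -> Delimiters:
    """Read the delimiters from the fixed-width ISA at the start of an interchange."""
    if len(raw) < 106 or not raw.startswith("ISA"):
        raise ValueError("expected a 106-character ISA segment at the start")
    return Delimiters(
        element=raw[3],      # character position 4
        repetition=raw[82],  # ISA11, character position 83 (005010)
        component=raw[104],  # ISA16, character position 105
        segment=raw[105],    # character position 106
    )


# Build a sample ISA from its fixed-width fields, using "|", ">" and a
# newline instead of the usual "*", ":" and "~".
fields = ["ISA", "00", " " * 10, "00", " " * 10, "ZZ", "SENDER_ID".ljust(15),
          "ZZ", "RECEIVER_ID".ljust(15), "260323", "1430", "^", "00501",
          "000000001", "0", "P", ">"]
sample_isa = "|".join(fields) + "\n"
print(read_delimiters(sample_isa))
```

Every later parsing step can then take these values as parameters rather than assuming * and ~.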

Common parsing challenges

Segment terminators and line breaks

Some systems place each segment on a new line. Others concatenate everything. Your parser should split on the segment terminator, not newlines.
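
As a sketch, a splitter that treats line breaks as optional formatting might look like the following. The function name split_segments is hypothetical, and the terminator is passed in on the assumption that it was already read from the ISA.

```python
def split_segments(raw: str, segment_terminator: str = "~") -> list[str]:
    """Split an interchange into segments, ignoring line breaks between them."""
    segments = []
    for chunk in raw.split(segment_terminator):
        # Strip only CR/LF so significant spaces (ISA padding) survive.
        segment = chunk.strip("\r\n")
        if segment:
            segments.append(segment)
    return segments


one_line = "ST*837*0001~BHT*0019*00*12345~SE*3*0001~"
multi_line = "ST*837*0001~\r\nBHT*0019*00*12345~\nSE*3*0001~\n"
print(split_segments(one_line) == split_segments(multi_line))
```

Both layouts yield the same three segments, which is the property worth asserting in a parser test.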

Loop structures

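X12 uses hierarchical loops. In an 837, the 2000A loop contains the billing provider, 2000B the subscriber, 2000C the patient (when the patient is not the subscriber), and 2300 the claim. No explicit markers open or close a loop. The 2000-level loops are identified by the HL segment's hierarchical level code (20, 22, 23), and loops such as 2300 begin at a trigger segment (CLM). Your parser must track loop state based on HL values.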

This is where most parsing bugs live. Multiple claims for multiple patients create nested loops:

HL*1**20*1     (Billing Provider - 2000A)
  HL*2*1*22*0  (Subscriber - 2000B)
    CLM*...    (Claim 1 for subscriber 2)
    CLM*...    (Claim 2 for subscriber 2)
  HL*3*1*22*0  (Subscriber - 2000B, second patient)
    CLM*...    (Claim 1 for subscriber 3)

Misattribute HL hierarchy and you misattribute claims.
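
One way to track that state, sketched in Python under simplifying assumptions: segments are already split and stripped of terminators, and only HL and CLM segments matter. The function attribute_claims and its output keys are illustrative names.

```python
def attribute_claims(segments: list[str], element_sep: str = "*") -> list[dict]:
    """Attach each CLM to the HL it follows and to that HL's ancestors."""
    parents: dict[str, str] = {}  # HL01 -> HL02 (parent ID, empty at the top)
    levels: dict[str, str] = {}   # HL01 -> HL03 (20, 22, or 23)
    current = None
    rows = []

    def ancestor(hl_id: str, level: str):
        # Walk up the parent chain until an HL with the requested level code.
        while hl_id:
            if levels.get(hl_id) == level:
                return hl_id
            hl_id = parents.get(hl_id, "")
        return None

    for seg in segments:
        parts = seg.split(element_sep)
        if parts[0] == "HL":
            current = parts[1]
            parents[current] = parts[2]
            levels[current] = parts[3]
        elif parts[0] == "CLM":
            if current is None or levels[current] not in ("22", "23"):
                raise ValueError(f"claim {parts[1]} is not under a subscriber or patient HL")
            rows.append({
                "claim_id": parts[1],
                "hl_id": current,
                "subscriber_hl": ancestor(current, "22"),
                "billing_provider_hl": ancestor(current, "20"),
            })
    return rows


example = ["HL*1**20*1", "HL*2*1*22*0", "CLM*A*100", "CLM*B*50",
           "HL*3*1*22*0", "CLM*C*75"]
for row in attribute_claims(example):
    print(row)
```

Resolving ancestors through HL02 rather than through indentation or file order keeps the attribution correct when a patient-level HL (23) sits between the subscriber and the claim.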

Adjustment reason codes

The 835 reports adjustments in CAS (Claim Adjustment) segments. Each starts with a group code, followed by a reason code (CARC) and an amount:

CAS*CO*45*50.00~
CAS*PR*1*20.00~
CAS*OA*23*10.00~

CO (Contractual Obligation): provider write-off per contract
PR (Patient Responsibility): patient owes (deductible, copay, coinsurance)
OA (Other Adjustment): miscellaneous
PI (Payer Initiated Reduction): payer-initiated, not contracted

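CARC 45 is "Charge exceeds fee schedule." CARC 1 is "Deductible amount." There are hundreds of CARC and RARC codes your agent must interpret correctly. CARCs appear in CAS segments; RARCs, which add explanatory detail, arrive in other segments such as LQ and MOA.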

For agents doing denial management or underpayment detection, correct code interpretation is the core functionality. This is also where prior auth agent failure modes intersect with claims: a CO-197 denial traces back to a failure earlier in the workflow.
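
A hedged sketch of CAS parsing in Python. It assumes the standard CAS layout of a group code followed by repeating reason/amount/quantity triplets, and it uses Decimal to avoid float rounding. Adjustment and parse_cas are illustrative names.

```python
from decimal import Decimal
from typing import NamedTuple


class Adjustment(NamedTuple):
    group: str      # CO, PR, OA, PI
    reason: str     # CARC, e.g. "45"
    amount: Decimal


def parse_cas(segment: str, element_sep: str = "*",
              segment_terminator: str = "~") -> list[Adjustment]:
    """Parse one CAS segment into its adjustments."""
    parts = segment.rstrip(segment_terminator).split(element_sep)
    if parts[0] != "CAS" or len(parts) < 4:
        raise ValueError(f"not a usable CAS segment: {segment!r}")
    group = parts[1]
    adjustments = []
    # Triplets start at CAS02: reason code, amount, quantity (often empty).
    for i in range(2, len(parts), 3):
        reason = parts[i]
        amount = parts[i + 1] if i + 1 < len(parts) else ""
        if reason and amount:
            adjustments.append(Adjustment(group, reason, Decimal(amount)))
    return adjustments


print(parse_cas("CAS*CO*45*50.00~"))
print(parse_cas("CAS*PR*1*20.00**2*15.00~"))
```

The second call shows why a parser that reads only CAS02 and CAS03 undercounts: a single segment can carry several reason codes for the same group.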

Building synthetic 837 files

Cover these categories:

Clean claims

837P with single service line, valid CPT, valid ICD-10, complete patient/provider data
837I with revenue codes, service dates, correct type of bill
Multiple service lines on a single claim
Multiple claims for different patients in one file

Claims with common errors

Missing or invalid NPI
Invalid CPT code or CPT/diagnosis mismatch
Service date in the future
Duplicate claim
Missing subscriber information
Invalid member ID format for the target payer

Claims requiring special handling

Claims with modifiers (25, 59, 76) affecting reimbursement
Claims with prior auth numbers (REF segment, qualifier G1)
Coordination of benefits (secondary payer)
Out-of-network services
NDC for J-codes

Building synthetic 835 files

Paid claims

Full payment at expected rate
Payment with CO-45 contractual adjustment
Payment with PR-1 deductible, PR-2 coinsurance, PR-3 copay
Multiple service lines with different payment amounts

Denied claims

CO-197 missing prior authorization
CO-96 non-covered service
OA-18 duplicate claim
CO-29 timely filing exceeded
CO-4 denial (procedure code inconsistent with the modifier used) requiring a corrected claim
Partial denial (some lines paid, others denied)

Adjustments and recoupments

Overpayment recoupment (negative payment amount)
Adjusted claim replacing previous payment
Interest paid on a late claim payment
PLB (Provider Level Balance) bulk adjustments

Zero-pay remittances

Forwarded to secondary payer (zero payment, OA-23)
Zero pay with appeal rights
Informational remittance for capitated services
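
The scenarios above lend themselves to a table-driven generator. The Python sketch below builds only the claim-level CLP and CAS segments and stops at CLP05; a real 835 also needs the remaining CLP elements, the envelope, and usually service-level SVC loops. The scenario names and amounts are invented test data, and the generator refuses to emit a scenario that does not balance.

```python
from decimal import Decimal

# Hypothetical scenarios: CLP02 status, billed charge, paid amount, and
# (group, CARC, amount) adjustments.
SCENARIOS = {
    "contractual": {"status": "1", "charge": "250.00", "paid": "200.00",
                    "adjustments": [("CO", "45", "50.00")]},
    "patient_share": {"status": "1", "charge": "250.00", "paid": "165.00",
                      "adjustments": [("CO", "45", "50.00"),
                                      ("PR", "1", "20.00"),
                                      ("PR", "2", "15.00")]},
    "missing_auth_denial": {"status": "4", "charge": "250.00", "paid": "0.00",
                            "adjustments": [("CO", "197", "250.00")]},
}


def remittance_segments(claim_id: str, scenario: str, sep: str = "*") -> list[str]:
    """Build CLP and CAS segments for one claim in the named scenario."""
    spec = SCENARIOS[scenario]
    zero = Decimal("0.00")
    total = sum((Decimal(a) for _, _, a in spec["adjustments"]), zero)
    if Decimal(spec["charge"]) - total != Decimal(spec["paid"]):
        raise ValueError(f"scenario {scenario!r} does not balance")
    patient = sum((Decimal(a) for g, _, a in spec["adjustments"] if g == "PR"), zero)
    segs = [sep.join(["CLP", claim_id, spec["status"], spec["charge"],
                      spec["paid"], str(patient)])]
    # One CAS per group code; each adjustment is a reason/amount/quantity
    # triplet with the quantity left empty.
    by_group: dict[str, list[str]] = {}
    for group, reason, amount in spec["adjustments"]:
        by_group.setdefault(group, []).extend([reason, amount, ""])
    for group, triplets in by_group.items():
        segs.append(sep.join(["CAS", group, *triplets]).rstrip(sep))
    return segs


for line in remittance_segments("CLAIM001", "patient_share"):
    print(line)
```

Deliberately unbalanced files, which are also worth testing, would need a separate code path that bypasses the balance check.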

SFTP endpoint simulation

Your test environment needs an SFTP server that mimics clearinghouse behavior.

Directory structure. Real clearinghouses expect specific directories. The names below are from the agent's point of view:

/outbound/837/       # Agent uploads 837 files
/inbound/835/        # Agent retrieves 835 files
/inbound/999/        # Acknowledgments
/inbound/277/        # Claim status responses

File naming conventions. Clearinghouses often require specific patterns (e.g., 837P_SENDERID_YYYYMMDD_SEQUENCE.edi). Your simulation should validate names and reject bad ones.

Acknowledgment generation. When your agent uploads an 837, the simulation generates a 999 acknowledging the file. If the 837 has syntax errors, the 999 rejects it and identifies the failing segment and the error.
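
A rough Python sketch of the transaction-set portion of a 999, assuming one 837 transaction set per functional group and omitting the ISA/GS envelope. The function build_999 and the shape of its errors argument (segment ID, segment position, IK3 error code) are hypothetical; check element usage against the 005010X231A1 guide before relying on it.

```python
def build_999(group_control: str, ts_control: str, errors=(),
              claim_version: str = "005010X222A1",
              sep: str = "*", term: str = "~") -> str:
    """Build ST..SE for a 999 that accepts or rejects one 837 transaction set."""
    accepted = not errors
    segments = [
        ["ST", "999", "0001", "005010X231A1"],
        ["AK1", "HC", group_control, claim_version],  # group being acknowledged
        ["AK2", "837", ts_control, claim_version],    # transaction set being acknowledged
    ]
    for segment_id, position, code in errors:
        # IK3: segment ID, position in the transaction set, loop ID (empty), error code.
        segments.append(["IK3", segment_id, str(position), "", code])
    # IK5 and AK9: A = accepted, R = rejected; IK502 code 5 = segments in error.
    segments.append(["IK5", "A"] if accepted else ["IK5", "R", "5"])
    segments.append(["AK9", "A" if accepted else "R", "1", "1",
                     "1" if accepted else "0"])
    # SE01 counts ST through SE inclusive.
    segments.append(["SE", str(len(segments) + 1), "0001"])
    return term.join(sep.join(s) for s in segments) + term


print(build_999("1", "0001"))
print(build_999("1", "0001", errors=[("CLM", 16, "8")]))
```

Even this reduced form is enough to test whether an agent distinguishes an accepted file from a rejected one and surfaces the reported segment.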

Processing delay simulation. Real claims take days. Support configurable delays between 837 upload and 835 availability.

Concurrent access. The simulation should deposit several files at once so you can verify that your agent handles all of them, processes them in order, and never reprocesses a file.
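
One way an agent might meet those three requirements, sketched in Python. The inbox is modeled as a dict of file name to bytes, ordering relies on the assumption that file names sort chronologically, and duplicates are detected by content hash so a redelivered file under a new name is still skipped. All names here are illustrative.

```python
import hashlib


def process_inbox(files: dict[str, bytes], processed_digests: set[str],
                  handler) -> list[str]:
    """Process unseen files in name order and return the names handled."""
    handled = []
    for name in sorted(files):
        digest = hashlib.sha256(files[name]).hexdigest()
        if digest in processed_digests:
            continue  # same content already posted, possibly under another name
        handler(name, files[name])
        # Record the digest only after the handler succeeds, so a failed
        # file is retried on the next pass.
        processed_digests.add(digest)
        handled.append(name)
    return handled


inbox = {
    "835_20260325_0002.edi": b"second remittance",
    "835_20260324_0001.edi": b"first remittance",
    "835_20260326_0003.edi": b"first remittance",  # redelivery of file 0001
}
seen: set[str] = set()
print(process_inbox(inbox, seen, lambda name, data: None))
print(process_inbox(inbox, seen, lambda name, data: None))
```

In a real agent the set of digests would live in durable storage so that a restart does not cause double posting.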

Testing scenarios

End-to-end claim lifecycle

Agent generates an 837 from clinical data
Agent uploads to the outbound directory
Simulation generates a 999 (accepted)
After configured delay, simulation generates an 835
Agent retrieves the 835
Agent parses and posts payment/adjustment data
Verify posted amounts match the 835
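
The final verification step can be a plain comparison between what the simulated 835 said and what the agent posted. The Python sketch below assumes both sides have been reduced to a mapping of claim ID to (paid amount, total adjustments); that shape and the function name are assumptions for illustration.

```python
from decimal import Decimal


def posting_mismatches(expected: dict, posted: dict) -> list[str]:
    """List the differences between 835 amounts and the agent's postings."""
    problems = []
    for claim_id, amounts in expected.items():
        if claim_id not in posted:
            problems.append(f"{claim_id}: not posted")
        elif tuple(map(Decimal, posted[claim_id])) != tuple(map(Decimal, amounts)):
            problems.append(f"{claim_id}: expected {amounts}, posted {posted[claim_id]}")
    for claim_id in sorted(posted.keys() - expected.keys()):
        problems.append(f"{claim_id}: posted but absent from the 835")
    return problems


expected = {"CLAIM001": ("200.00", "50.00"), "CLAIM002": ("0.00", "250.00")}
posted = {"CLAIM001": ("200", "50"), "CLAIM003": ("10.00", "0.00")}
for problem in posting_mismatches(expected, posted):
    print(problem)
```

Comparing as Decimal means 200 and 200.00 match, while a one-cent difference does not.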

Error handling

837 with syntax errors: verify the agent handles the 999 rejection
837 that returns an 835 with a partial denial: verify the agent processes both paid and denied lines
SFTP timeout during upload: verify the agent retries
835 with unexpected adjustment codes: verify the agent logs them for review
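
The timeout case needs an endpoint that fails on demand. The Python sketch below pairs a test double with a simple exponential-backoff uploader standing in for the agent. FlakyEndpoint, its put method, and upload_with_retry are hypothetical names, not part of any SFTP library; the sleep function is injectable so tests do not actually wait.

```python
import time


class FlakyEndpoint:
    """Test double that times out a fixed number of times, then accepts."""

    def __init__(self, failures: int):
        self.failures = failures
        self.calls = 0
        self.stored: dict[str, bytes] = {}

    def put(self, name: str, data: bytes) -> None:
        self.calls += 1
        if self.calls <= self.failures:
            raise TimeoutError("simulated SFTP timeout")
        self.stored[name] = data


def upload_with_retry(endpoint, name: str, data: bytes, attempts: int = 3,
                      base_delay: float = 1.0, sleep=time.sleep) -> int:
    """Upload with exponential backoff; return the attempt that succeeded."""
    for attempt in range(1, attempts + 1):
        try:
            endpoint.put(name, data)
            return attempt
        except TimeoutError:
            if attempt == attempts:
                raise  # out of attempts: surface the failure
            sleep(base_delay * 2 ** (attempt - 1))


endpoint = FlakyEndpoint(failures=2)
delays: list[float] = []
print(upload_with_retry(endpoint, "837P_SENDERID_20260323_0001.edi", b"ISA...",
                        sleep=delays.append))
print(delays)
```

A useful companion test sets failures equal to attempts and asserts that the error propagates and nothing is stored.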

Volume testing

100 claims in a single 837
50 individual 837 files in rapid succession
835 with 500+ claim payment records
Verify no data loss

Regression testing

Keep a library of real 835 files (PHI removed) that caused past issues. Run on every build. Common cases:

Unusual segment ordering
Multiple CAS segments per service line
Provider adjustment total that differs from the sum of the individual adjustments (rounding)
PLB segments for bulk adjustments

Key Takeaways

SFTP remains the dominant claims transport. Agents must generate valid 837s and parse real-world 835s.
Most parsing bugs live in HL hierarchy tracking and CAS adjustment interpretation.
Test libraries should cover clean claims, common errors, special handling, paid, denied, and zero-pay 835 cases.
SFTP simulation must include realistic 999 acknowledgments, file naming validation, and configurable processing delays.
CARC and RARC codes drive revenue outcomes. Misinterpreting CO-197, CO-45, or OA-23 costs money.
The X12 implementation guides are authoritative: 005010X222A1 for the 837P, 005010X223A2 for the 837I, and 005010X221A1 for the 835. Build your validator against them, not against example files.
CAQH's 2024 Index estimates $20B in annual savings available from automating medical admin.

FAQ

Why is SFTP still the dominant transport for claims?

Trading partner agreements, existing clearinghouse contracts, and decades of integrated billing systems make switching costly. Newer FHIR-based flows (CMS-0057 PAS) exist but apply only to prior auth, not the full claim lifecycle.

What is the difference between 999 and 277CA acknowledgments?

A 999 is an Implementation Acknowledgment confirming the file parsed correctly. A 277CA is a Claim Acknowledgment, sent by the clearinghouse or payer, reporting whether each individual claim passed initial validation. A file can get a clean 999 but still have claims rejected at the 277CA level.

How do I handle CAS segments that do not balance?

CAS totals must equal the difference between billed and paid amounts. When they do not (rounding, payer bug), log the discrepancy and flag for review. Do not silently force the math. You need the audit trail.
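
The check itself is short. A Python sketch, assuming amounts arrive as strings taken straight from the 835 and using Decimal so that the comparison is exact; the function name is illustrative.

```python
from decimal import Decimal


def balance_discrepancy(billed: str, paid: str, adjustments: list[str]) -> Decimal:
    """Return billed - paid - sum(adjustments); zero means the claim balances."""
    total = sum((Decimal(a) for a in adjustments), Decimal("0"))
    return Decimal(billed) - Decimal(paid) - total


print(balance_discrepancy("250.00", "180.00", ["50.00", "20.00"]))
print(balance_discrepancy("250.00", "180.00", ["50.00", "19.99"]))
```

Any nonzero result should be logged with the claim ID and the raw segments, then routed for review rather than corrected in place.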

Should my agent post payments from 835s automatically?

For clean paid claims, yes. For denials, partial pays, or any CAS reason code outside a trusted allowlist, route to a human. Agents that auto-post everything cost more in corrections than they save in throughput.

Putting it all together

A complete claims testing setup includes:

Synthetic data generator producing valid 837 and 835 files from configurable templates
SFTP server with the right directory structure and acknowledgment generation
Scenario runner orchestrating the claim lifecycle from submission to remittance
Evaluation engine comparing agent output against expected results
Regression suite with edge-case files

Claims are where revenue cycle AI agents handle real money. An agent that misparses a CAS segment can cost a practice thousands per month in missed revenue. Thorough testing is foundational. Claims testing is one part of a healthcare AI testing strategy that should also cover FHIR integration and HL7v2 feeds.