First-pass reconciliation showed tens of thousands of invoice detail and cash receipt rows pointing to invoice headers that did not exist. 41,095 orphan invoice lines. 4,160 orphan cash receipts.
That was the worst kind of bad number: large, structural, and not obviously from one cause.
We were migrating a 40-year-old Clipper ERP to Delphi + PostgreSQL. On paper: 570 .PRG files, four decades of DBF files, millions of rows. In live operations: 7 programs carried most of the business. That was the first surprise.
The second came from the orphan rows.
We treated it as a migration defect.
The orphan check is the kind of query you write once and use forever:
```sql
SELECT COUNT(*)
FROM invoice_lines il
WHERE NOT EXISTS (
    SELECT 1
    FROM invoices i
    WHERE i.inv_no = il.inv_no
);
-- 41,095
```
Out of 3,012,516 total invoice lines, that's 1.36%. The cash receipts equivalent returned 4,160 orphans out of 350,957 receipts, or 1.19%. The dollar impact: $530,833.66 of orphan line ext_price (0.42% of the $124.5M total) and $341,775.36 of orphan cash paid_amt (0.30% of $115.4M total). Small as a percentage. Large as an absolute number you have to explain.
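The ratios above are worth keeping in one place, because you end up quoting them in every status meeting. A minimal sketch of the arithmetic, using the figures from this article (the $124.5M and $115.4M totals are the rounded values quoted above):

```python
# Orphan rates, using the counts and dollar totals reported in the article.
orphan_lines, total_lines = 41_095, 3_012_516
orphan_receipts, total_receipts = 4_160, 350_957
orphan_ext_price, total_ext_price = 530_833.66, 124_500_000.00
orphan_paid_amt, total_paid_amt = 341_775.36, 115_400_000.00

line_rate = orphan_lines / total_lines            # ~1.36% of invoice lines
receipt_rate = orphan_receipts / total_receipts   # ~1.19% of cash receipts
dollar_rate_lines = orphan_ext_price / total_ext_price  # ~0.42% of ext_price
dollar_rate_cash = orphan_paid_amt / total_paid_amt     # ~0.30% of paid_amt
```

Small percentages, but the absolute dollar figures are what the business asks about.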
So we checked everything in our pipeline.
By day three I had a spreadsheet full of burned hypotheses and no culprit.
When your own pipeline survives inspection, you start looking at whoever built it.
I assumed we'd find a classic predecessor failure: botched archives, partial retention, or one decade where someone cut corners and never documented it. The orphan headers had to exist somewhere. A working ERP doesn't ship invoices, collect cash, and post journal entries against headers that aren't there.
So we went to archive forensics. The Clipper system uses two-digit year suffixes on table names: ARMAST99 for the live invoice header file, ARTRAN99 for live invoice lines, ARCASH99 for live cash receipts, and (we assumed) ARMAST70, ARMAST80, ARMAST88, ARMAST98 as ten-year archive families going back to 1970. Four archive families. We expected the orphan headers to live there.
They didn't.
What we actually found:
| Archive file | Contents |
| --- | --- |
| ARMAST70 | does not exist |
| ARMAST80 | does not exist |
| ARMAST88 | 3 invoice headers |
| ARMAST98 | 1 invoice header |
| ARCASH98 | 10 cash receipts |
| ARTRAN98 | 24 invoice lines |
Four archive families with effectively zero content. We imported every available archive source anyway. Counts barely moved (+4 invoices, +24 lines, +10 receipts). The orphan numbers stayed at 41,095 and 4,160.
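The orphan check is a set difference at heart, which is why the count barely moved: a handful of recovered headers can only resolve the handful of detail rows that point at them. A toy sketch of the count-parity check we kept rerunning after each archive import (all data here is illustrative, not the real keys):

```python
# Toy model: orphans are detail rows whose header key is missing from
# every known header source (live table plus imported archives).
live_headers = {100, 101, 102}
archive_headers = {50}                      # the handful recovered from archives
detail_keys = [100, 100, 101, 50, 7, 8, 9]  # invoice_lines.inv_no values

def orphans(detail, header_sets):
    known = set().union(*header_sets)
    return [k for k in detail if k not in known]

before = orphans(detail_keys, [live_headers])                   # 4 orphans
after = orphans(detail_keys, [live_headers, archive_headers])   # 3 orphans
```

Importing the archive resolves exactly one orphan here; the rest point at headers that exist nowhere. That was our situation at scale.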
I was running queries late at night when that result came back. That was the moment the story changed: we weren't failing to locate history; history had been intentionally shed.
The orphan set wasn't an incident. It was a long-running pattern: periodic header purges with surviving transactional detail.
Look at the inv_no range. The active ARMAST99 file's MIN(inv_no) is 298,417. Its MAX is 3,462,381. The orphan invoice lines reference inv_no values below 298,417 — gone, no archive trail — up to 3,045,913. The system had been periodically purging old invoice headers from the active table, presumably to keep the .DBF size manageable, without cascading the purge into the line and cash files. The transactional rows survived the headers.
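One way to see the purge pattern is to classify each orphan key against the live table's key range. A sketch of that range check, using the ARMAST99 bounds from above (the sample keys are hypothetical):

```python
# If orphans cluster below the live table's MIN(inv_no), headers were purged
# from the front of the range; orphans above MAX would suggest a key defect.
LIVE_MIN, LIVE_MAX = 298_417, 3_462_381  # ARMAST99 bounds from the article

def classify_orphan(inv_no: int) -> str:
    if inv_no < LIVE_MIN:
        return "pre-purge"   # header purged, no archive trail
    if inv_no <= LIVE_MAX:
        return "in-range"    # header missing inside the live range
    return "above-max"       # not a purge artifact; investigate the key

# hypothetical orphan keys for illustration
sample = [12_345, 250_000, 3_045_913, 3_500_000]
labels = [classify_orphan(n) for n in sample]
```

In our data the orphans fell in the first two buckets, with none above MAX, which is what a periodic header purge (and not key corruption) predicts.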
Not clean. Not modern. But consistent with a system that kept shipping, invoicing, and collecting cash for four decades on hardware that started life in the 90s.
I caused this.
The old vendor caused this.
No. This is how the system stayed alive for 40 years.
From a greenfield lens, these all look like anti-patterns:
- `VOIDED MM/DD/YY` embedded in operational fields (the cust_no field would carry a void marker)
- Mixed date encodings (MM/DD/YY and YYYYMMDD both valid depending on era)

From a longevity lens, they read as adaptation. Each anti-pattern was a workaround for a constraint that no longer exists in modern systems (file-size limits, single-user index locks, expensive disk, no real backup), but the workaround stayed because the system kept working with it.
Usage mapping changed our migration strategy more than any code analysis did.
The codebase looked like a 570-file mountain. We grepped the source for menu hooks (Clipper menus call .PRG files by name), traced which programs got called from which menu items, and cross-referenced against last-modified dates and DBF access patterns. The business behaved like a 7-flow system wearing a 570-file costume.
That let us prioritize behavioral gravity over source volume. The other 563 files were either obsolete (decade-old reports nobody runs), one-off utilities (invoice voiding scripts, batch printers), abandoned features that never went live, or vendor experiments. Migrating them forward would have meant porting code nobody had touched in fifteen years.
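The menu-hook trace itself is mechanical. A sketch of the idea, with an invented `DO <name>` call pattern and toy program names standing in for the real menu source (actual Clipper menus can invoke programs in several ways, so the real trace needed more patterns than this):

```python
import re

# Toy menu source; real input was the concatenated .PRG menu files.
menu_source = """
@ 1, 1 PROMPT "Invoicing"   // DO ARINV
@ 2, 1 PROMPT "Cash"        // DO ARCASH
"""
all_programs = {"ARINV", "ARCASH", "ARREPT84", "FIXIDX"}  # toy file list

# Programs reachable from a menu item are "live"; the rest are candidates
# for the obsolete/one-off/abandoned buckets described above.
called = set(re.findall(r"\bDO\s+(\w+)", menu_source))
live = all_programs & called
dead = all_programs - called
```

The output is only a first cut; last-modified dates and DBF access patterns are what turned "not called from a menu" into "safe to leave behind."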
AI accelerated the archaeology.
It did not replace verification. Every conclusion was grounded in repeatable SQL, count parity, range checks, and archive evidence. The orphan finding came out of running the same query forty different ways, not out of an AI saying "looks like a purge pattern."
AI was a multiplier on hypothesis generation, not an oracle. The expensive part of legacy migration is not reading the code; it's the disciplined verification loop that catches the cases where the code's behavior diverges from your assumptions.
We had two options: try to reconstruct the missing headers somehow, or accept the orphans as a documented, permanent condition of the legacy data.
We chose the second. The reconciliation result shipped as PASS-with-known-exceptions: 41,095 orphan invoice lines (1.36%, $530,833.66) and 4,160 orphan cash receipts (1.19%, $341,775.36) accepted as a permanent legacy data condition with zero operational impact on current business. The customers, the open AR, the active receipts — everything that touches today's operations — reconciled cleanly.
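Operationally, "PASS-with-known-exceptions" means the reconciliation gate compares every discrepancy against a documented allowlist instead of demanding zero. A sketch of that gate, with illustrative names (our real check also compared dollar totals, not just counts):

```python
# Documented, accepted legacy conditions; anything else fails the gate.
KNOWN_EXCEPTIONS = {
    "orphan_invoice_lines": 41_095,
    "orphan_cash_receipts": 4_160,
}

def reconcile(observed: dict[str, int]) -> str:
    """PASS only if every observed discrepancy exactly matches its
    documented exception; a new or changed discrepancy fails loudly."""
    unexplained = {k: v for k, v in observed.items()
                   if KNOWN_EXCEPTIONS.get(k) != v}
    return "PASS-with-known-exceptions" if not unexplained else "FAIL"
```

The exact-match comparison matters: if an orphan count ever drifts from its documented value, that's a new defect, not a legacy condition, and the gate should catch it.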
We stopped pretending we could fix data that no longer exists.
Joseph Sprei is the founder of Ask the Ledger, an on-premise ERP for wholesale distributors. The Clipper migration described here is one of several legacy modernizations he's worked on across 30+ years of building line-of-business software. If you're staring down a similar legacy migration, his contact form is open.