First-pass reconciliation showed tens of thousands of invoice detail and cash receipt rows pointing to invoice headers that did not exist. 41,095 orphan invoice lines. 4,160 orphan cash receipts.
That was the worst kind of bad number: large, structural, and not obviously from one cause.
We were migrating a 40-year-old Clipper ERP to Delphi + PostgreSQL. On paper: 570 .PRG files, four decades of DBF files, millions of rows. In live operations: 7 programs carried most of the business. That was the first surprise.
The second came from the orphan rows.
We treated it as a migration defect.
The orphan check is the kind of query you write once and use forever:
```sql
SELECT COUNT(*)
FROM invoice_lines il
WHERE NOT EXISTS (
    SELECT 1
    FROM invoices i
    WHERE i.inv_no = il.inv_no
);
-- 41,095
```
Out of 3,012,516 total invoice lines, that's 1.36%. The cash receipts equivalent returned 4,160 orphans out of 350,957 receipts, or 1.19%. The dollar impact: $530,833.66 of orphan line ext_price (0.42% of the $124.5M total) and $341,775.36 of orphan cash paid_amt (0.30% of $115.4M total). Small as a percentage. Large as an absolute number you have to explain.
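The ratios above are worth keeping in one place, because you end up quoting them in every status meeting. A minimal sketch of the arithmetic, using the figures from this article (the $124.5M and $115.4M totals are the rounded values quoted above):

```python
# Orphan rates, using the counts and dollar totals reported in the article.
orphan_lines, total_lines = 41_095, 3_012_516
orphan_receipts, total_receipts = 4_160, 350_957
orphan_ext_price, total_ext_price = 530_833.66, 124_500_000.00
orphan_paid_amt, total_paid_amt = 341_775.36, 115_400_000.00

line_rate = orphan_lines / total_lines            # ~1.36% of invoice lines
receipt_rate = orphan_receipts / total_receipts   # ~1.19% of cash receipts
dollar_rate_lines = orphan_ext_price / total_ext_price  # ~0.42% of ext_price
dollar_rate_cash = orphan_paid_amt / total_paid_amt     # ~0.30% of paid_amt
```

Small percentages, but the absolute dollar figures are what the business asks about.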
So we checked everything in our pipeline.
By day three I had a spreadsheet full of burned hypotheses and no culprit.
When your own pipeline survives inspection, you start looking at whoever built it.
I assumed we'd find a classic predecessor failure: botched archives, partial retention, or one decade where someone cut corners and never documented it. The orphan headers had to exist somewhere. A working ERP doesn't ship invoices, collect cash, and post journal entries against headers that aren't there.
So we went to archive forensics. The Clipper system uses two-digit year suffixes on table names: ARMAST99 for the live invoice header file, ARTRAN99 for live invoice lines, ARCASH99 for live cash receipts, and (we assumed) ARMAST70, ARMAST80, ARMAST88, ARMAST98 as ten-year archive families going back to 1970. Four archive families. We expected the orphan headers to live there.
They didn't.
What we actually found:
| Archive file | Contents |
| --- | --- |
| ARMAST70 | does not exist |
| ARMAST80 | does not exist |
| ARMAST88 | 3 invoice headers |
| ARMAST98 | 1 invoice header |
| ARCASH98 | 10 cash receipts |
| ARTRAN98 | 24 invoice lines |
Four archive families with effectively zero content. We imported every available archive source anyway. Counts barely moved (+4 invoices, +24 lines, +10 receipts). The orphan numbers stayed at 41,095 and 4,160.
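The orphan check is a set difference at heart, which is why the count barely moved: a handful of recovered headers can only resolve the handful of detail rows that point at them. A toy sketch of the count-parity check we kept rerunning after each archive import (all data here is illustrative, not the real keys):

```python
# Toy model: orphans are detail rows whose header key is missing from
# every known header source (live table plus imported archives).
live_headers = {100, 101, 102}
archive_headers = {50}                      # the handful recovered from archives
detail_keys = [100, 100, 101, 50, 7, 8, 9]  # invoice_lines.inv_no values

def orphans(detail, header_sets):
    known = set().union(*header_sets)
    return [k for k in detail if k not in known]

before = orphans(detail_keys, [live_headers])                   # 4 orphans
after = orphans(detail_keys, [live_headers, archive_headers])   # 3 orphans
```

Importing the archive resolves exactly one orphan here; the rest point at headers that exist nowhere. That was our situation at scale.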
I was running queries late at night when that result came back. That was the moment the story changed: we weren't failing to locate history; history had been intentionally shed.
The orphan set wasn't an incident. It was a long-running pattern: periodic header purges with surviving transactional detail.
Look at the inv_no range. The active ARMAST99 file's MIN(inv_no) is 298,417. Its MAX is 3,462,381. The orphan invoice lines reference inv_no values below 298,417 — gone, no archive trail — up to 3,045,913. The system had been periodically purging old invoice headers from the active table, presumably to keep the .DBF size manageable, without cascading the purge into the line and cash files. The transactional rows survived the headers.
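One way to see the purge pattern is to classify each orphan key against the live table's key range. A sketch of that range check, using the ARMAST99 bounds from above (the sample keys are hypothetical):

```python
# If orphans cluster below the live table's MIN(inv_no), headers were purged
# from the front of the range; orphans above MAX would suggest a key defect.
LIVE_MIN, LIVE_MAX = 298_417, 3_462_381  # ARMAST99 bounds from the article

def classify_orphan(inv_no: int) -> str:
    if inv_no < LIVE_MIN:
        return "pre-purge"   # header purged, no archive trail
    if inv_no <= LIVE_MAX:
        return "in-range"    # header missing inside the live range
    return "above-max"       # not a purge artifact; investigate the key

# hypothetical orphan keys for illustration
sample = [12_345, 250_000, 3_045_913, 3_500_000]
labels = [classify_orphan(n) for n in sample]
```

In our data the orphans fell in the first two buckets, with none above MAX, which is what a periodic header purge (and not key corruption) predicts.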
Not clean. Not modern. But consistent with a system that kept shipping, invoicing, and collecting cash for four decades on hardware that started life in the 90s.
I caused this.
The old vendor caused this.
No. This is how the system stayed alive for 40 years.
From a greenfield lens, these all look like anti-patterns:
- `VOIDED MM/DD/YY` embedded in operational fields (the cust_no field would carry a void marker)
- Mixed date encodings (MM/DD/YY and YYYYMMDD both valid depending on era)

From a longevity lens, they read as adaptation. Each anti-pattern was a workaround for a constraint that no longer exists in modern systems (file-size limits, single-user index locks, expensive disk, no real backup), but the workaround stayed because the system kept working with it.
Usage mapping changed our migration strategy more than any code analysis did.
The codebase looked like a 570-file mountain. We grepped the source for menu hooks (Clipper menus call .PRG files by name), traced which programs got called from which menu items, and cross-referenced against last-modified dates and DBF access patterns. The business behaved like a 7-flow system wearing a 570-file costume.
That let us prioritize behavioral gravity over source volume. The other 563 files were either obsolete (decade-old reports nobody runs), one-off utilities (invoice voiding scripts, batch printers), abandoned features that never went live, or vendor experiments. Migrating them forward would have meant porting code nobody had touched in fifteen years.
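The menu-hook trace itself is mechanical. A sketch of the idea, with an invented `DO <name>` call pattern and toy program names standing in for the real menu source (actual Clipper menus can invoke programs in several ways, so the real trace needed more patterns than this):

```python
import re

# Toy menu source; real input was the concatenated .PRG menu files.
menu_source = """
@ 1, 1 PROMPT "Invoicing"   // DO ARINV
@ 2, 1 PROMPT "Cash"        // DO ARCASH
"""
all_programs = {"ARINV", "ARCASH", "ARREPT84", "FIXIDX"}  # toy file list

# Programs reachable from a menu item are "live"; the rest are candidates
# for the obsolete/one-off/abandoned buckets described above.
called = set(re.findall(r"\bDO\s+(\w+)", menu_source))
live = all_programs & called
dead = all_programs - called
```

The output is only a first cut; last-modified dates and DBF access patterns are what turned "not called from a menu" into "safe to leave behind."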
AI accelerated the archaeology.
It did not replace verification. Every conclusion was grounded in repeatable SQL, count parity, range checks, and archive evidence. The orphan finding came out of running the same query forty different ways, not out of an AI saying "looks like a purge pattern."
AI was a multiplier on hypothesis generation, not an oracle. The expensive part of legacy migration is not reading the code; it's the disciplined verification loop that catches the cases where the code's behavior diverges from your assumptions.
We had two options: try to reconstruct the missing headers somehow, or accept the orphans as a documented, permanent condition of the legacy data.
We chose the second. The reconciliation result shipped as PASS-with-known-exceptions: 41,095 orphan invoice lines (1.36%, $530,833.66) and 4,160 orphan cash receipts (1.19%, $341,775.36) accepted as a permanent legacy data condition with zero operational impact on current business. The customers, the open AR, the active receipts — everything that touches today's operations — reconciled cleanly.
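Operationally, "PASS-with-known-exceptions" means the reconciliation gate compares every discrepancy against a documented allowlist instead of demanding zero. A sketch of that gate, with illustrative names (our real check also compared dollar totals, not just counts):

```python
# Documented, accepted legacy conditions; anything else fails the gate.
KNOWN_EXCEPTIONS = {
    "orphan_invoice_lines": 41_095,
    "orphan_cash_receipts": 4_160,
}

def reconcile(observed: dict[str, int]) -> str:
    """PASS only if every observed discrepancy exactly matches its
    documented exception; a new or changed discrepancy fails loudly."""
    unexplained = {k: v for k, v in observed.items()
                   if KNOWN_EXCEPTIONS.get(k) != v}
    return "PASS-with-known-exceptions" if not unexplained else "FAIL"
```

The exact-match comparison matters: if an orphan count ever drifts from its documented value, that's a new defect, not a legacy condition, and the gate should catch it.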
We stopped pretending we could fix data that no longer exists.
Joseph Sprei is the founder of Ask the Ledger, an on-premise ERP for wholesale distributors. The Clipper migration described here is one of several legacy modernizations he's worked on across 30+ years of building line-of-business software. If you're staring down a similar legacy migration, his contact form is open.