Hi John,
Well, 2 OCR-related questions on this forum in 2 hours … I wasn’t sure if I should reply because I’m not an OCR guru, but I’ll tell you what I know.
OCR is often quite flaky, and the cleaner the image, the better the results. I was surprised all your text came through with no incorrect characters, so your input image must be quite clean.
A few years back I performance tested a system that OCR’d invoices from external suppliers. To test at volume, we needed to generate PDFs in our test script, submit them, and then verify that the invoice created in the system had the correct values.
The first thing we had to do was get some template supplier invoices, because the OCR system had to be “trained” for each supplier’s invoice. If the supplier changed their invoice layout, the system had to be retrained.
The training was basically telling the OCR tool which region of the page held which value. They didn’t OCR the whole page; instead they OCR’d the region where the invoice number was, then the region where the total was, etc.
In your case, for best results I’d suggest OCRing the region where the column headings are, then the region for row 1, and so on. If all goes well, each OCR’d row should return 3 lines, one for each column.
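To make the region idea a bit more concrete, here is a rough Python sketch. It is not what the invoice system did internally, just an illustration. The function names (row_boxes, ocr_by_region) and all the pixel values are made up, and it assumes you already know where the table sits on the page and that the rows are all the same height. The OCR engine is passed in as a function, so you can plug in whatever you use; the image only needs a crop((left, top, right, bottom)) method like the one Pillow images have.

```python
from typing import Callable, Dict, List, Tuple

# (left, top, right, bottom) in pixels, the same order Pillow's Image.crop expects
Box = Tuple[int, int, int, int]


def row_boxes(table: Box, header_height: int, row_height: int, rows: int) -> Dict[str, Box]:
    """Split a table area into a header box and fixed-height row boxes.

    Assumption: every row has the same height. Real tables may need
    per-row coordinates measured from a template image instead.
    """
    left, top, right, bottom = table
    boxes: Dict[str, Box] = {"header": (left, top, right, top + header_height)}
    for i in range(rows):
        row_top = top + header_height + i * row_height
        row_bottom = row_top + row_height
        if row_bottom > bottom:
            break  # stop once a row would fall outside the table area
        boxes[f"row_{i + 1}"] = (left, row_top, right, row_bottom)
    return boxes


def ocr_by_region(image, boxes: Dict[str, Box], ocr: Callable[[object], str]) -> Dict[str, List[str]]:
    """OCR each region separately and return its non-empty lines."""
    results: Dict[str, List[str]] = {}
    for name, box in boxes.items():
        text = ocr(image.crop(box))  # OCR only this region, not the whole page
        results[name] = [line.strip() for line in text.splitlines() if line.strip()]
    return results


# Possible usage with Pillow and pytesseract (both need installing, and the
# file name and coordinates here are placeholders):
#
#   from PIL import Image
#   import pytesseract
#
#   image = Image.open("table.png")
#   boxes = row_boxes((0, 0, 800, 400), header_height=40, row_height=30, rows=3)
#   print(ocr_by_region(image, boxes, pytesseract.image_to_string))
```

Whether each row really comes back as 3 separate lines depends on the OCR engine and how it segments the cropped region, so treat that as something to check rather than a given.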
Hope that helps,
Dave.