Wednesday, August 25, 2010

OCR for photos of business cards

I've been curious to try character recognition on the photos I took of a few business cards during the IETF (otherwise the cards just turn to dust in my jeans pockets, so photographing them was a much better option).

I've tried gocr, tesseract and the demo version of abbyyocr. The results were pretty much in line with what I could find on the web: gocr was mediocre, tesseract was somewhat promising (it could more or less reliably read the first name and surname, which were printed in bold), and abbyyocr was actually half-bearable.

In the software's defence, I must say that the images were far from typical OCR fodder: they were shaded, the perspective was distorted, and the letters were a bit too small.

One interesting effect I observed, though, was that scaling the images *down* from the original size to about 80% improved the recognition quality in all three packages, with abbyyocr getting to the point of being usable.
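The post doesn't say which tool did the scaling, so the following is only a minimal sketch of what "scale down to ~80%" means as a preprocessing step. It assumes the photo has already been loaded as a 2D grayscale array (a list of rows of 0..255 values); the function name `downscale` is hypothetical, and bilinear interpolation is my choice for simplicity, not necessarily what any of the OCR packages or the original experiment used.

```python
from typing import List

Gray = List[List[float]]  # grayscale image: list of rows, values 0..255


def downscale(img: Gray, factor: float = 0.8) -> Gray:
    """Hypothetical helper: shrink a grayscale image by `factor` (bilinear)."""
    if not img or not img[0]:
        raise ValueError("image must be non-empty")
    if not 0 < factor <= 1:
        raise ValueError("factor must be in (0, 1]")
    src_h, src_w = len(img), len(img[0])
    dst_h = max(1, round(src_h * factor))
    dst_w = max(1, round(src_w * factor))
    # Use the actual size ratio, since rounding may make it differ from 1/factor.
    scale_y, scale_x = src_h / dst_h, src_w / dst_w
    out: Gray = []
    for y in range(dst_h):
        # Map the destination pixel centre back into source coordinates.
        sy = min(max((y + 0.5) * scale_y - 0.5, 0.0), src_h - 1)
        y0 = int(sy)
        y1 = min(y0 + 1, src_h - 1)
        fy = sy - y0
        row = []
        for x in range(dst_w):
            sx = min(max((x + 0.5) * scale_x - 0.5, 0.0), src_w - 1)
            x0 = int(sx)
            x1 = min(x0 + 1, src_w - 1)
            fx = sx - x0
            # Interpolate horizontally on two source rows, then vertically.
            top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
            bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out
```

For example, `downscale(photo, 0.8)` turns a 10x10 image into an 8x8 one. In practice you would more likely let an image tool do this before handing the file to the OCR engine, e.g. ImageMagick's `convert card.jpg -resize 80% card-small.png` followed by `tesseract card-small.png out`.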

abbyyocr took notably longer to do the recognition. Still, if I have to do OCR, I will probably pick it out of the three.

