[Rat] Gengzi year, Guiwei month, Wuwu day / 25th day of the 5th lunar month
Tuesday July 14, 2020

Optical Character Recognition (OCR)

OCR stands for Optical Character Recognition and is the conversion of scanned images of text (handwritten, typewritten or printed) into machine-encoded (digital) text.

The clearer and larger the characters are, the better the system will recognise them. The difficulty with Chinese is that each character has to be matched against thousands of candidate characters (compared with fewer than 100 Latin characters), many of which are rather complex in structure.
Chinese punctuation can also be misread: the full stop '。' may be recognised as 'o' (letter o) or '0' (zero).
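
The punctuation problem can sometimes be reduced by cleaning up the OCR output afterwards. The following is only a rough sketch in Python, not part of any OCR product: the function name fix_misread_full_stops and the rule it applies are assumptions made for illustration. It treats a lone 'o' or 'O' that directly follows a Chinese character as a misread '。', and does the same for a '0' only when it ends a line or is followed by whitespace, so that text such as '第0章' is left alone.

```python
import re

# Assumption: the basic CJK Unified Ideographs block is enough for this sketch.
# It covers common Simplified and Traditional characters, not rare extensions.
_CJK = r"\u4e00-\u9fff"

# After a Chinese character:
#   - 'o' or 'O' followed by end of text, whitespace, or another Chinese character
#   - '0' followed by end of text or whitespace only (digits inside text are kept)
_MISREAD_STOP = re.compile(
    rf"(?<=[{_CJK}])(?:[oO](?=$|\s|[{_CJK}])|0(?=$|\s))"
)


def fix_misread_full_stops(text: str) -> str:
    """Replace characters that look like a misread Chinese full stop with '。'."""
    return _MISREAD_STOP.sub("。", text)


if __name__ == "__main__":
    print(fix_misread_full_stops("我是学生o你好0"))
```

This is a heuristic and will not catch every case; a '0' misread in the middle of a line with no following space, for example, is left untouched on purpose to avoid damaging real numbers.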

Google Drive can OCR uploaded PDFs and image files in Simplified and Traditional Chinese.
Microsoft Office has a similar feature if the 'Document Imaging' function is installed.
