
Can Windows OCR recognize custom symbols/fonts?

I'm developing for UWP, and Windows has a built-in OCR engine: Windows.Media.Ocr.

My question is: does anyone know whether the Windows OCR engine can be trained to recognize new characters or to use a custom font? If so, how can I do this?

What I want to achieve is to recognize non-alphabetical symbols, for example the character ⌰ (Unicode U+2330) or ⌖ (U+2316).

The characters I want to recognize are symbols that do not belong to any language.

I used the Windows.Media.Ocr library in my UWP application; here are some test results with different fonts.


Arial

Font - Arial
Test Words - Hello @ World
Expected Result - Hello @ World
Original Result - Hello @ World
Accuracy - 100%



Agency FB

Font - Agency FB
Test Words - Hello @ World
Expected Result - Hello @ World
Original Result - Hello World
Accuracy - 84.6% (Missed - @ symbol and one space)



Modern

Font - Modern
Test Words - Hello @ World
Expected Result - Hello @ World
Original Result - Hello @ world
Accuracy - 92.3% (W recognised as w)



Lucida Handwriting

Font - Lucida Handwriting
Test Words - Hello @ World
Expected Result - Hello @ World
Original Result - HeUe@ worw
Accuracy - 46.1%
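
The results above don't say how the accuracy figures were computed. One metric that is consistent with them is the length of the longest common subsequence (LCS) between the expected and recognized strings, divided by the length of the expected string (13 characters for "Hello @ World"). The sketch below is only an assumption about the method, and the function names are made up for illustration. Note that the last case comes out as 6/13 ≈ 46.15%, which is reported above as 46.1%.

```python
def lcs_length(expected: str, actual: str) -> int:
    """Length of the longest common subsequence of two strings."""
    prev = [0] * (len(actual) + 1)
    for e in expected:
        curr = [0]
        for j, a in enumerate(actual, start=1):
            if e == a:
                # Characters match: extend the subsequence found so far.
                curr.append(prev[j - 1] + 1)
            else:
                # No match: carry over the better of the two neighbours.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def char_accuracy(expected: str, actual: str) -> float:
    """Fraction of expected characters recovered, in order (case-sensitive)."""
    if not expected:
        raise ValueError("expected text must not be empty")
    return lcs_length(expected, actual) / len(expected)


if __name__ == "__main__":
    expected = "Hello @ World"
    # Recognized strings copied from the test results above.
    results = {
        "Arial": "Hello @ World",
        "Agency FB": "Hello World",
        "Modern": "Hello @ world",
        "Lucida Handwriting": "HeUe@ worw",
    }
    for font, actual in results.items():
        matched = lcs_length(expected, actual)
        print(f"{font}: {matched}/{len(expected)} = {char_accuracy(expected, actual):.2%}")
```

The comparison is case-sensitive on purpose, so "world" vs "World" counts as one miss, matching the Modern result.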


Update 1

Arial Unicode MS

Font - Arial Unicode MS
Test Symbols - ⌰ ⌖
Expected Result - ⌰ ⌖
Original Result - (Unable to Recognize)
Accuracy - 0%



Update 2

[image]

Hope this is helpful to you.

I think the short answer to your question is no. As stated in the Supported languages section of the Windows.Media.Ocr namespace documentation:

There are 25 supported languages. Based on recognition accuracy and performance, supported languages are divided into three groups:

  • Excellent: Czech, Danish, Dutch, English, Finnish, French, German, Hungarian, Italian, Norwegian, Polish, Portuguese, Romanian, Serbian Cyrillic, Serbian Latin, Slovak, Spanish and Swedish.
  • Very good: Chinese Simplified, Greek, Japanese, Russian and Turkish.
  • Good: Chinese Traditional and Korean.

The language is required information for correct text recognition. Every language uses some language-specific resources, so it must be specified in advance.

Note: Only languages installed on the device can be used. A user can install new languages through the Settings app.

So if your symbols do not belong to any language, the OCR engine won't recognize them.
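
As a quick sanity check that the two characters really are symbols rather than letters of some script, you can look up their Unicode general category. This is only a rough sketch using Python's standard unicodedata module; the helper name is made up, and the result says nothing about what the Windows OCR models were actually trained on. Common punctuation such as @ is not a letter either, yet it was recognized in the Arial test.

```python
import unicodedata


def is_letter(ch: str) -> bool:
    """True if the character's Unicode general category is a letter (L*)."""
    return unicodedata.category(ch).startswith("L")


if __name__ == "__main__":
    # The two symbols from the question, plus a letter and "@" for comparison.
    for ch in ("\u2330", "\u2316", "A", "@"):
        code_point = f"U+{ord(ch):04X}"
        category = unicodedata.category(ch)
        name = unicodedata.name(ch, "<unnamed>")
        print(code_point, category, name, "letter" if is_letter(ch) else "not a letter")
```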

As for custom fonts, as Vineet Choudhary's answer shows, the OCR engine may recognize some of them, but the accuracy of text recognition depends on the font. For handwritten or cursive text, the accuracy may be very low.
