How to classify digits and words in handwritten text recognition

I am working on handwritten text recognition using neural nets. Things I have used so far:

  • OpenCV for image processing
  • page segmentation and extraction of text boxes
  • word segmentation (suggestions for more accurate segmentation or for masking the text are welcome)

I have a form with name, age and date of birth (DOB) text boxes. The result for the name field is good (it is recognized correctly), but the DOB and age fields are not fully recognized as digits: some digits, such as '1' and '0', are recognized as 'i' and 'o'.

How can I classify the fields into words and digits? Could I use a separate model for digits only (currently I have trained the NN on the IAM words dataset)? Any other suggestions are welcome.

(sample image)

result : --i-16-16-
result : -i-i6-86-

You could train a separate NN for digits only, since digit recognition is not a computationally intensive task.

Alternatively, if the document has a fixed format, you already know where the age and DOB fields are. In that case, when you threshold the output layer to decide the output, consider only the neurons that represent digits.

For example, say you have 5 digits {'1','2','3','4','5'} and 5 letters {'a','e','i','o','u'}, and the output layer of your trained NN gives, in that order, [0.38, 0.006, 0.01, 0.004, 0.1, 0.03, 0.009, 0.4, 0.001, 0.06].

Normally you would perform softmax over all of these values to obtain a probabilistic interpretation and then select one output. Instead, perform softmax only over the neurons representing digits. You can also think of this as setting the prior probability of the letters to zero.

Here 'i' (0.4) has a higher activation than '1' (0.38). But when performing softmax you only consider the neurons that represent digits, hence you get '1'.
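The restricted softmax can be sketched in a few lines of plain Python. This is only an illustration using the ten example activations above. The names `CLASSES`, `DIGITS`, `masked_softmax` and `best_class` are made up for this sketch and are not part of any particular library or of the asker's model.

```python
import math

# Class order follows the example above: five digits, then five letters.
CLASSES = ['1', '2', '3', '4', '5', 'a', 'e', 'i', 'o', 'u']
DIGITS = set('12345')


def masked_softmax(scores, allowed):
    """Softmax over the allowed classes only; all other classes get probability 0."""
    kept = [s for s, c in zip(scores, CLASSES) if c in allowed]
    if not kept:
        raise ValueError("no allowed class found")
    peak = max(kept)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) if c in allowed else 0.0
            for s, c in zip(scores, CLASSES)]
    total = sum(exps)
    return [e / total for e in exps]


def best_class(values):
    """Return the label with the largest value."""
    return CLASSES[max(range(len(values)), key=values.__getitem__)]


scores = [0.38, 0.006, 0.01, 0.004, 0.1, 0.03, 0.009, 0.4, 0.001, 0.06]

unrestricted = best_class(scores)          # picks over all ten classes
probs = masked_softmax(scores, DIGITS)     # letters forced to probability 0
restricted = best_class(probs)             # picks among digits only

print(unrestricted, restricted)
```

For a numeric field such as age or DOB you would apply the digit mask to every character prediction in that field, and leave the name field unmasked (or masked to letters only).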
