How to classify digits and words in handwritten text recognition

Question

I am working on handwritten text recognition using neural nets. These are the things I have used so far:

OpenCV for image processing
page segmentation and extracting text boxes
word segmentation (suggestions for more accurate segmentation or masking of the text are welcome)

I have a form with name, age and date-of-birth text boxes. The result for the name field is good (it is recognized correctly), but the DOB and age fields are not recognized entirely as digits: some digits, such as '1' and '0', are recognized as 'i' and 'o'.

How can I classify the text into words and digits? Can I use a separate model for digits only (currently I have trained the NN on the IAM words dataset)? Any other suggestions are welcome.

result : --i-16-16-
result : -i-i6-86-

Answer 1

You could train a separate NN for digits only, since digit recognition is not a computationally intensive task.

Alternatively, if the document has a fixed format, you already know where the age and DOB fields are. In that case, when thresholding the output layer to decide the output for those fields, consider only the neurons that represent digits.

For example, say you have 5 digits {'1','2','3','4','5'} and 5 letters {'a','e','i','o','u'}, in that order. The output layer of your trained NN gives [0.38, 0.006, 0.01, 0.004, 0.1, 0.03, 0.009, 0.4, 0.001, 0.06].

Normally you would apply softmax to these values to obtain a probabilistic interpretation and then select one output. Instead, apply softmax only over the neurons representing digits. You can also think of this as setting the prior probability of every letter to zero.
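
Written as a formula (the notation here is ours, not the answer's): let z_k be the output-layer activation for class k and D the set of digit classes. The ordinary softmax and the digit-restricted version are

$$
p(k \mid x) = \frac{e^{z_k}}{\sum_{j} e^{z_j}},
\qquad
p_D(k \mid x) =
\begin{cases}
\dfrac{e^{z_k}}{\sum_{j \in D} e^{z_j}} & k \in D \\[6pt]
0 & k \notin D
\end{cases}
$$

This is the same as multiplying each e^{z_k} by a prior \pi_k that is 1 for digits and 0 for letters before normalizing, which is the "prior probability being zero" view.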

Here 'i' (0.4) has a higher activation than '1' (0.38), but because the softmax only considers the neurons that represent digits, you get '1'.
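
A minimal NumPy sketch of this restricted decoding, assuming the ten classes are ordered as in the example above (digits first, then vowels). The names LABELS, DIGIT_IDX, softmax and decode are hypothetical helpers for illustration, not part of any library or of the original model. Because softmax is monotonic, the restricted argmax is the same as picking the largest digit activation; the softmax only renormalizes the digit scores into probabilities.

```python
import numpy as np

# Hypothetical class order taken from the example: five digits, then five vowels.
LABELS = ['1', '2', '3', '4', '5', 'a', 'e', 'i', 'o', 'u']
DIGIT_IDX = [i for i, c in enumerate(LABELS) if c.isdigit()]

def softmax(z):
    """Numerically stable softmax over a 1-D array of scores."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

def decode(scores, allowed=None):
    """Return the most likely label, optionally restricted to the class indices in `allowed`."""
    scores = np.asarray(scores, dtype=float)
    if allowed is None:
        # Ordinary decoding: softmax over every output neuron.
        return LABELS[int(np.argmax(softmax(scores)))]
    # Restricted decoding: softmax only over the allowed neurons,
    # so every other class implicitly gets probability 0.
    sub = softmax(scores[allowed])
    return LABELS[allowed[int(np.argmax(sub))]]

outputs = [0.38, 0.006, 0.01, 0.004, 0.1, 0.03, 0.009, 0.4, 0.001, 0.06]
print(decode(outputs))             # i
print(decode(outputs, DIGIT_IDX))  # 1
```

For a fixed-format form, you would pass the digit indices only when decoding the age and DOB boxes and decode the name box over all classes.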

Answer date: 2020-01-14 05:58:57