
Preparing target label in TensorFlow (python) for CTC Loss

I am building a TensorFlow application for handwriting recognition. I am using a simple RNN model with stacked LSTM cells and CTC loss at the end. I am confused about how to prepare the labels for the input data.

Suppose I have three strings as target labels: "abc", "ab" and "baccc" (in my case the target labels are words, not sentences). So I have three character classes, indexed as a:0, b:1, c:2, blank:3. As far as I understand, the dense representation of the target labels should be

0 3 1 3 2 0 0 0  
0 3 1 0 0 0 0 0  
1 3 0 2 3 2 3 2  

But since TensorFlow requires a sparse representation of these labels, I need to prepare one as

indices[(0,0),(0,1),(0,2),(0,3),(0,4),(1,0),(1,1),(1,2),(2,0),(2,1),(2,2),(2,3),(2,4),(2,5),(2,6),(2,7)]  
values[0,3,1,3,2,0,3,1,1,3,0,2,3,2,3,2]  
shape[3,8]  

Am I correct regarding this data preparation? Any help is highly appreciated.

If I understand correctly, you are using a batch size of 3 and bucketing targets of different lengths together. For CTC loss I would recommend using a batch size of 1, as CTC seems to have trouble converging on long sequences.

In the dense representation you seem to pad the shorter targets with 0's. Those should be 3's instead (blanks).

Lastly, the sparse tensor you're building seems to be correct to me. Are you having trouble with dimensionality? Do you have an error log to show?

From the TensorFlow website: https://www.tensorflow.org/api_docs/python/tf/nn/ctc_loss

The inputs Tensor's innermost dimension size, num_classes, represents num_labels + 1 classes, where num_labels is the number of true labels, and the largest value (num_classes - 1) is reserved for the blank label.

and

labels: An int32 SparseTensor. labels.indices[i, :] == [b, t] means labels.values[i] stores the id for (batch b, time t). labels.values[i] must take on values in [0, num_labels).

inputs: 3-D float Tensor. If time_major == False, this will be a Tensor shaped: [batch_size, max_time, num_classes]. If time_major == True (default), this will be a Tensor shaped: [max_time, batch_size, num_classes]. The logits.
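To make the quoted definitions concrete for the three-character alphabet in the question, here is a minimal sketch in plain Python. The values of batch_size and max_time are hypothetical and chosen only for illustration; nothing here calls TensorFlow.

```python
# Alphabet from the question: a:0, b:1, c:2
true_labels = ["a", "b", "c"]

num_labels = len(true_labels)   # number of true labels
num_classes = num_labels + 1    # one extra class for the blank
blank_id = num_classes - 1      # largest class id is reserved for the blank

# Hypothetical sizes, for illustration only
batch_size, max_time = 3, 20

# Shape of the logits fed to the loss, per the quoted documentation
shape_time_major = (max_time, batch_size, num_classes)    # time_major == True
shape_batch_major = (batch_size, max_time, num_classes)   # time_major == False

print(num_classes, blank_id)
print(shape_time_major, shape_batch_major)
```

Under these assumptions the blank ends up as class 3, which matches the index the question assigned to it, but it only appears in the logits dimension and not in the label values.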

You should not insert blank labels when building the first parameter, "labels", of ctc_loss. Each value must be in the range [0, num_labels), as the description quoted above states; the blank is the reserved class num_classes - 1 and belongs only to the logits.
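As a rough sketch of what that means for the three words in the question, the helper below builds the indices, values and shape of the sparse labels without any blanks or padding. The function name to_sparse_labels and the char_to_id mapping are hypothetical names introduced for this example; the three results are in the form that a tf.SparseTensor(indices, values, dense_shape) would take, though that step is left out here so the snippet runs without TensorFlow.

```python
def to_sparse_labels(words, char_to_id):
    """Build (indices, values, dense_shape) for a batch of target words.

    Only true labels are emitted: no blank ids and no padding entries.
    This is a hypothetical helper, not part of TensorFlow.
    """
    indices, values = [], []
    for b, word in enumerate(words):        # b: position in the batch
        for t, ch in enumerate(word):       # t: position in the word
            indices.append((b, t))
            values.append(char_to_id[ch])
    max_len = max((len(w) for w in words), default=0)
    dense_shape = (len(words), max_len)
    return indices, values, dense_shape


# Mapping from the question, with the blank left out on purpose
char_to_id = {"a": 0, "b": 1, "c": 2}

indices, values, dense_shape = to_sparse_labels(["abc", "ab", "baccc"], char_to_id)
print(indices)      # 10 (batch, time) pairs, one per character
print(values)       # [0, 1, 2, 0, 1, 1, 0, 2, 2, 2]
print(dense_shape)  # (3, 5)
```

Compared with the sparse tensor in the question, every 3 is gone and the shape shrinks from [3, 8] to [3, 5], because the longest word has five characters.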


 