How do I use TensorFlow's CTC beam search correctly?

Question

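I want to run CTC beam search on a matrix of phoneme probabilities (the output of an ASR model). TensorFlow has a CTC beam search implementation, but it is poorly documented and I have not managed to build a working example. I want to write code that uses it as a benchmark.

This is my code so far: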

import numpy as np
import tensorflow as tf

def decode_ctcBeam(matrix, classes):
    matrix = np.reshape(matrix, (matrix.shape[0], 1, matrix.shape[1]))
    aa_ctc_blank_aa_logits = tf.constant(matrix)
    sequence_length = tf.constant(np.array([len(matrix)], dtype=np.int32))

    (decoded_list,), log_probabilities = tf.nn.ctc_beam_search_decoder(
        inputs=aa_ctc_blank_aa_logits,
        sequence_length=sequence_length,
        merge_repeated=True,
        beam_width=25)

    out = list(tf.Session().run(tf.sparse_tensor_to_dense(decoded_list)[0]))
    print(out)

    return out

if __name__ == '__main__':
    classes = ['AA', 'B', 'CH']
    mat = np.array([[0.4, 0, 0.6, 0.2], [0.4, 0, 0.6, 0.2]], dtype=np.float32)

    actual = decode_ctcBeam (mat, classes)

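I have trouble understanding the code:

In this example mat has shape (2, 4), but the TensorFlow op needs shape (2, 1, 4), so I reshape mat with matrix = np.reshape(matrix, (matrix.shape[0], 1, matrix.shape[1])). What does this mean mathematically? Are mat and matrix the same, or am I mixing something up here? As I understand it, the 1 in the middle is the batch size.
The decode_ctcBeam function returns a list. In the example it gives [2], which should mean 'CH' from the defined classes. If I have a bigger input matrix with, say, 40 phonemes, how do I generalize this and find the recognized phoneme sequence?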

Looking forward to your answers and comments. Thanks!

Answer 1

The TF documentation is wrong: beam search with beam width 1 is NOT the same as greedy decoding (I opened an issue about this some time ago).
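For comparison, this is what greedy (best-path) CTC decoding means: take the most likely class at each time step, collapse repeated labels, then drop blanks. The following is only a minimal NumPy sketch, not TF's implementation. The helper name greedy_ctc_decode is made up for illustration, and it assumes the blank is the last class index.

```python
import numpy as np

def greedy_ctc_decode(matrix, blank=None):
    """Best-path decoding of a (time, classes) score matrix."""
    matrix = np.asarray(matrix)
    if blank is None:
        blank = matrix.shape[1] - 1  # assumption: blank is the last class
    best = np.argmax(matrix, axis=1)  # most likely class per time step
    labels = []
    prev = None
    for idx in best:
        idx = int(idx)
        # keep a label only if it differs from the previous step and is not blank
        if idx != prev and idx != blank:
            labels.append(idx)
        prev = idx
    return labels

mat = np.array([[0.4, 0, 0.6, 0.2], [0.4, 0, 0.6, 0.2]], dtype=np.float32)
print(greedy_ctc_decode(mat))  # [2]
```

On the toy matrix from the question, both time steps pick class 2, the repeat is collapsed, and the result is [2].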

Then, instead of np.reshape, you can simply use np.transpose to reorder the dimensions and np.expand_dims to add a dimension of size 1 for the batch.
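A small sketch of that reshaping, assuming the matrix is stored as (time, classes) like the toy example in the question. The variable names are illustrative only. The np.transpose step matters when the data is stored the other way round, as (classes, time).

```python
import numpy as np

# Toy matrix from the question: 2 time steps, 4 classes
mat = np.array([[0.4, 0, 0.6, 0.2], [0.4, 0, 0.6, 0.2]], dtype=np.float32)

# (time, classes) -> (time, 1, classes): add a batch axis of size 1
time_major = np.expand_dims(mat, axis=1)
print(time_major.shape)  # (2, 1, 4)

# If the data were stored as (classes, time), put time first before adding the batch axis
class_major = mat.T  # shape (4, 2), built here only to illustrate the case
from_transposed = np.expand_dims(np.transpose(class_major), axis=1)

# For a (time, classes) input, this matches the np.reshape used in the question
same_as_reshape = np.array_equal(
    time_major, np.reshape(mat, (mat.shape[0], 1, mat.shape[1])))
```

For a (time, classes) matrix the values are unchanged and only a batch axis of length 1 is added, which matches the time-major [max_time, batch_size, num_classes] layout the decoder expects.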

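Finally, regarding the TF beam search implementation: yes, the documentation is not very good. I used the implementation in a text recognition model, and I'll point you to the parts that are relevant for you:

Creating the TF beam search op: note merge_repeated=False, because TF's default setting (True) makes no sense for 99.99999% of all relevant use cases. Follow the variable names of the passed arguments to see what they look like, e.g. the input matrix is ctcIn3dTBC, which is a transposed version of the RNN output.
Converting the beam search output to character strings: the op returns a list of sparse tensors, which have to be decoded into character strings.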
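The second step can be sketched in plain Python. This assumes you have already evaluated one sparse tensor from the decoder output and have its indices, values and dense_shape, and that the indices are in row-major order. The helper name sparse_to_label_strings and the sample values below are invented for illustration and are not real decoder output.

```python
def sparse_to_label_strings(indices, values, dense_shape, classes):
    """Turn one evaluated sparse tensor into a list of label strings per batch element."""
    batch_size = int(dense_shape[0])
    sequences = [[] for _ in range(batch_size)]
    # each index is (batch element, position in the decoded sequence)
    for (batch_idx, _pos), label in zip(indices, values):
        sequences[int(batch_idx)].append(classes[int(label)])
    return sequences

classes = ['AA', 'B', 'CH']

# Hypothetical sparse output for a batch of 2 (made-up values)
indices = [[0, 0], [0, 1], [1, 0]]
values = [0, 2, 1]
dense_shape = [2, 2]

print(sparse_to_label_strings(indices, values, dense_shape, classes))
# [['AA', 'CH'], ['B']]
```

Reading the sparse form directly also avoids a pitfall of tf.sparse_tensor_to_dense with batches of different lengths: the default padding value is 0, which cannot be told apart from class index 0.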

Answer 2

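So, I have made some progress since asking this question, but I still have not figured out how to use TensorFlow's CTC beam search correctly. It seems that setting top_paths=1 and beam_width=1 does return the expected greedy-search output as a list of integers, which can easily be converted to the desired phonemes stored in classes. The output in this case is:

-------Greedy---------

Output int list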

[1, 22, 39, 14, 32, 8]

['AE', 'N', ' ', 'G', 'UH', 'D']

With beam search, the result is bad:

-------Beam Search----------

Output int list

[26, 19, 9, 28, 5, 0, 2, 31, 1, 22, 39, 14, 32, 20, 8, 16, 39, 30, 37, 8]

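['P', 'K', 'DH', 'S', 'AY', 'AA', 'AH', 'TH', 'AE', 'N', ' ', 'G', 'UH', 'L', 'D', 'IH', ' ', 'T', 'Z', 'D']

The reference is "I'm good". The list [1, 22, 39, 14, 32, 8] appears inside the beam search result; are the other parts supposed to be alternative paths? This looks suspicious to me. Does anyone have an idea?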

import numpy as np
import tensorflow as tf
import Classes

def decode_ctcBeam(matrix, classes):  
    matrix = np.reshape(matrix, (matrix.shape[0], 1,matrix.shape[1]))
    aa_ctc_blank_aa_logits = tf.constant(matrix)
    sequence_length = tf.constant(np.array([len(matrix)], dtype=np.int32))
    
    (decoded_list,), log_probabilities = tf.nn.ctc_beam_search_decoder(inputs=aa_ctc_blank_aa_logits,
                                              sequence_length=sequence_length,
                                              merge_repeated=True,
                                              top_paths=1,
                                              beam_width=4)

    out = list(tf.Session().run(tf.sparse_tensor_to_dense(decoded_list)[0]))
    print("Output int list")
    print(out)
    seq_list = get_seq_from_list(out, classes)
    return seq_list
        
def decode_ctcgreedy(matrix, classes):
    
    matrix = np.reshape(matrix, (matrix.shape[0], 1,matrix.shape[1]))
    
    aa_ctc_blank_aa_logits = tf.constant(matrix)
    sequence_length = tf.constant(np.array([len(matrix)], dtype=np.int32))

    (decoded_list,), log_probabilities = tf.nn.ctc_beam_search_decoder(inputs=aa_ctc_blank_aa_logits,
                                              sequence_length=sequence_length,
                                              merge_repeated=True,
                                              top_paths=1,
                                              beam_width=1)

    out = list(tf.Session().run(tf.sparse_tensor_to_dense(decoded_list)[0]))
    print("Output int list")
    print(out)
    seq_list = get_seq_from_list(out, classes)
    
    return seq_list

def get_seq_from_list(int_list, classes):
    out_list = []
    for i in range(0, len(int_list)):        
        out_list.append(classes[int_list[i]])
        
    return out_list

if __name__ == '__main__':

    mat = np.load('../npy_files/a1003.npy')
    classes = Classes.get_classes()
    
    print("-------Greedy---------")
    actual = decode_ctcgreedy(mat, classes)
    print(actual)    
    
    print("\n-------Beam Search----------")
    actual = decode_ctcBeam(mat, classes)
    print(actual)

Solution 1
1 2019-12-20 20:18:18

Solution 2
0 2019-12-19 15:35:52
