
How to add langdetect's language probability vector to a Keras Sequential Model?

I'm currently studying the singing language identification problem (and the basics of machine learning). I found many papers about this on the internet, but some of them don't provide any code (or even pseudocode), which is why I'm trying to reproduce them from their descriptions of the machine learning model.

A good example is LISTEN, READ, AND IDENTIFY: MULTIMODAL SINGING LANGUAGE IDENTIFICATION OF MUSIC, written by Keunwoo Choi and Yuxuan Wang.

To sum up, they concatenate two layers: an audio layer (in the form of a spectrogram) and a text layer (a 56-dimensional language probability vector computed on the metadata with langdetect).

The text branch is a 3-layer MLP where each layer consists of a 128-unit fully-connected layer, a batch normalization layer, and a ReLU activation [22].

For the text model I got something like this:

text_model = Sequential()
text_model.add(Input((56,), name='input'))
text_model.add(BatchNormalization())
text_model.add(Dense(128, activation='relu'))

langdetect.detect_langs(metadata) returns [de:0.8571399874707945, en:0.14285867860989504].

I'm not sure I've described my model correctly, and I cannot understand how to feed the langdetect probability vector into a Keras model properly.

First, you need to transform the langdetect output into a vector of constant length. There are 55 languages in the library, so we need to create a vector of length 55 whose i-th element is the probability that the text is written in the i-th language. You could do it like this:

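import numpy as np
import tensorflow as tf
import langdetect

# Load the language profiles so the list of supported languages is available.
langdetect.detector_factory.init_factory()
# Note: _factory is a private attribute of langdetect and may change between versions.
LANGUAGES_LIST = langdetect.detector_factory._factory.langlist

def get_probabilities_vector(text):
    # detect_langs returns a list of candidate languages with their probabilities.
    predictions = langdetect.detect_langs(text)
    output = np.zeros(len(LANGUAGES_LIST), dtype=np.float32)

    # Place each probability at the index of its language; the rest stay 0.
    for p in predictions:
        output[LANGUAGES_LIST.index(p.lang)] = p.prob

    return tf.constant(output)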
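
To make the mapping concrete, here is a minimal sketch that applies the same indexing step to the result quoted in the question (de ≈ 0.857, en ≈ 0.143). It does not call langdetect: the three-entry language list and the `to_fixed_vector` helper are hypothetical stand-ins, so the real vector would have 55 entries in whatever order `LANGUAGES_LIST` defines.

```python
import numpy as np

# Hypothetical, shortened language list; the real LANGUAGES_LIST has 55 entries.
languages = ["de", "en", "fr"]

# (language, probability) pairs copied from the detect_langs output in the question.
predictions = [("de", 0.8571399874707945), ("en", 0.14285867860989504)]

def to_fixed_vector(predictions, languages):
    """Place each probability at the index of its language; others stay 0."""
    vector = np.zeros(len(languages), dtype=np.float32)
    for lang, prob in predictions:
        vector[languages.index(lang)] = prob
    return vector

vector = to_fixed_vector(predictions, languages)
print(vector.shape)  # (3,)
```

Languages that are not reported keep a probability of 0, so every text yields a vector of the same length.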

Then you need to create a model with multiple inputs. This can be done using the functional API, e.g. like this (change the inputs according to your use case):

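def create_model():
    # One Input per modality; adjust the shapes to your own features.
    audio_input = tf.keras.Input(shape=(256,))
    langdetect_input = tf.keras.Input(shape=(55,))

    # Join both inputs into a single feature vector of length 256 + 55.
    x = tf.keras.layers.concatenate([audio_input, langdetect_input])
    x = tf.keras.layers.Dense(128, activation='relu')(x)
    # No activation here, so the 55 outputs are raw scores (logits).
    output = tf.keras.layers.Dense(55)(x)

    # The dict keys are the names used to pass inputs when calling the model.
    model = tf.keras.Model(
        inputs={
            'audio': audio_input,
            'text': langdetect_input
        },
        outputs=output)

    return model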

Testing the model on some input:

model = create_model()

audio_input = tf.constant(np.random.rand(256))
langdetect_input = get_probabilities_vector('This is just a test input')

model({
    'audio': tf.expand_dims(audio_input, 0),
    'text': tf.expand_dims(langdetect_input, 0)
})

>>> <tf.Tensor: shape=(1, 55), dtype=float32, numpy=
array([[ 0.23361185,  0.19011918, -0.45230836, -0.0602392 , -0.20067683,
         0.9698535 , -1.0724173 ,  0.08978442,  0.052798  , -0.16554174,
         0.9238764 ,  1.0331644 ,  0.4508734 , -0.2450786 , -1.0605856 ,
         0.3239496 , -1.0073977 , -0.2129285 , -0.6817296 ,  0.05288622,
         0.9089616 , -0.11521344,  0.25696573, -0.07688305, -0.36123943,
        -0.0317415 , -0.18303779,  0.13786468,  0.88620317,  0.11393422,
        -0.5215691 , -0.28585738,  0.54988045, -0.02300271, -0.4347821 ,
        -0.57744324,  0.14031887,  0.8255624 , -0.13157232, -1.1060234 ,
        -0.24097277,  0.12950295,  0.4586677 ,  0.37702668,  0.7558856 ,
        -0.05933011,  0.53903174,  0.27433476, -0.18464057,  1.0673125 ,
        -0.05723387, -0.03429477,  0.4431308 , -0.14510366, -0.28087378]],
      dtype=float32)>

I am expanding the dimensions of the inputs with the expand_dims function so that they have shapes (1, 256) and (1, 55), which matches the (batch_size, 256) and (batch_size, 55) inputs that the model expects during training.
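
For more than one example, the same batch axis can come from stacking the per-text vectors instead of calling expand_dims on each one. The sketch below uses NumPy with zero-filled placeholder vectors (an assumption made so that it runs without langdetect); in practice each row would be the vector computed for one piece of metadata.

```python
import numpy as np

num_languages = 55  # length of the langdetect vector used in this answer

# Placeholder vectors standing in for three per-text probability vectors.
vectors = [np.zeros(num_languages, dtype=np.float32) for _ in range(3)]

# A single example: add a leading batch axis of size 1.
single = np.expand_dims(vectors[0], 0)

# Several examples: stack them along a new leading batch axis.
batch = np.stack(vectors, axis=0)

print(single.shape)  # (1, 55)
print(batch.shape)   # (3, 55)
```

With tensors instead of arrays, `tf.stack` plays the same role as `np.stack` here.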

This is just a draft, but this is roughly how your problem could be solved.
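
Coming back to the text branch quoted in the question (three blocks of a 128-unit fully-connected layer, batch normalization, and ReLU), one way to express it with the functional API is sketched below. The function name `build_text_branch` and its parameters are assumptions for illustration, not code from the paper; `input_dim` defaults to 55 to match the langdetect vector built in this answer, whereas the question mentions 56 dimensions.

```python
import numpy as np
import tensorflow as tf

def build_text_branch(input_dim=55, units=128, num_blocks=3):
    """Stack Dense -> BatchNormalization -> ReLU blocks (hypothetical helper)."""
    text_input = tf.keras.Input(shape=(input_dim,), name="text")
    x = text_input
    for _ in range(num_blocks):
        # Fully-connected layer without activation, so that the
        # normalization is applied before the ReLU, as in the quoted order.
        x = tf.keras.layers.Dense(units)(x)
        x = tf.keras.layers.BatchNormalization()(x)
        x = tf.keras.layers.ReLU()(x)
    return tf.keras.Model(inputs=text_input, outputs=x, name="text_branch")

text_branch = build_text_branch()

# Random placeholder batch of 4 probability vectors, only to check shapes.
dummy_batch = np.random.rand(4, 55).astype("float32")
features = text_branch(dummy_batch)
print(tuple(features.shape))  # (4, 128)
```

The 128-dimensional output of this branch could be passed to `tf.keras.layers.concatenate` together with the audio features, in place of the raw `langdetect_input` in the draft model.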
