
LSTM multiple features, multiple classes, multiple outputs

I'm trying to use an LSTM classifier to generate music based on some MIDI files that I have.

The LSTM uses two features: each note's pitch and its duration.

For illustration, suppose we have:

  • Pitches: ["A", "B", "C"]

  • Durations: ["0.5", "1", "1.5"]

As you can imagine, a generated note has to have both a pitch and a duration.

I tried to do it with a MultiLabelBinarizer.

from sklearn.preprocessing import MultiLabelBinarizer
labels = [[x,y] for x in all_pitches for y in all_durations]

mlb = MultiLabelBinarizer()
mlb_value = mlb.fit_transform(labels)

This divides the classes as intended, but the problem I'm having comes at prediction time.

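import numpy as np

prediction = model.predict_proba(prediction_input)

# rank every entry of the flattened prediction by score, highest first
indexes = np.argsort(prediction, axis=None)[::-1]
index1 = indexes[0]
index2 = indexes[1]

# look up the labels of the two highest-scoring entries
result1 = mlb.classes_[index1]
result2 = mlb.classes_[index2]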

I need the notes to have both pitch and duration, so this approach doesn't seem to work for me (I only get the same two pitches over and over).
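One likely reason for getting only pitches: MultiLabelBinarizer builds one column per distinct label, so mlb.classes_ is the sorted union of the pitch and duration labels rather than a list of pairs, and the two highest-scoring columns overall can both be pitches. A minimal sketch of a workaround, assuming the example labels above, is to take the best-scoring label within each group separately. The function decode_note and the probability values are hypothetical; probabilities stands in for one row of the model's output.

```python
# Column order as MultiLabelBinarizer would produce it for the example labels:
# the sorted union of all labels, not one column per (pitch, duration) pair.
pitches = ["A", "B", "C"]
durations = ["0.5", "1", "1.5"]
classes = sorted(set(pitches) | set(durations))


def decode_note(probabilities, classes, pitches, durations):
    """Pick the most probable pitch and the most probable duration separately."""
    score = dict(zip(classes, probabilities))
    best_pitch = max(pitches, key=lambda label: score[label])
    best_duration = max(durations, key=lambda label: score[label])
    return best_pitch, best_duration


# Made-up scores for one prediction, aligned with `classes` (an assumption).
probabilities = [0.05, 0.10, 0.20, 0.90, 0.80, 0.10]

# Global top-2, as in the snippet above: both winners can be pitches.
ranked = sorted(zip(probabilities, classes), reverse=True)
top_two = [label for _, label in ranked[:2]]

# Per-group maximum: always one pitch and one duration.
note = decode_note(probabilities, classes, pitches, durations)
```

With these made-up scores the global top two are both pitches, while the per-group lookup still returns a complete note.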

Another option I considered was a MultiOutputClassifier, but I can't work out how the two differ, or how to actually use a MultiOutputClassifier correctly.

Thanks for your patience, and sorry for what is probably a stupid question.

You can feed your LSTM output into several different layers (or neural functions, in general), each leading to a different output, and then train your model on all of these outputs concurrently:

from keras.models import Model
from keras.layers import Input, Dense, LSTM

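# function definitions (each layer object is a callable)
lstm_function = LSTM(...)  # pass the number of units and any other LSTM arguments here
pitch_function = Dense(num_pitches, activation='softmax')      # one class per pitch
duration_function = Dense(num_durations, activation='softmax')  # one class per duration
input_features = Input(input_dimensionality)  # shape of one input sample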

# function applications
lstm_output = lstm_function(input_features)
pitches = pitch_function(lstm_output)
durations = duration_function(lstm_output)

# model 
model = Model(inputs=[input_features], outputs=[pitches, durations])
model.compile(loss=['categorical_crossentropy', 'mse'], optimizer='RMSProp')

This can be generalized to arbitrary information flows, with as many layers/outputs as you need. Remember that for each output you need to define a corresponding loss (or None).
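With two output heads, the targets are supplied as two separate one-hot arrays, one per head, and a generated note is read off by taking the best class from each head. The sketch below shows that bookkeeping in plain Python using the example labels from the question; build_targets and decode_heads are hypothetical helper names, and the probability rows are made up. In practice the lists would be converted to NumPy arrays before being handed to Keras.

```python
def one_hot(index, size):
    """Return a list of zeros with a single 1.0 at `index`."""
    row = [0.0] * size
    row[index] = 1.0
    return row


def build_targets(notes, pitches, durations):
    """Turn (pitch, duration) pairs into one target list per output head."""
    pitch_index = {label: i for i, label in enumerate(pitches)}
    duration_index = {label: i for i, label in enumerate(durations)}
    pitch_targets = [one_hot(pitch_index[p], len(pitches)) for p, _ in notes]
    duration_targets = [one_hot(duration_index[d], len(durations)) for _, d in notes]
    return pitch_targets, duration_targets


def decode_heads(pitch_probs, duration_probs, pitches, durations):
    """Pick the most probable class from each head's softmax output."""
    best_pitch = max(range(len(pitches)), key=lambda i: pitch_probs[i])
    best_duration = max(range(len(durations)), key=lambda i: duration_probs[i])
    return pitches[best_pitch], durations[best_duration]


pitches = ["A", "B", "C"]
durations = ["0.5", "1", "1.5"]
notes = [("A", "1"), ("C", "0.5")]  # made-up training notes

pitch_targets, duration_targets = build_targets(notes, pitches, durations)
# Training would then pass one target array per output, roughly:
#   model.fit(x, [np.asarray(pitch_targets), np.asarray(duration_targets)])

# Made-up softmax outputs for a single prediction, one row per head.
note = decode_heads([0.2, 0.7, 0.1], [0.1, 0.3, 0.6], pitches, durations)
```

Because each head has its own softmax, every prediction yields exactly one pitch and one duration. If durations are treated as classes like this, 'categorical_crossentropy' is the usual loss for that head as well; the snippet above uses 'mse' for it.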
