Combine two arrays with different dimensions in Python

I am working on an emotion classification project using audio and text. I passed the audio and text through a 1D CNN and got the following output arrays:

audio_features_shape = (396, 63, 64)
text_features_shape = (52, 1, 64)

Now I want to stack these two arrays of different dimensions into one so I can pass a single array to an LSTM. I want the shape to be:

expected_array_shape = (448, 64, 128)

I tried the following methods, but none of them gives the output I want.

x = np.column_stack((audio_features, text_features))
x = np.concatenate((audio_features,text_features), axis=2)
x = np.append(audio_features, text_features)
x = np.transpose([np.tile(audio_features, len(text_features)), np.repeat(text_features, len(audio_features))])
x = np.array([np.append(text_features,x) for x in audio_features])

Any help would be appreciated. Thanks!

How are the values of the two arrays supposed to be distributed in the result?

audio_features_shape = (396, 63, 64)
text_features_shape = (52, 1, 64)

text_features should be "expanded" to (52, 63, 64), either by repeating its values 63 times along the middle axis or by placing the array into a target array of zeros. In either case it will be 63 times larger.

Once the arrays match on all but the first dimension, they can be concatenated along that first dimension.
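
A minimal NumPy sketch of the two expansion options described above, followed by the concatenation. The random arrays are placeholders that only reproduce the shapes from the question, and the names steps, text_repeated, text_padded, combined_repeat and combined_pad are introduced here for illustration; they are not part of the original post.

```python
import numpy as np

# Placeholder data with the shapes from the question (random values, illustration only).
rng = np.random.default_rng(0)
audio_features = rng.normal(size=(396, 63, 64))
text_features = rng.normal(size=(52, 1, 64))

steps = audio_features.shape[1]  # 63 time steps in the audio features

# Option 1: repeat the single text time step along the middle axis.
text_repeated = np.repeat(text_features, steps, axis=1)  # shape (52, 63, 64)

# Option 2: copy the text features into a zero-filled target array.
text_padded = np.zeros((text_features.shape[0], steps, text_features.shape[2]))
text_padded[:, :text_features.shape[1], :] = text_features  # shape (52, 63, 64)

# The arrays now match on every axis except the first, so they can be joined on axis 0.
combined_repeat = np.concatenate((audio_features, text_repeated), axis=0)
combined_pad = np.concatenate((audio_features, text_padded), axis=0)

print(combined_repeat.shape, combined_pad.shape)  # (448, 63, 64) (448, 63, 64)
```

Both variants give (448, 63, 64), not the (448, 64, 128) asked for in the question. Which of the two is more appropriate depends on how the LSTM is meant to treat the text samples.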

But the real question is: what makes sense for the LSTM?

Depending on what exactly you want, and if you are happy to use only TensorFlow, you could give the following a try:

import tensorflow as tf

audio_features = tf.random.normal((396, 63, 64))
text_features = tf.random.normal((52, 1, 64))

text_features = tf.repeat(text_features, repeats=(audio_features.shape[1]-text_features.shape[1]) + 1, axis=1) 
repeat_features = tf.concat([audio_features, text_features], axis=0)
text_features = tf.random.normal((52, 1, 64))

paddings = tf.constant([[0, 0], [0, audio_features.shape[1]-text_features.shape[1]], [0, 0]])
pad_features = tf.concat([audio_features, tf.pad(text_features, paddings, "CONSTANT")], axis=0)

print('Using tf.repeat --> ', audio_features.shape, text_features.shape, repeat_features.shape)
print('Using tf.pad --> ', audio_features.shape, text_features.shape, pad_features.shape)

Output:

Using tf.repeat -->  (396, 63, 64) (52, 1, 64) (448, 63, 64)
Using tf.pad -->  (396, 63, 64) (52, 1, 64) (448, 63, 64)
