
Audio Data Augmentation in Python

I am using the function below to augment audio data loaded from WAV audio files.

import librosa

# load_wav and sample_rate are assumed to be defined elsewhere in the asker's code.
def generate_augmented_data(file_path):
    augmented_data = []
    samples = load_wav(file_path, get_duration=False)
    # 3 time-stretch rates x 3 pitch shifts = 9 augmented clips per file
    for time_value in [0.7, 1, 1.3]:
        for pitch_value in [-1, 0, 1]:
            time_stretch_data = librosa.effects.time_stretch(samples, rate=time_value)
            final_data = librosa.effects.pitch_shift(time_stretch_data, sr=sample_rate, n_steps=pitch_value)
            augmented_data.append(final_data)
    return augmented_data

I also need to augment the class labels and am having difficulty with it. I tried the code below, but it does not give me the expected result.

# Generate augmented data together with one label per augmented clip.
def generate_augmented_data_label(file_path, label):
    augmented_data = []
    augmented_label = []
    samples = load_wav(file_path, get_duration=False)
    for time_value in [0.7, 1, 1.3]:
        for pitch_value in [-1, 0, 1]:
            time_stretch_data = librosa.effects.time_stretch(samples, rate=time_value)
            final_data = librosa.effects.pitch_shift(time_stretch_data, sr=sample_rate, n_steps=pitch_value)
            augmented_data.append(final_data)
            augmented_label.append(label)
    return augmented_data, augmented_label

Before augmentation, the data and labels have the lengths shown below.

X_train.reset_index(inplace=True, drop=True)
y_train.reset_index(inplace=True, drop=True)
X_train_augmented_data = []
y_train_augmented_data = []
for i in range(len(X_train)):
    # print(i)
    t1 = X_train.iloc[i]
    t2 = y_train[i]
    tmp1, tmp2 = generate_augmented_data_label(t1, t2)
    # print(tmp1, tmp2)
    X_train_augmented_data.append(tmp1)
    y_train_augmented_data.append(tmp2)

len(X_train)
1600
len(y_train)
1600
print(len(X_train_augmented_data))
print(len(y_train_augmented_data))

After data augmentation and an additional masking step, the shapes are as follows:

augmented_train_data_mask = []
for i in range(0, len(augmented_train_data_pad)):
    augmented_train_data_mask.append(list(map(bool, augmented_train_data_pad[i])))
# Convert to an array once, after the loop, so append keeps working inside it.
augmented_train_data_mask = np.array(augmented_train_data_mask)
print(augmented_train_data_pad.shape)
print(augmented_train_data_mask.shape)
(14400, 17640)
(14400, 17640)

However, the label length is still 1600. Later, when I pass these into an LSTM model, I get a shape mismatch error.

ValueError: Data cardinality is ambiguous:
x sizes: 14400, 14400
y sizes: 1600
Make sure all arrays contain the same number of samples.
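
The counts in the error match the augmentation grid: 3 time-stretch rates times 3 pitch shifts gives 9 clips per file, and 1600 × 9 = 14400. The following is only a minimal sketch of keeping data and labels the same length by flattening both with `extend` instead of `append`. `fake_augment` is a hypothetical stand-in for `generate_augmented_data_label` that skips the audio processing, and the file names and labels are made up for illustration.

```python
from itertools import product

TIME_RATES = [0.7, 1, 1.3]
PITCH_STEPS = [-1, 0, 1]


def fake_augment(sample, label):
    """Hypothetical stand-in: one (data, label) pair per rate/pitch combination."""
    data = [(sample, rate, steps) for rate, steps in product(TIME_RATES, PITCH_STEPS)]
    labels = [label] * len(data)
    return data, labels


# Made-up inputs standing in for X_train and y_train.
samples = ["a.wav", "b.wav", "c.wav"]
labels = [0, 1, 0]

flat_data, flat_labels = [], []
for sample, label in zip(samples, labels):
    data, labs = fake_augment(sample, label)
    # extend keeps one flat list; append would nest a 9-item list per file.
    flat_data.extend(data)
    flat_labels.extend(labs)

print(len(flat_data), len(flat_labels))  # 27 27
```

With `append`, the outer lists would have one entry per file (3 here, 1600 in the question), each holding 9 items, so the labels would need the same flattening as the data before being passed to the model.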

I am looking for some help to resolve this issue.

You can refer to this link:

https://www.geeksforgeeks.org/python-add-similar-value-multiple-times-in-list/


Here type(y_train) is a pandas Series.

from itertools import repeat

new_label = []

# The second argument to repeat is the number of copies of each label.
for index, value in y_train.items():
    new_label.extend(repeat(value, 2))

len(new_label)
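
As a minimal sketch of the same idea applied to the question's setup: each file yields 3 time-stretch rates times 3 pitch shifts, so each label would need 9 copies rather than 2, in the same order as the augmented data. The plain list below stands in for the `y_train` Series, and `COPIES_PER_SAMPLE` is an illustrative name, not part of the original code.

```python
from itertools import repeat

# Hypothetical labels standing in for the y_train Series.
y_train = [0, 1, 2]

# Assumption: 3 time-stretch rates x 3 pitch shifts = 9 augmented clips per file.
COPIES_PER_SAMPLE = 3 * 3

new_label = []
for value in y_train:
    new_label.extend(repeat(value, COPIES_PER_SAMPLE))

print(len(new_label))  # 27
```

With the 1600 labels from the question, the same loop would produce 14400 labels, matching the number of augmented samples.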

You can use the NumPy repeat function to repeat each element of your NumPy array.

Example:

In:  arr = np.arange(3)
Out: array([0, 1, 2])

In:  arr.repeat(3)
Out: array([0, 0, 0, 1, 1, 1, 2, 2, 2])
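
A short sketch of how this might look for the labels in the question, assuming 9 augmented clips per file (3 time-stretch rates times 3 pitch shifts) and that the augmented data keeps the original file order. The `labels` array is a made-up stand-in; for a pandas Series, `y_train.to_numpy()` would give an equivalent array.

```python
import numpy as np

# Hypothetical labels standing in for y_train.
labels = np.array([0, 1, 2])

# Assumption: 9 augmented clips per original file.
copies_per_sample = 9

# Each label is repeated consecutively, matching data grouped file by file.
repeated = np.repeat(labels, copies_per_sample)
print(repeated.shape)  # (27,)
```

Applied to 1600 labels, this would give an array of length 14400, the same as the first dimension of the augmented data.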

I hope this meets your requirement.


 