Filtering audio signals in TensorFlow

Question

I am building an audio-based deep learning model. As part of the preprocessing I want to augment the audio in my datasets. One augmentation that I want to apply is a room impulse response (RIR). I am working with Python 3.9.5 and TensorFlow 2.8.

In Python, if the RIR is given as a finite impulse response (FIR) of n taps, the standard way to apply it is with SciPy's lfilter:

import numpy as np
from scipy import signal
import soundfile as sf

h = np.load("rir.npy")
x, fs = sf.read("audio.wav")

y = signal.lfilter(h, 1, x)

Running this in a loop over all the files may take a long time. Doing it with the TensorFlow map utility on TensorFlow datasets:

# define filter function
def h_filt(audio, label):
    h = np.load("rir.npy")
    x = audio.numpy()
    y = signal.lfilter(h, 1, x)
    return tf.convert_to_tensor(y, dtype=tf.float32), label

# apply it via TF map on dataset
aug_ds = ds.map(h_filt)

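Using tf.numpy_function:

def np_filt(audio, label):  # receives NumPy arrays, so no .numpy() call
    return signal.lfilter(np.load("rir.npy"), 1, audio).astype(np.float32), label

def tf_h_filt(audio, label):
    return tf.numpy_function(np_filt, [audio, label], [tf.float32, tf.string])

# apply it via TF map on dataset
aug_ds = ds.map(tf_h_filt)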

I have two questions:

Is this way correct and fast enough (less than a minute for 50,000 files)?
Is there a faster way to do it, e.g. by replacing the SciPy function with a built-in TensorFlow function? I didn't find an equivalent of lfilter or SciPy's convolve.

Answer 1

Here is one way you could do it.

Notice that the TensorFlow function used below, tf.nn.conv1d, is designed to receive batches of inputs with multiple channels, and the filter can have multiple input channels and multiple output channels. Let N be the batch size, I the number of input channels, F the filter width, L the input width and O the number of output channels. Using padding='SAME', it maps an input of shape (N, L, I) and a filter of shape (F, I, O) to an output of shape (N, L, O).
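To make the shape bookkeeping concrete, here is a minimal sketch with arbitrary sizes. The values of N, L, I, F and O are illustrative assumptions and do not come from the question.

```python
import numpy as np
import tensorflow as tf

# Illustrative sizes only (assumed for this sketch)
N, L, I, F, O = 4, 100, 2, 11, 3

x = np.random.randn(N, L, I).astype(np.float32)  # batch of inputs, NWC layout
w = np.random.randn(F, I, O).astype(np.float32)  # filters: width, in channels, out channels

# With padding='SAME' and stride 1 the time axis keeps its length L
y = tf.nn.conv1d(x, w, stride=1, padding='SAME', data_format='NWC')
print(y.shape)  # (4, 100, 3)
```

For a single mono signal and a single RIR, N = I = O = 1, which is why the code that follows reshapes x to (1, -1, 1) and h to (-1, 1, 1).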

import numpy as np
from scipy import signal
import tensorflow as tf

# data to compare the two approaches
x = np.random.randn(100)
h = np.random.randn(11)

# reference output from SciPy's lfilter
y_lfilt = signal.lfilter(h, 1, x)

# Since the denominator of your filter transfer function is 1,
# the output of lfilter matches the (truncated) convolution
y_np = np.convolve(h, x)
assert np.allclose(y_lfilt, y_np[:len(y_lfilt)])

# now let's do the convolution using tensorflow
y_tf = tf.nn.conv1d(
    # x must be padded with half of the size of h
    # to use padding 'SAME'
    np.pad(x, len(h) // 2).reshape(1, -1, 1), 
    # the time axis of h must be flipped
    h[::-1].reshape(-1, 1, 1), # a 1x1 matrix of filters
    stride=1, 
    padding='SAME', 
    data_format='NWC')

assert np.allclose(y_lfilt, np.squeeze(y_tf)[:len(y_lfilt)])
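To use this inside a tf.data pipeline, as the question asks, one option is to wrap the convolution in a function passed to ds.map. The following is a sketch under assumptions rather than a tuned implementation: the random h, the in-memory dataset and the name tf_rir are placeholders, and it assumes every element is a 1-D float32 signal. It left-pads by len(h) - 1 and uses padding='VALID', a variant of the padding trick above that yields a causal output of the same length as the input.

```python
import numpy as np
import tensorflow as tf

# Placeholder RIR and dataset (assumptions); in the question these would come
# from rir.npy and the audio files.
h = np.random.randn(11).astype(np.float32)
audio = np.random.randn(8, 100).astype(np.float32)
labels = np.array([b"label"] * 8)
ds = tf.data.Dataset.from_tensor_slices((audio, labels))

# Flip the taps once and shape them as (F, I, O) = (len(h), 1, 1)
kernel = tf.constant(h[::-1].copy().reshape(-1, 1, 1))

def tf_rir(x, label):
    # Left-pad with len(h) - 1 zeros so that 'VALID' gives a causal output
    # with the same length as x, like lfilter(h, 1, x)
    x = tf.pad(x, [[len(h) - 1, 0]])
    y = tf.nn.conv1d(x[tf.newaxis, :, tf.newaxis], kernel,
                     stride=1, padding='VALID')
    return tf.reshape(y, [-1]), label

# The whole function is made of TensorFlow ops, so map can trace and parallelize it
aug_ds = ds.map(tf_rir, num_parallel_calls=tf.data.AUTOTUNE)
```

Because the function contains only TensorFlow ops, it runs in graph mode without tf.numpy_function. Whether it meets the one-minute target for 50,000 files depends on the file lengths, the RIR length and the hardware, so it is worth timing on a sample first.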

Answer 1 (3, accepted, 2022-04-12 06:27:25)
