Downsampling a wav audio file

Question

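I have to downsample a wav file from 44100Hz to 16000Hz without using any external Python libraries, so preferably with wave and/or audioop. I tried changing the wav file's frame rate to 16000 with the setframerate function, but that just slows down the entire recording. How can I downsample the audio file to 16kHz and keep the same audio length?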

Answer 1

You can use Librosa's load() function,

import librosa    
y, s = librosa.load('test.wav', sr=8000) # Downsample 44.1kHz to 8kHz

The extra effort of installing Librosa is probably worth the peace of mind.

Pro tip: when installing Librosa on Anaconda, you also need to install ffmpeg, so

pip install librosa
conda install -c conda-forge ffmpeg

This saves you from the NoBackendError() error.

Answer 2

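To downsample (also called decimate) a signal, which means reducing its sampling rate, or to upsample it (increase the sampling rate), you need to interpolate between your data points.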

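The idea is that you need to somehow draw a curve between your points, and then take values from this curve at the new sampling rate. This is because you want to know the value of the sound wave at a time that was not sampled, so you have to estimate that value one way or another. The only case where subsampling is easy is when you divide the sampling rate by an integer $k$. In that case, you just take buckets of $k$ samples and keep only the first one. But this won't answer your question, because 44100/16000 is not an integer. The original answer illustrated this with a picture of one curve sampled at two different scales.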
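
To see why the 44100 Hz to 16000 Hz case from the question is not one of the easy ones, it can help to reduce the ratio of the two rates. The sketch below uses only the standard library. The helper names `rate_ratio` and `keep_every_kth` are made up for illustration and do not come from any answer here.

```python
from fractions import Fraction

def rate_ratio(in_rate, out_rate):
    """Return the output/input rate ratio as a reduced fraction."""
    return Fraction(out_rate, in_rate)

def keep_every_kth(samples, k):
    """Naive integer-factor subsampling: keep the first sample of each bucket of k.

    No low-pass filter is applied here, so content above the new Nyquist
    frequency would alias; this only illustrates the indexing.
    """
    return samples[::k]

print(rate_ratio(44100, 16000))            # 160/441, so no integer factor k exists
print(rate_ratio(48000, 16000))            # 1/3, so keeping every 3rd sample works
print(keep_every_kth(list(range(10)), 3))  # [0, 3, 6, 9]
```

Since the reduced ratio is 160/441, the easy bucket approach does not apply and some form of interpolation is needed.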

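You can do it by hand if you understand the principle, but I strongly recommend using a library. The reason is that interpolating the right way is neither easy nor obvious.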

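You could use linear interpolation (connect the points with a line), quadratic interpolation (connect three points with a piece of a polynomial), or (sometimes the best choice for sound) a Fourier transform, interpolating in frequency space. Since a Fourier transform is not something you want to rewrite by hand, use a library if you want good downsampling/upsampling. The original answer showed a picture of two upsampled curves produced by different algorithms from scipy. The resample function uses the Fourier transform.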
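
For readers who really cannot install anything, here is a minimal sketch of the linear-interpolation option using only the standard library. It is an illustration under stated assumptions, not a recommended implementation. It assumes 16-bit mono PCM input and a little-endian host, and it applies no low-pass filter, so high-frequency content may alias. The function names are hypothetical.

```python
import array
import wave

def resample_linear(samples, in_rate, out_rate):
    """Resample a sequence of numbers by linear interpolation."""
    if not samples:
        return []
    n_out = int(round(len(samples) * out_rate / in_rate))
    step = in_rate / out_rate          # input samples advanced per output sample
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = i * step                 # fractional position in the input
        left = min(int(pos), last)
        right = min(left + 1, last)
        frac = pos - left
        out.append(samples[left] * (1.0 - frac) + samples[right] * frac)
    return out

def downsample_wav_linear(src, dst, out_rate=16000):
    """Resample a 16-bit mono PCM wav file; other layouts are rejected."""
    with wave.open(src, "rb") as reader:
        if reader.getsampwidth() != 2 or reader.getnchannels() != 1:
            raise ValueError("this sketch only handles 16-bit mono PCM")
        in_rate = reader.getframerate()
        frames = reader.readframes(reader.getnframes())
    samples = array.array("h")
    samples.frombytes(frames)          # assumes a little-endian host, like wav data
    resampled = resample_linear(samples, in_rate, out_rate)
    # Round and clamp back into the signed 16-bit range
    clamped = (max(-32768, min(32767, int(round(v)))) for v in resampled)
    out = array.array("h", clamped)
    with wave.open(dst, "wb") as writer:
        writer.setnchannels(1)
        writer.setsampwidth(2)
        writer.setframerate(out_rate)
        writer.writeframes(out.tobytes())
```

Calling `downsample_wav_linear('in.wav', 'out.wav')` with hypothetical file names writes a 16000 Hz file of the same duration. A pure-Python loop like this is slow on long recordings.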

I was in exactly this situation, loading a 44100Hz wave file and needing data sampled at 48000Hz, so I wrote the following few lines to load my data:

    # Imports
    from scipy.io import wavfile
    import scipy.signal as sps

    # Your new sampling rate
    new_rate = 48000

    # Read file
    sampling_rate, data = wavfile.read(path)

    # Resample data
    number_of_samples = round(len(data) * float(new_rate) / sampling_rate)
    data = sps.resample(data, number_of_samples)

Note that if you are only downsampling and want something faster than the Fourier approach, you can also use the decimate method.
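
One caveat: `scipy.signal.decimate` takes an integer downsampling factor, so it does not directly cover 44100 Hz to 16000 Hz. A possible alternative for rational ratios is `scipy.signal.resample_poly`, sketched below. The helper name `resample_rational` and the synthetic 440 Hz tone are assumptions made for this example, not part of the answer above.

```python
from math import gcd

import numpy as np
from scipy import signal

def resample_rational(data, in_rate, out_rate):
    """Resample along axis 0 by the reduced ratio out_rate/in_rate."""
    g = gcd(in_rate, out_rate)
    up, down = out_rate // g, in_rate // g   # 160 and 441 for 44100 -> 16000
    # resample_poly upsamples, applies a low-pass FIR filter, then downsamples
    return signal.resample_poly(data, up, down, axis=0)

# One second of a 440 Hz tone at 44100 Hz, standing in for real audio
t = np.arange(44100) / 44100.0
tone = np.sin(2 * np.pi * 440 * t)
out = resample_rational(tone, 44100, 16000)
print(len(out))  # 16000
```

Because the filtering happens along axis 0, the same helper should also work on stereo data shaped `(n_frames, 2)`, which is the shape `wavfile.read` returns.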

Answer 3

Thank you all for your answers. I have already found a solution, and it works very well. Here is the whole function.

import audioop  # in the standard library up to Python 3.12; removed in 3.13
import os
import wave

def downsampleWav(src, dst, inrate=44100, outrate=16000, inchannels=2, outchannels=1):
    if not os.path.exists(src):
        print('Source not found!')
        return False

    # dirname is empty when dst has no directory part
    dst_dir = os.path.dirname(dst)
    if dst_dir and not os.path.exists(dst_dir):
        os.makedirs(dst_dir)

    try:
        s_read = wave.open(src, 'rb')
        s_write = wave.open(dst, 'wb')
    except Exception:
        print('Failed to open files!')
        return False

    n_frames = s_read.getnframes()
    data = s_read.readframes(n_frames)

    try:
        # ratecv returns (fragment, state); 2 is the sample width in bytes
        converted, _ = audioop.ratecv(data, 2, inchannels, inrate, outrate, None)
        if inchannels == 2 and outchannels == 1:
            # Factors (1, 0) keep the left channel only
            converted = audioop.tomono(converted, 2, 1, 0)
    except Exception:
        print('Failed to downsample wav')
        return False

    try:
        s_write.setparams((outchannels, 2, outrate, 0, 'NONE', 'Uncompressed'))
        s_write.writeframes(converted)
    except Exception:
        print('Failed to write wav')
        return False

    try:
        s_read.close()
        s_write.close()
    except Exception:
        print('Failed to close wav files')
        return False

    return True
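
Whichever conversion is used, a quick way to check that the audio length was preserved is to compare the number of frames divided by the frame rate before and after. The sketch below uses only the `wave` module. `wav_duration_seconds` and `write_silence` are hypothetical helper names, and the silent test files stand in for real recordings. It also reproduces the effect described in the question: relabelling the same frames with a lower rate stretches the duration.

```python
import os
import tempfile
import wave

def wav_duration_seconds(path):
    """Return the playing time of a wav file as frames / frame rate."""
    with wave.open(path, "rb") as w:
        return w.getnframes() / w.getframerate()

def write_silence(path, n_frames, rate):
    """Write n_frames of 16-bit mono silence (test data for this sketch)."""
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(b"\x00\x00" * n_frames)

tmp = tempfile.mkdtemp()
original = os.path.join(tmp, "original.wav")
relabelled = os.path.join(tmp, "relabelled.wav")
write_silence(original, 44100, 44100)     # 1 second at 44100 Hz
write_silence(relabelled, 44100, 16000)   # same frames, header says 16000 Hz

print(wav_duration_seconds(original))     # 1.0
print(wav_duration_seconds(relabelled))   # 2.75625
```

A correctly downsampled file should instead report roughly the same duration as its source.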

Answer 4

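I tried using Librosa, but for some reason, even after running y, s = librosa.load('test.wav', sr=16000) and librosa.output.write_wav(filename, y, sr), the sound files were not saved with the given sample rate (16000, downsampled from 44kHz). But pydub works well. It is an awesome library by jiaaro, and I used the following commands: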

from pydub import AudioSegment as am
sound = am.from_file(filepath, format='wav', frame_rate=22050)
sound = sound.set_frame_rate(16000)
sound.export(filepath, format='wav')

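The code above reads the file with a frame rate of 22050, changes it to 16000, and the export function overwrites the existing file with the new frame rate. It works better than librosa for me. I would like to compare the speed of the two packages, but I have not figured that out yet because I have very little data.

Reference: https://github.com/jiaaro/pydub/issues/232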

Answer 5

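You can use resample in scipy. It is a bit of a headache, because some type conversion is needed between Python's native bytestring and the arrays scipy requires. There is another headache: the wave module in Python has no way to tell you whether the data is signed, only whether it is 8 or 16 bits. By the PCM wav convention, 8-bit samples are unsigned and 16-bit samples are signed. It might (should) work for both, but I haven't tested it.

Here is a small program that converts 8- and 16-bit mono from 44.1 to 16. If you have stereo, or use other formats, it shouldn't be difficult to adapt. Edit the input/output names at the start of the code. I never got around to using command line arguments.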

#!/usr/bin/env python
# -*- coding: utf-8 -*-
#
#  downsample.py
#  
#  Copyright 2015 John Coppens <john@jcoppens.com>
#  
#  This program is free software; you can redistribute it and/or modify
#  it under the terms of the GNU General Public License as published by
#  the Free Software Foundation; either version 2 of the License, or
#  (at your option) any later version.
#  
#  This program is distributed in the hope that it will be useful,
#  but WITHOUT ANY WARRANTY; without even the implied warranty of
#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#  GNU General Public License for more details.
#  
#  You should have received a copy of the GNU General Public License
#  along with this program; if not, write to the Free Software
#  Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
#  MA 02110-1301, USA.
#  
#

inwave = "sine_44k.wav"
outwave = "sine_16k.wav"

import wave
import numpy as np
import scipy.signal as sps

class DownSample():
    def __init__(self):
        self.in_rate = 44100.0
        self.out_rate = 16000.0

    def open_file(self, fname):
        try:
            self.in_wav = wave.open(fname)
        except:
            print("Cannot open wav file (%s)" % fname)
            return False

        if self.in_wav.getframerate() != self.in_rate:
            print("Frame rate is not %d (it's %d)" % \
                  (self.in_rate, self.in_wav.getframerate()))
            return False

        self.in_nframes = self.in_wav.getnframes()
        print("Frames: %d" % self.in_wav.getnframes())

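        # PCM wav convention: 8-bit samples are unsigned, 16-bit samples are signed
        if self.in_wav.getsampwidth() == 1:
            self.nptype = np.uint8
        elif self.in_wav.getsampwidth() == 2:
            self.nptype = np.int16
        else:
            print("Unsupported sample width: %d" % self.in_wav.getsampwidth())
            return False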

        return True

    def resample(self, fname):
        self.out_wav = wave.open(fname, "w")
        self.out_wav.setframerate(self.out_rate)
        self.out_wav.setnchannels(self.in_wav.getnchannels())
        self.out_wav.setsampwidth (self.in_wav.getsampwidth())
        self.out_wav.setnframes(1)

        print("Nr output channels: %d" % self.out_wav.getnchannels())

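        audio = self.in_wav.readframes(self.in_nframes)
        # frombuffer replaces the deprecated binary mode of np.fromstring
        samples = np.frombuffer(audio, dtype=self.nptype)
        # Count samples, not bytes (16-bit audio uses two bytes per sample)
        nroutsamples = int(round(len(samples) * self.out_rate / self.in_rate))
        print("Nr output samples: %d" % nroutsamples)

        audio_out = sps.resample(samples, nroutsamples)
        # Round and clip to the valid range before casting back to integers
        info = np.iinfo(self.nptype)
        audio_out = np.clip(np.round(audio_out), info.min, info.max).astype(self.nptype)

        self.out_wav.writeframes(audio_out.tobytes())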

        self.out_wav.close()

def main():
    ds = DownSample()
    if not ds.open_file(inwave): return 1
    ds.resample(outwave)
    return 0

if __name__ == '__main__':
    main()

Answer 6

If the current sample rate is 16000 and you want 8000, just call downsample('file.wav', 2)

from pathlib import Path
from scipy.io import wavfile

def downsample(file_name, factor):
  fs, data = wavfile.read(file_name)
  new_fs = fs // factor
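  # Keep every factor-th frame along axis 0 (works for mono and stereo).
  # Integer factors only, and no anti-aliasing filter is applied.
  new_data = data[::factor]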
  new_file_name = Path(file_name).stem + '_downsampled.wav'
  wavfile.write(new_file_name, new_fs, new_data)

Answer 7

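First, import the librosa library. Use librosa.load to resample the audio file: librosa.load(path, sr). By default, sr (the sampling rate) is 22050. If you want to keep the native sampling rate, pass sr=None. Otherwise the audio will be resampled to the sampling rate you provide.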

Answer 8

You can do this with the ffmpeg tool on Windows, macOS, or Linux. Download ffmpeg from the official link (https://ffmpeg.org/download.html). I downloaded the gyan.dev build. On Windows, follow these steps:

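Extract the downloaded file
Rename the folder to ffmpeg
Cut this folder and paste it into the OS drive, usually the C drive
Go to the bin folder, where ffmpeg.exe is located
Click the address bar and copy the path; for me it is C:\ffmpeg\bin
Type env in the Start menu to open the environment variables
Under the Advanced tab, click the Environment Variables button
Under User variables, select Path and click Edit
Click the New button and paste the copied path into the field
Click OK on every window
Now open CMD and type ffmpeg -version to confirm that the path was added to the environment variables correctly. If it was, you will see information about ffmpeg; otherwise you will get an error.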

Now we are ready to resample the audio. Add the following code to your Python file.

import os

source_file = "path/to/input/file/with/extension"    # "source_file.wav"
output_file = "path/to/output/file/with/extension"   # "compressed_output_file.wav"

output_str = f"ffmpeg -i {source_file} -ac 1 -ar 16000 {output_file}"
os.system(output_str)
print(output_str)

I use this code in many of my projects to upsample and downsample wav and mp3 files.

Note: upsampling increases the file size, while downsampling decreases it.
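
The `os.system` call above builds one shell string, which can break when a path contains spaces. A possible variant, sketched below, passes the arguments as a list through `subprocess.run`. The function names are hypothetical, and the `-y` flag (overwrite the output without asking) is an addition that the answer above does not use.

```python
import shutil
import subprocess

def build_ffmpeg_command(source_file, output_file, rate=16000, channels=1):
    """Build the ffmpeg argument list for resampling."""
    return [
        "ffmpeg",
        "-y",                    # overwrite the output file without asking
        "-i", source_file,
        "-ac", str(channels),    # number of output channels
        "-ar", str(rate),        # output sample rate in Hz
        output_file,
    ]

def resample_with_ffmpeg(source_file, output_file, rate=16000, channels=1):
    """Run ffmpeg and raise if it is missing or exits with an error."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    cmd = build_ffmpeg_command(source_file, output_file, rate, channels)
    subprocess.run(cmd, check=True)

print(build_ffmpeg_command("my input.wav", "out.wav"))
```

Because each argument is a separate list item, no manual quoting is needed, and `check=True` turns a failed conversion into an exception instead of a silently ignored exit code.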

Answer 9

If you use the tensorflow library, here is an example that converts a 44100 Hz stereo .mp3 to a 16000 Hz mono .wav

!pip install tensorflow-io==0.25.0   # something broke with ==0.26.0
import tensorflow_io as tfio
import tensorflow as tf
import numpy as np



srcFilePath = '/content/data/dataset_phoneme_in/she/pronunciation_en_she.mp3'
dstFilePath =  '/content/temp/1.wav'

#wavFensor=getAudioTensorFromFilePath(src)


rateOut=16000

audioIOTensor = tfio.audio.AudioIOTensor(srcFilePath)  # reads various formats; works with tensorflow-io==0.25.0
print(audioIOTensor.shape)
chanalsIn=(int)(audioIOTensor.shape[1])
rateIn=(int)(audioIOTensor.rate)
print(audioIOTensor.shape[1])
audioTensor = audioIOTensor[0:]  # get audio block

if (chanalsIn>1):  # stereo to mono
  audioTensor=audioTensor.numpy()
  audioTensor=np.average(audioTensor,axis=1)
  audioTensor=tf.convert_to_tensor(audioTensor)

print(audioTensor.shape)

#change rate
audioTensor=tfio.audio.resample(audioTensor, rateIn,rateOut)

print(audioTensor.shape)


# remove last dimension
#audioTensor = tf.squeeze(audioTensor, axis=[1])
# convert to wav and save 
#wav = tf.cast(audioTensor, tf.float32) / 32768.0
print(audioTensor.shape)
audioTensor=tf.expand_dims(audioTensor, axis=1)  # add axis for tf.audio.encode_wav
print(audioTensor.shape)
outWavAudio=tf.audio.encode_wav(audio=audioTensor,sample_rate=rateOut)
    
tf.io.write_file(dstFilePath,outWavAudio)

Answer scores and timestamps

Answer 1: 37, 2018-03-18 12:31:13
Answer 2: 14, 2019-03-17 16:50:46
Answer 3: 12, 2015-06-05 07:29:46
Answer 4: 6, 2020-02-24 07:29:35
Answer 5: 4, 2015-06-03 19:09:34
Answer 6: 0, 2019-10-26 04:44:05
Answer 7: 0, 2020-12-16 10:19:50
Answer 8: 0, 2022-09-29 07:13:58
Answer 9: 0, 2022-10-08 16:47:49
