
Interpreting a .WAV file [Python]

I'm attempting to process an audio file in Python and apply a low-pass filter to remove some background noise. Currently I can load the file and generate an array with its data values:

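import struct
import wave

class AudioModule:

    def __init__(self, fname=""):
        self.stream = wave.open(fname, 'r')
        self.frames = []

    def build(self):
        # Read one frame at a time; 'B' unpacks a single unsigned byte,
        # so this only works for 8-bit mono audio.
        self.stream.rewind()
        for x in range(self.stream.getnframes()):
            self.frames.append(struct.unpack('B', self.stream.readframes(1)))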

I used struct.unpack('B'..) for this particular file. The audio file being loaded reports the following specifications:

nchannels: 1
sampwidth: 1
framerate: 6000

I know that sampwidth specifies the width in bytes returned by each readframes(1) call. After loading, the array contains values as shown (ranging from 128 to 180 throughout):

>>> r.frames[6000:6025]
[(127,), (127,), (127,), (127,), (128,), (128,), (128,), (128,), (128,), (128,),      (128,), (128,), (128,), (128,), (128,), (128,), (128,), (128,), (128,), (128,), (128,), (128,), (128,), (128,), (128,)]

Question: What do those numbers represent? Other audio files with a larger sample width give completely different numbers. My goal is to trim certain frequencies out of the audio file. Unfortunately, I know very little about this and don't know how these values relate to frequency.

What is the best way to remove everything above a certain frequency threshold?

Additionally, the values are packed back into a different file as follows:

# Method of AudioModule
def store(self, fout=""):
    out = wave.open(fout, 'w')
    nchannels = self.stream.getnchannels()
    sampwidth = self.stream.getsampwidth()
    framerate = self.stream.getframerate()
    nframes = len(self.frames)
    comptype = "NONE"
    compname = "not compressed"

    out.setparams((nchannels, sampwidth, framerate, nframes,
                   comptype, compname))

    # Pack each frame back into unsigned bytes (8-bit samples only).
    if nchannels == 1:
        for f in self.frames:
            data = struct.pack('B', f[0])
            out.writeframes(data)
    elif nchannels == 2:
        for f in self.frames:
            data = struct.pack('BB', f[0], f[1])
            out.writeframes(data)
    out.close()

I think the numbers represent the extent of the membrane's vibration, in other words the amplitude (volume) of the sound at each sample. For an 8-bit file like this one, samples are stored as unsigned bytes with silence at 128, so the further a value is from 128, the larger the vibration. You can read more here.

The sample width determines the range of amplitude values each sample can take, and it differs between recordings. For example, if the sample width were 1 bit, each sample could only say whether there is sound or not. So a larger sample width usually means higher audio quality. For more about sample width, you can read Sample Rate and Bitrate: The Guts of Digital Audio.
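As a rough illustration of why files with different sample widths give such different numbers: in uncompressed PCM WAV files, 8-bit samples are unsigned bytes with silence at 128, while 16-bit samples are signed little-endian integers with silence at 0. The helper below is only a sketch under that assumption; `decode_samples` is a hypothetical name, not part of the `wave` module.

```python
import struct

def decode_samples(raw, sampwidth):
    """Convert raw PCM bytes into a list of amplitudes centred on zero.

    Assumes uncompressed PCM: 8-bit samples are unsigned (silence = 128),
    16-bit samples are signed little-endian (silence = 0).
    """
    if sampwidth == 1:
        values = struct.unpack('%dB' % len(raw), raw)
        return [v - 128 for v in values]
    if sampwidth == 2:
        count = len(raw) // 2
        return list(struct.unpack('<%dh' % count, raw[:count * 2]))
    raise ValueError('unsupported sample width: %d' % sampwidth)

# Values like 127 and 128 from the question sit right at the midpoint,
# which means they are close to silence.
print(decode_samples(bytes([127, 128, 180]), 1))  # prints [-1, 0, 52]
```

Seen this way, the run of 127s and 128s in the question is simply a quiet stretch of the recording.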

The signal stored in the audio file is in the time domain. It doesn't represent frequency. If you want the values in the frequency domain, you can perform an FFT on the array you get.
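To make the time-domain versus frequency-domain point concrete, here is a minimal sketch using `numpy.fft`. It does not read the asker's file; it builds a synthetic one-second 440 Hz tone at the 6000 Hz frame rate from the question (the tone and its amplitude are assumptions chosen for illustration) and finds the strongest frequency in it.

```python
import numpy as np

framerate = 6000                       # samples per second, as in the question
t = np.arange(framerate) / framerate   # sample times for one second of audio

# Synthetic 8-bit-style signal: a 440 Hz tone oscillating around 128.
samples = 128 + 40 * np.sin(2 * np.pi * 440 * t)

# Remove the constant offset so the 0 Hz bin does not dominate the spectrum.
centred = samples - samples.mean()

# rfft gives one complex amplitude per frequency bin for real input;
# rfftfreq gives the frequency in Hz that each bin corresponds to.
spectrum = np.fft.rfft(centred)
freqs = np.fft.rfftfreq(len(centred), d=1.0 / framerate)

dominant = freqs[np.argmax(np.abs(spectrum))]
print(dominant)  # about 440 Hz for this synthetic tone
```

Each entry of `spectrum` describes how much of one frequency is present, which is the information the raw sample values do not show directly.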

I recommend using numpy for audio processing. For example, to get the array you want, you just need to use np.fromstring (np.frombuffer in current NumPy versions, where the binary mode of np.fromstring is deprecated). Related functions such as the FFT are already provided. Many examples and papers can be found on Google.
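Putting these pieces together, one possible way to drop everything above a frequency threshold is to zero the FFT bins above the cutoff and transform back. The sketch below assumes an uncompressed 8-bit mono file like the one in the question, and `low_pass_wav` is a hypothetical helper name. It uses `np.frombuffer` to read the bytes. Note that this is a crude "brick-wall" filter and can introduce ringing artifacts; a properly designed filter (for example from `scipy.signal`) is usually preferable for real work.

```python
import wave
import numpy as np

def low_pass_wav(src, dst, cutoff_hz):
    """Zero every frequency component above cutoff_hz and save the result.

    Assumes an uncompressed, non-empty 8-bit mono WAV file.
    """
    with wave.open(src, 'rb') as win:
        params = win.getparams()
        if params.nchannels != 1 or params.sampwidth != 1:
            raise ValueError('this sketch only handles 8-bit mono files')
        raw = win.readframes(params.nframes)

    # 8-bit WAV samples are unsigned bytes; shift them so silence is 0.
    samples = np.frombuffer(raw, dtype=np.uint8).astype(np.float64) - 128.0

    # Go to the frequency domain and discard the bins above the cutoff.
    spectrum = np.fft.rfft(samples)
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / params.framerate)
    spectrum[freqs > cutoff_hz] = 0

    # Back to the time domain, with the same number of samples as the input.
    filtered = np.fft.irfft(spectrum, n=len(samples))

    # Shift back to the unsigned range and clip before converting to bytes.
    out = np.clip(np.round(filtered + 128.0), 0, 255).astype(np.uint8)

    with wave.open(dst, 'wb') as wout:
        wout.setparams(params)
        wout.writeframes(out.tobytes())
```

A call such as `low_pass_wav('input.wav', 'output.wav', cutoff_hz=1000)` (the file names are placeholders) would keep only the components at or below 1000 Hz. With a 6000 Hz frame rate, the highest frequency the file can represent is 3000 Hz, so the cutoff should be below that.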


 