
How to find silent parts in an audio track

I have the following code that stores raw audio data from a WAV file in a byte buffer:

BYTE header[74];
fread(&header, sizeof(BYTE), 74, inputFile);
BYTE * sound_buffer;
DWORD data_size;

fread(&data_size, sizeof(DWORD), 1, inputFile);
sound_buffer = (BYTE *)malloc(sizeof(BYTE) * data_size);
fread(sound_buffer, sizeof(BYTE), data_size, inputFile);

Is there any algorithm to determine when the audio track is silent (literally no sound) and when there is some sound level?

Well, your "sound" will be an array of values, either integer or real, depending on your format.

For the file to be silent or "have no sound", the values in that array will have to be zero or very close to zero. In the worst case, if the audio has a bias (a DC offset), the values will stay at the same level instead of fluctuating around it to produce sound waves.

You can write a simple function that returns the delta for a range, in other words the difference between the largest and smallest value. The lower the delta, the lower the sound volume.

Alternatively, you can write a function that returns the ranges in which the delta is lower than a given threshold.

For the sake of toying, I wrote a nifty class:

#include <utility>
#include <vector>

typedef unsigned int uint;

template<typename T>
class SilenceFinder {
public:
  // data: mono samples, size: number of samples, samples: sample rate in Hz
  SilenceFinder(T * data, uint size, uint samples) : d(data), sBegin(0), s(size), samp(samples), status(Undefined) {}

  // Returns the silent regions as [begin, end) pairs in whole seconds
  // (sample positions are divided by the sample rate using integer division).
  std::vector<std::pair<uint, uint>> find(const T threshold, const uint window) {
    auto r = findSilence(d, s, threshold, window);
    regionsToTime(r);
    return r;
  }

private:
  enum Status {
    Silent, Loud, Undefined
  };

  // Tracks transitions: opens a region when silence starts, closes it when sound returns.
  void toggleSilence(Status st, uint pos, std::vector<std::pair<uint, uint>> & res) {
    if (st == Silent) {
      if (status != Silent) sBegin = pos;
      status = Silent;
    } else {
      if (status == Silent) res.push_back(std::pair<uint, uint>(sBegin, pos));
      status = Loud;
    }
  }

  // Closes a silent region that is still open when the data ends.
  void end(uint pos, std::vector<std::pair<uint, uint>> & res) {
    if (status == Silent) res.push_back(std::pair<uint, uint>(sBegin, pos));
  }

  // Peak-to-peak range of one window, computed in double so that
  // max - min cannot overflow T (e.g. 32767 - (-32768) for 16-bit samples).
  static double delta(const T * data, const uint window) {
    if (window == 0) return 0.0;
    T min = data[0], max = data[0];
    for (uint i = 1; i < window; ++i) {
      T c = data[i];
      if (c < min) min = c;
      if (c > max) max = c;
    }
    return static_cast<double>(max) - static_cast<double>(min);
  }

  std::vector<std::pair<uint, uint>> findSilence(T * data, const uint size, const T threshold, const uint win) {
    std::vector<std::pair<uint, uint>> regions;
    if (win == 0) return regions;  // a zero-length window would never advance
    status = Undefined;            // allow find() to be called more than once
    uint pos = 0;
    while (pos < size) {
      // The last window may be shorter than win.
      uint window = (size - pos < win) ? (size - pos) : win;
      Status cur = (delta(data + pos, window) < threshold) ? Silent : Loud;
      toggleSilence(cur, pos, regions);
      pos += window;
    }
    end(size, regions);
    return regions;
  }

  // Converts sample positions to seconds.
  void regionsToTime(std::vector<std::pair<uint, uint>> & regions) {
    for (auto & r : regions) {
      r.first /= samp;
      r.second /= samp;
    }
  }

  T * d;
  uint sBegin, s, samp;
  Status status;
};

I haven't really tested it, but it looks like it should work. However, it assumes a single audio channel; you will have to extend it to work with multichannel audio. Here is how you use it:

SilenceFinder<audioDataType> finder(audioDataPtr, sizeOfData, sampleRate);
auto res = finder.find(threshold, scanWindow);
// and output the silent regions
for (auto r : res) std::cout << r.first << " " << r.second << std::endl;

Also notice that, the way it is implemented right now, the "cut" to silent regions will be very abrupt. Such "noise gate" type filters usually come with attack and release parameters, which smooth out the result. For example, there might be 5 seconds of silence with just a tiny pop in the middle. Without attack and release parameters, you will get the 5 seconds split in two, and the pop will actually remain; with them, you can tune how sensitive the cut-off is.
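As a rough illustration of that smoothing, here is a minimal sketch that post-processes a list of silent regions. It is not a real attack/release envelope, only a simplified stand-in: it bridges silent regions separated by a very short loud gap (the "pop") and then drops regions that are too short to matter. The names smoothRegions, maxGap and minLength are hypothetical, and the regions are assumed to be sorted, non-overlapping [begin, end) pairs in the same unit as the two parameters.

```cpp
#include <utility>
#include <vector>

typedef std::pair<unsigned, unsigned> Region;  // [begin, end), e.g. in samples

// Hypothetical helper, not part of the class above.
// Assumes the input regions are sorted and do not overlap.
std::vector<Region> smoothRegions(const std::vector<Region> & in,
                                  unsigned maxGap, unsigned minLength) {
  // Pass 1: bridge silent regions whose separating loud gap is shorter than maxGap.
  std::vector<Region> merged;
  for (const Region & r : in) {
    if (!merged.empty() && r.first - merged.back().second < maxGap)
      merged.back().second = r.second;  // treat the short gap as part of the silence
    else
      merged.push_back(r);
  }
  // Pass 2: keep only regions that are at least minLength long.
  std::vector<Region> out;
  for (const Region & r : merged)
    if (r.second - r.first >= minLength) out.push_back(r);
  return out;
}
```

With this approach the pop is absorbed into the surrounding silence rather than preserved, which may or may not be what you want; a proper gate would track a signal envelope instead.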

To check if the portion of the track between t1 and t2 is 'silent', compute the root mean square (RMS) of the samples between t1 and t2. Then check whether the RMS is less than or equal to some threshold value that you determine constitutes 'silence'. See http://en.wikipedia.org/wiki/Root_mean_square
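The RMS check could be sketched roughly as follows. This assumes 16-bit signed mono PCM samples; the function names rms and isSilent and all of their parameters are made up for illustration, and the threshold is something you would have to tune for your own material.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>

// Root mean square of n samples; returns 0 for an empty range.
double rms(const std::int16_t * samples, std::size_t n) {
  if (n == 0) return 0.0;
  double sumSquares = 0.0;
  for (std::size_t i = 0; i < n; ++i) {
    double v = static_cast<double>(samples[i]);
    sumSquares += v * v;
  }
  return std::sqrt(sumSquares / static_cast<double>(n));
}

// Hypothetical helper: true when the RMS of the samples between t1 and t2
// (in seconds) is less than or equal to threshold.
bool isSilent(const std::int16_t * samples, std::size_t total, unsigned sampleRate,
              double t1, double t2, double threshold) {
  std::size_t begin = static_cast<std::size_t>(t1 * sampleRate);
  std::size_t end = static_cast<std::size_t>(t2 * sampleRate);
  if (end > total) end = total;    // clamp to the available data
  if (begin >= end) return true;   // an empty range counts as silent here
  return rms(samples + begin, end - begin) <= threshold;
}
```

Note that a constant bias in the signal raises the RMS even when nothing is audible, so you may want to subtract the mean of the range first if your audio has a DC offset.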
