
Detecting the index of silence from a given audio file using Python

I am trying to process an audio file in Python using modules such as numpy and struct, but I am having a hard time detecting where silence occurs in the file. One of the methods I came across is to slide a window of fixed duration over the audio signal and record the sum of the squared samples in each window. I am new to Python and don't know how to implement this method.
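The sliding-window idea described above can be sketched with the standard library alone. This is only a minimal illustration, not a definitive implementation: the function names (window_energies, silent_ranges), the window length, and the threshold are hypothetical choices, and the samples are assumed to be already decoded into a list of integers (for example with wave and struct).

```python
import math


def window_energies(samples, window_size, hop):
    """Sum of squared samples for each window position."""
    energies = []
    for start in range(0, len(samples) - window_size + 1, hop):
        window = samples[start:start + window_size]
        energies.append(sum(x * x for x in window))
    return energies


def silent_ranges(samples, window_size, hop, threshold):
    """Return (start_index, end_index) sample ranges where every
    window's energy stays below the threshold."""
    ranges = []
    run_start = None
    run_end = None
    for i, energy in enumerate(window_energies(samples, window_size, hop)):
        start = i * hop
        if energy < threshold:
            if run_start is None:
                run_start = start          # a new silent run begins
            run_end = start + window_size  # extend the current run
        elif run_start is not None:
            ranges.append((run_start, run_end))
            run_start = None
    if run_start is not None:              # signal ended while silent
        ranges.append((run_start, run_end))
    return ranges


# Synthetic test signal: 1 s of silence, 1 s of a 440 Hz tone, 1 s of silence
sample_rate = 8000
tone = [int(10000 * math.sin(2 * math.pi * 440 * n / sample_rate))
        for n in range(sample_rate)]
signal = [0] * sample_rate + tone + [0] * sample_rate

window = sample_rate // 10                 # 100 ms windows, no overlap
threshold = window * 100.0 ** 2            # assumed: mean amplitude below 100
ranges = silent_ranges(signal, window, window, threshold)

# Convert sample indices to seconds
print([(start / sample_rate, end / sample_rate) for start, end in ranges])
```

A smaller hop than the window size gives overlapping windows and finer boundaries at the cost of more computation; with numpy the inner sums could be vectorized.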

If you are open to outside libraries, one quick way to do this is with pydub.
pydub has a module called silence with the functions detect_silence and detect_nonsilent, which may be useful in your case.
However, the only caveat is that the silence needs to be at least half a second long.

Below is a sample implementation that I tried on an audio file. However, since the silences in my file were shorter than half a second, only a few of the detected silent ranges were correct.

You may want to try this and see whether it works for you by tweaking min_silence_len (the minimum length of a silence, in milliseconds) and silence_thresh (the level, in dBFS, below which audio counts as silent).

Program

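from pydub import AudioSegment, silence

# Load the WAV file to analyse
myaudio = AudioSegment.from_wav("a-z-vowels.wav")

# Use a separate name for the result so the silence module is not shadowed
silent_ranges = silence.detect_silence(myaudio, min_silence_len=1000, silence_thresh=-16)

# Convert milliseconds to whole seconds
silent_ranges = [(start // 1000, stop // 1000) for start, stop in silent_ranges]
print(silent_ranges)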

Result

Python 2.7.9 (default, Dec 10 2014, 12:24:55) [MSC v.1500 32 bit (Intel)] on win32 Type "copyright", "credits" or "license()" for more information.

================================ RESTART ================================

[(0, 1), (1, 14), (14, 20), (19, 26), (26, 27), (28, 30), (29, 32), (32, 34), (33, 37), (37, 41), (42, 46), (46, 47), (48, 52)]

For better results, set the threshold relative to the file's dBFS (its average loudness):

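from pydub import AudioSegment, silence

myaudio = AudioSegment.from_mp3("RelativityOverview.mp3")
# Average loudness of the whole file, in dBFS
dBFS = myaudio.dBFS
# Treat anything at least 16 dB quieter than the average as silence
silent_ranges = silence.detect_silence(myaudio, min_silence_len=1000, silence_thresh=dBFS - 16)

# Convert milliseconds to seconds
silent_ranges = [(start / 1000, stop / 1000) for start, stop in silent_ranges]
print(silent_ranges)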
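To illustrate why a threshold relative to the file's loudness tends to work better than a fixed -16, here is a rough standard-library sketch. It assumes 16-bit samples and the usual definition of dBFS as 20·log10(rms / full scale); the helper dbfs and the sample values are hypothetical and only approximate what pydub reports.

```python
import math

FULL_SCALE = 32768  # assumption: 16-bit signed samples


def dbfs(samples):
    """Average loudness of the samples in dB relative to full scale."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(x * x for x in samples) / len(samples))
    if rms == 0:
        return float("-inf")
    return 20 * math.log10(rms / FULL_SCALE)


# A quiet recording: constant amplitude at roughly 10% of full scale,
# which is about -20 dBFS
quiet_recording = [3277] * 1000
loudness = dbfs(quiet_recording)

fixed_threshold = -16                # absolute threshold
relative_threshold = loudness - 16   # 16 dB below the recording's own level

# With the fixed threshold the whole recording is already "below silence";
# with the relative threshold its normal level is not counted as silent.
print(round(loudness, 1))
print(loudness < fixed_threshold)
print(loudness < relative_threshold)
```

With a quiet recording like this one, a fixed threshold of -16 dBFS would classify nearly everything as silence, whereas subtracting 16 from the measured level adapts the cut-off to the recording.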
