
Reading a WAV file from the TIMIT database in Python

I'm trying to read a wav file from the TIMIT database in Python, but I get an error.

When I use wave:

wave.Error: file does not start with RIFF id

When I use scipy:

ValueError: File format b'NIST'... not understood.

When I use librosa, the program gets stuck. I tried to convert the file to wav using sox:

cmd = "sox " + wav_file + " -t wav " + new_wav
subprocess.call(cmd, shell=True)

It didn't help. I saw an old answer referencing the package scikits.audiolab, but it no longer appears to be supported.

How can I read these files to get an ndarray of the data?

Thanks

Your file is not a WAV file. Apparently it is a NIST SPHERE file. From the LDC web page: "Many LDC corpora contain speech files in NIST SPHERE format." According to the description of the NIST File Format, the first four characters of the file are NIST. That's what the scipy error is telling you: it doesn't know how to read a file that begins with NIST.
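To check which container a file really uses before handing it to a reader, it is enough to look at its first four bytes. The following is a minimal sketch; the function name sniff_container is made up for this example and is not part of any library.

```python
def sniff_container(path):
    """Return 'sphere', 'riff', or 'unknown' based on the first four bytes."""
    with open(path, "rb") as f:
        magic = f.read(4)
    if magic == b"NIST":
        return "sphere"   # NIST SPHERE header, which begins with "NIST"
    if magic == b"RIFF":
        return "riff"     # RIFF container, which WAV files use
    return "unknown"
```

A TIMIT file that triggers the errors in the question should come back as 'sphere' despite its .WAV extension.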

I suspect you'll have to convert the file to WAV if you want to read it with any of the libraries that you tried. To force the conversion to WAV using the program sph2pipe, use the command option -f wav (or equivalently, -f rif), e.g.

sph2pipe -f wav input.sph output.wav
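The same command can be driven from Python. This is only a sketch: it assumes sph2pipe is installed and on the PATH, and the helper names sph2pipe_command and convert_to_wav are hypothetical. Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.

```python
import subprocess

def sph2pipe_command(src, dst):
    """Build the argument list for: sph2pipe -f wav src dst"""
    return ["sph2pipe", "-f", "wav", str(src), str(dst)]

def convert_to_wav(src, dst):
    # check=True raises CalledProcessError if sph2pipe exits with a non-zero status
    subprocess.run(sph2pipe_command(src, dst), check=True)
```

For example, convert_to_wav("input.sph", "output.wav") runs the command shown above.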

Issue this from the command line to verify whether it is a WAV file or not:

xxd -b myaudiofile.wav | head

If it is in WAV format, the output will look something like this:

00000000: 01010010 01001001 01000110 01000110 10111100 10101111  RIFF..
00000006: 00000001 00000000 01010111 01000001 01010110 01000101  ..WAVE
0000000c: 01100110 01101101 01110100 00100000 00010000 00000000  fmt ..
00000012: 00000000 00000000 00000001 00000000 00000001 00000000  ......
00000018: 01000000 00011111 00000000 00000000 01000000 00011111  @...@.
0000001e: 00000000 00000000 00000001 00000000 00001000 00000000  ......
00000024: 01100100 01100001 01110100 01100001 10011000 10101111  data..
0000002a: 00000001 00000000 10000001 10000000 10000001 10000000  ......
00000030: 10000001 10000000 10000001 10000000 10000001 10000000  ......
00000036: 10000001 10000000 10000001 10000000 10000001 10000000  ......

Notice that the WAV file begins with the characters RIFF, the mandatory marker of the RIFF container that WAV files use. If your system (I'm on Linux) does not have the xxd command line utility, use any hex editor, such as wxHexEditor, to examine your file in the same way and confirm that you see RIFF. If there is no RIFF, it is simply not a WAV file.
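The fields in the dump above can also be decoded in Python with the standard struct module. The sketch below assumes the canonical layout in which a 16-byte "fmt " chunk immediately follows the WAVE tag, as in the dump; files with extra chunks before "fmt " would need a proper chunk walker. The function name read_wav_header is illustrative.

```python
import struct

def read_wav_header(path):
    """Decode the first 36 bytes of a canonical RIFF/WAVE file into a dict."""
    with open(path, "rb") as f:
        header = f.read(36)
    if len(header) < 36:
        raise ValueError("file too short to hold a WAV header")
    # "<" means little-endian with no padding; the field sizes add up to 36 bytes
    (riff, riff_size, wave_id, fmt_id, fmt_size, audio_format,
     channels, sample_rate, byte_rate, block_align, bits) = struct.unpack(
        "<4sI4s4sIHHIIHH", header)
    if riff != b"RIFF" or wave_id != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    if fmt_id != b"fmt ":
        raise ValueError("fmt chunk does not directly follow the WAVE tag")
    return {
        "audio_format": audio_format,   # 1 means uncompressed PCM
        "channels": channels,
        "sample_rate": sample_rate,
        "bits_per_sample": bits,
    }
```

Read this way, the bytes in the dump above correspond to PCM audio with 1 channel, a sample rate of 8000 Hz, and 8 bits per sample.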

Here are details of the WAV format specs:

http://soundfile.sapp.org/doc/WaveFormat/

http://www-mmsp.ece.mcgill.ca/Documents/AudioFormats/WAVE/WAVE.html

http://unusedino.de/ec64/technical/formats/wav.html

http://www.drdobbs.com/database/inside-the-riff-specification/184409308

https://www.gamedev.net/articles/programming/general-and-gameplay-programming/loading-a-wave-file-r709

http://www.topherlee.com/software/pcm-tut-wavformat.html

http://www.labbookpages.co.uk/audio/javaWavFiles.html

http://www.johnloomis.org/cpe102/asgn/asgn1/riff.html

http://nagasm.org/ASL/sound05/

If you want a generic command that works for every wav file inside a folder (on Windows), run:

forfiles /s /m *.wav /c "cmd /c sph2pipe -f wav @file @fnameRIFF.wav"

It searches for every wav file it can find, recursively, and for each one creates a WAV file that both scipy and wave can read, named <base_name>RIFF.wav.
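On systems without forfiles, a similar traversal can be sketched in Python. This is an assumption-laden example rather than a drop-in replacement: the function name plan_conversions is hypothetical, it only lists the (source, target) pairs, and it skips files that do not begin with NIST so that already converted outputs are not picked up on a second run.

```python
from pathlib import Path

def plan_conversions(root):
    """Yield (source, target) pairs for NIST SPHERE files found under root."""
    for src in sorted(Path(root).rglob("*")):
        if not src.is_file() or src.suffix.lower() != ".wav":
            continue
        with open(src, "rb") as f:
            if f.read(4) != b"NIST":
                continue  # already RIFF, or some other format
        # Same naming scheme as the forfiles command: <base_name>RIFF.wav
        yield src, src.with_name(src.stem + "RIFF.wav")
```

Each pair can then be passed to sph2pipe, for example with subprocess.run(["sph2pipe", "-f", "wav", str(src), str(dst)], check=True).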

I have written a Python script that converts all the .WAV files in NIST format, spoken by all speakers from all dialects, to .wav files that can be played on your system.

Note: All the dialect folders are present in ./TIMIT/TRAIN/. You may have to change dialects_path according to your project structure (or if you are on Windows).

import glob
import os

from sphfile import SPHFile

dialects_path = "./TIMIT/TRAIN/"
# Each entry in TRAIN is one dialect folder
dialects = os.listdir(dialects_path)

for dialect in dialects:
    dialect_path = os.path.join(dialects_path, dialect)
    for speaker in os.listdir(dialect_path):
        speaker_path = os.path.join(dialect_path, speaker)

        wav_files = glob.glob(os.path.join(speaker_path, "*.WAV"))

        for wav_file in wav_files:
            sph = SPHFile(wav_file)

            # The matching .TXT transcript begins with the start and end sample index
            txt_file = wav_file[:-3] + "TXT"
            with open(txt_file, "r") as f:
                for line in f:
                    words = line.split(" ")
                    # Sample indices divided by the 16000 Hz rate give seconds
                    start_time = int(words[0]) / 16000
                    end_time = int(words[1]) / 16000

            print("writing file ", wav_file)
            # Caution: on a case-insensitive filesystem this name collides with the source file
            sph.write_wav(wav_file.replace(".WAV", ".wav"), start_time, end_time)
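If only the sample data is needed, the SPHERE header is simple enough to parse by hand. The sketch below uses only the standard library and rests on stated assumptions: the file holds uncompressed 16-bit PCM (it raises an error otherwise), and the header follows the usual layout of a NIST_1A line, a header-size line, "name -type value" lines, and end_head. The function name read_sphere_pcm is illustrative, not part of sphfile.

```python
import array
import sys

def read_sphere_pcm(path):
    """Return (samples, sample_rate) for an uncompressed 16-bit PCM SPHERE file."""
    with open(path, "rb") as f:
        if f.readline().strip() != b"NIST_1A":
            raise ValueError("not a NIST SPHERE file")
        header_size = int(f.readline().strip())  # total header length in bytes
        f.seek(0)
        header = f.read(header_size).decode("ascii", errors="replace")
        payload = f.read()

    # Header lines look like: "sample_rate -i 16000"
    fields = {}
    for line in header.splitlines()[2:]:
        if line.strip() == "end_head":
            break
        parts = line.split(None, 2)
        if len(parts) == 3:
            fields[parts[0]] = parts[2].strip()

    if fields.get("sample_coding", "pcm") != "pcm" or fields.get("sample_n_bytes") != "2":
        raise ValueError("only uncompressed 16-bit PCM is handled in this sketch")

    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(payload[: len(payload) - len(payload) % 2])
    # sample_byte_format "01" is little-endian, "10" is big-endian
    file_order = "little" if fields.get("sample_byte_format", "01") == "01" else "big"
    if file_order != sys.byteorder:
        samples.byteswap()
    return samples, int(fields["sample_rate"])
```

Passing the returned samples to numpy.asarray gives the ndarray asked for in the question.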

You can use sounddevice and soundfile to obtain the data as a numpy array (and play it back) with the following code:

import matplotlib.pyplot as plt
import soundfile as sf
import sounddevice as sd
# https://catalog.ldc.upenn.edu/desc/addenda/LDC93S1.wav
data, fs = sf.read('LDC93S1.wav')
print(data.shape,fs)
sd.play(data, fs, blocking=True)
plt.plot(data)
plt.show()

Output:

(46797,) 16000


A sample TIMIT database wav file: https://catalog.ldc.upenn.edu/desc/addenda/LDC93S1.wav

Sometimes this can be caused by extracting the 7zip archive incorrectly. I had a similar issue and resolved it by extracting the dataset with 7z x <datasetname>.7z
