
Reading a WAV file from the TIMIT database in Python

I'm trying to read a WAV file from the TIMIT database in Python, but I get an error:

When I'm using wave:

wave.Error: file does not start with RIFF id

When I'm using scipy:

ValueError: File format b'NIST'... not understood.

When I'm using librosa, the program hangs. I tried to convert the file to WAV using sox:

cmd = "sox " + wav_file + " -t wav " + new_wav
subprocess.call(cmd, shell=True)

That didn't help either. I saw an old answer that refers to the scikits.audiolab package, but it looks like it is no longer supported.

How can I read these files to get an ndarray of the data?

Thanks

Your file is not a WAV file. Apparently it is a NIST SPHERE file. From the LDC web page: "Many LDC corpora contain speech files in NIST SPHERE format." According to the description of the NIST file format, the first four characters of the file are NIST. That's what the scipy error is telling you: it doesn't know how to read a file that begins with NIST.
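To check which of the two formats a given file actually is, the first bytes can be inspected from Python. This is only a minimal sketch: the function name `sniff_audio_container` is made up for illustration, and it looks at nothing beyond the leading magic bytes.

```python
def sniff_audio_container(path):
    """Guess the container type from the first 12 bytes of the file."""
    with open(path, "rb") as f:
        head = f.read(12)
    # A RIFF/WAVE file starts with b"RIFF", a 4-byte size, then b"WAVE".
    if head[:4] == b"RIFF" and head[8:12] == b"WAVE":
        return "RIFF/WAVE"
    # A NIST SPHERE file starts with the characters "NIST".
    if head[:4] == b"NIST":
        return "NIST SPHERE"
    return "unknown"
```

Calling it on one of the TIMIT .WAV files should return "NIST SPHERE" if the diagnosis above applies to your copy of the corpus.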

I suspect you'll have to convert the file to WAV if you want to read it with any of the libraries that you tried. To force the conversion to WAV using the program sph2pipe, use the command option -f wav (or equivalently, -f rif), e.g.

sph2pipe -f wav input.sph output.wav
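If you would rather run that conversion from Python, something along these lines may work. It is a sketch that assumes sph2pipe is installed and on your PATH; the helper names `sph2pipe_command` and `convert_to_riff` are hypothetical. Passing the arguments as a list instead of one concatenated string avoids shell quoting problems with paths that contain spaces.

```python
import shutil
import subprocess


def sph2pipe_command(src, dst):
    """Build the argument list for: sph2pipe -f wav <src> <dst>."""
    return ["sph2pipe", "-f", "wav", str(src), str(dst)]


def convert_to_riff(src, dst):
    """Run sph2pipe; raise if the tool is missing or exits with an error."""
    if shutil.which("sph2pipe") is None:
        raise FileNotFoundError("sph2pipe was not found on PATH")
    # check=True turns a non-zero exit status into CalledProcessError.
    subprocess.run(sph2pipe_command(src, dst), check=True)
```

For example, `convert_to_riff("input.sph", "output.wav")` runs the same command as above.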

Run this from the command line to check whether the file really is a WAV file:

xxd -b myaudiofile.wav | head

If it is in WAV format, the output will look something like this:

00000000: 01010010 01001001 01000110 01000110 10111100 10101111  RIFF..
00000006: 00000001 00000000 01010111 01000001 01010110 01000101  ..WAVE
0000000c: 01100110 01101101 01110100 00100000 00010000 00000000  fmt ..
00000012: 00000000 00000000 00000001 00000000 00000001 00000000  ......
00000018: 01000000 00011111 00000000 00000000 01000000 00011111  @...@.
0000001e: 00000000 00000000 00000001 00000000 00001000 00000000  ......
00000024: 01100100 01100001 01110100 01100001 10011000 10101111  data..
0000002a: 00000001 00000000 10000001 10000000 10000001 10000000  ......
00000030: 10000001 10000000 10000001 10000000 10000001 10000000  ......
00000036: 10000001 10000000 10000001 10000000 10000001 10000000  ......

Notice that the file begins with the characters RIFF, the mandatory identifier of the RIFF container that WAV files use. If your system does not have the xxd command-line utility (I'm on Linux), use any hex editor, such as wxHexEditor, to examine the file in the same way and confirm that you see RIFF. If there is no RIFF, it is simply not a WAV file.
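The same header can also be decoded in Python rather than read off a binary dump. The sketch below assumes the canonical 44-byte PCM layout shown in the dump (RIFF, WAVE, a 16-byte fmt chunk, then data); real files may carry extra chunks before data, in which case this simple parser raises an error. The function name `parse_wav_header` is made up for illustration.

```python
import struct


def parse_wav_header(header):
    """Decode the first 44 bytes of a canonical PCM WAV file into a dict."""
    if len(header) < 44:
        raise ValueError("need at least 44 bytes")
    # All multi-byte fields in a RIFF file are little-endian ("<").
    (riff, riff_size, wave, fmt_id, fmt_size, audio_format, channels,
     sample_rate, byte_rate, block_align, bits_per_sample,
     data_id, data_size) = struct.unpack("<4sI4s4sIHHIIHH4sI", header[:44])
    if riff != b"RIFF" or wave != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    if fmt_id != b"fmt " or data_id != b"data":
        raise ValueError("header is not the canonical 44-byte layout")
    return {
        "riff_size": riff_size,
        "audio_format": audio_format,  # 1 means uncompressed PCM
        "channels": channels,
        "sample_rate": sample_rate,
        "byte_rate": byte_rate,
        "block_align": block_align,
        "bits_per_sample": bits_per_sample,
        "data_size": data_size,
    }
```

Fed the 44 bytes from the dump above, this reports a mono, 8-bit PCM file sampled at 8000 Hz. To use it on a file, pass it `open(path, "rb").read(44)`.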

Here are details of the WAV format specification:

http://soundfile.sapp.org/doc/WaveFormat/

http://www-mmsp.ece.mcgill.ca/Documents/AudioFormats/WAVE/WAVE.html

http://unusedino.de/ec64/technical/formats/wav.html

http://www.drdobbs.com/database/inside-the-riff-specification/184409308

https://www.gamedev.net/articles/programming/general-and-gameplay-programming/loading-a-wave-file-r709

http://www.topherlee.com/software/pcm-tut-wavformat.html

http://www.labbookpages.co.uk/audio/javaWavFiles.html

http://www.johnloomis.org/cpe102/asgn/asgn1/riff.html

http://nagasm.org/ASL/sound05/

If you want a generic command that converts every .wav file inside a folder and its subfolders on Windows, run:

forfiles /s /m *.wav /c "cmd /c sph2pipe -f wav @file @fnameRIFF.wav"

It searches for every .wav file it can find and, for each one, creates a WAV file that both scipy and wave can read, named <base_name>RIFF.wav.
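On systems without forfiles, roughly the same batch conversion can be sketched in Python. This assumes sph2pipe is on your PATH; `riff_name` and `convert_tree` are hypothetical helper names. Unlike the command above, this version skips files whose name already ends in RIFF, so running it twice does not try to convert its own output.

```python
import subprocess
from pathlib import Path


def riff_name(src):
    """Map dir/SA1.WAV to dir/SA1RIFF.wav, mirroring @fnameRIFF.wav."""
    src = Path(src)
    return src.with_name(src.stem + "RIFF.wav")


def convert_tree(root):
    """Convert every .wav/.WAV file under root, recursively."""
    for src in sorted(Path(root).rglob("*")):
        if not src.is_file() or src.suffix.lower() != ".wav":
            continue
        if src.stem.endswith("RIFF"):
            continue  # output of an earlier run
        subprocess.run(
            ["sph2pipe", "-f", "wav", str(src), str(riff_name(src))],
            check=True,
        )
```

Calling `convert_tree("./TIMIT")` would then write the converted files next to the originals.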

I have written a Python script that converts all the NIST-format .WAV files, spoken by all speakers from all dialects, to .wav files that can be played on your system.

Note: all the dialect folders are in ./TIMIT/TRAIN/. You may have to change dialects_path to match your project structure (or if you are on Windows).

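import glob
import os

from sphfile import SPHFile

dialects_path = "./TIMIT/TRAIN/"
# Each sub-directory of TRAIN is one dialect region.
dialects = sorted(os.listdir(dialects_path))

for dialect in dialects:
    dialect_path = os.path.join(dialects_path, dialect)
    for speaker in sorted(os.listdir(dialect_path)):
        speaker_path = os.path.join(dialect_path, speaker)

        # The NIST-format recordings carry an upper-case .WAV extension.
        wav_files = glob.glob(os.path.join(speaker_path, "*.WAV"))

        for wav_file in wav_files:
            sph = SPHFile(wav_file)

            # The matching .TXT file starts with the first and last sample index.
            txt_file = wav_file[:-3] + "TXT"
            with open(txt_file, "r") as f:
                for line in f:
                    words = line.split(" ")
                    # Convert sample indices to seconds (16 kHz sampling rate).
                    start_time = int(words[0]) / 16000
                    end_time = int(words[1]) / 16000

            print("writing file ", wav_file)
            # Caution: on a case-insensitive file system, .wav and .WAV name the same file.
            sph.write_wav(wav_file.replace(".WAV", ".wav"), start_time, end_time)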
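If installing sphfile is not an option, the text header of a NIST SPHERE file can be read with the standard library alone. The sketch below assumes the usual header layout: a NIST_1A line, a line with the header size in bytes, then "name -type value" lines terminated by end_head. The function names are made up for illustration, and nothing here decodes compressed (shorten) audio.

```python
def parse_sphere_header(raw):
    """Parse raw NIST SPHERE header bytes; return (header_size, fields)."""
    lines = raw.decode("ascii", errors="replace").split("\n")
    if len(lines) < 2 or not lines[0].startswith("NIST_1A"):
        raise ValueError("not a NIST SPHERE file")
    header_size = int(lines[1].strip())
    fields = {}
    for line in lines[2:]:
        line = line.strip()
        if line == "end_head":
            break
        if not line or line.startswith(";"):
            continue  # blank line or comment
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue
        name, kind, value = parts
        if kind == "-i":
            fields[name] = int(value)
        elif kind == "-r":
            fields[name] = float(value)
        else:  # "-sN": a string of N characters
            fields[name] = value
    return header_size, fields


def read_sphere_header(path):
    """Read the header of the file at path and parse it."""
    with open(path, "rb") as f:
        first = f.read(16)  # "NIST_1A\n" plus the 8-byte size line
        size = int(first.split(b"\n")[1].strip())
        f.seek(0)
        return parse_sphere_header(f.read(size))
```

If the header reports uncompressed 16-bit samples with sample_byte_format 01 (little-endian), which is an assumption you should verify for your files, then `numpy.fromfile(path, dtype="<i2", offset=header_size)` should give the samples as an ndarray.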

You can use soundfile to read the data into a NumPy array, and sounddevice to play it back, with the following code:

import matplotlib.pyplot as plt
import soundfile as sf
import sounddevice as sd
# https://catalog.ldc.upenn.edu/desc/addenda/LDC93S1.wav
data, fs = sf.read('LDC93S1.wav')
print(data.shape,fs)
sd.play(data, fs, blocking=True)
plt.plot(data)
plt.show()

Output

(46797,) 16000


A sample TIMIT database wav file: https://catalog.ldc.upenn.edu/desc/addenda/LDC93S1.wav

Sometimes this can be caused by extracting the 7zip archive incorrectly. I had a similar issue and resolved it by extracting the dataset with 7z x <datasetname>.7z
