
Convert PCM WAV to normal WAV in Python

I was using speech_recognition with a WAV file that pjsua recorded, and it always fails with an error message when I try to send the contents of the file.

Audio file could not be read as PCM WAV, AIFF/AIFF-C, or Native FLAC; check if file is corrupted or in another format

The file plays normally in MPV, and inspecting it with the file command shows that it is PCM.

test2.wav: RIFF (little-endian) data, WAVE audio, Microsoft PCM, 16 bit, mono 16000 Hz

Searching around, I found someone with a similar problem, but the proposed solution (changing a few parameters using the wave library) did not work for me. After I called wav.setparams((2, 2, 44100, 0, 'NONE', 'NONE')), the audio became complete garbage, like ants talking.

I don't understand enough about sound files to know what "channels", "sampwidth", "framerate", "nframes", "comptype" and "compname" mean.

You've misunderstood the error message. PCM is the standard encoding of the wave file format. There is no "PCM" version and a separate "normal" version: an ordinary wave file like yours uses Pulse Code Modulation (PCM), which really just means that the samples that make up your signal are quantized digitally and stored contiguously. If your speech_recognition function can't parse the wave file, it's not because of anything related to PCM.

I don't know anything about the SpeechRecognition module (I'm assuming that's what you're using?). I also don't know anything about pjsua. My guess is that pjsua is possibly baking some additional chunks into the header metadata, which the SpeechRecognition API isn't expecting. Is there any chance you can share the wave file via Dropbox or similar?
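One way to test that guess without sharing the file is to list the top-level chunks it contains. The following is only a rough sketch using the standard library; the helper name list_riff_chunks is made up for illustration, and it assumes a plain little-endian RIFF/WAVE file such as the one the file command reported.

```python
import struct


def list_riff_chunks(path):
    """Return (chunk_id, size) pairs for the top-level chunks of a RIFF/WAVE file.

    Hypothetical helper for illustration; it is not part of SpeechRecognition or pjsua.
    """
    chunks = []
    with open(path, "rb") as f:
        # The file starts with "RIFF", a 32-bit size, and the form type "WAVE".
        riff, _riff_size, form = struct.unpack("<4sI4s", f.read(12))
        if riff != b"RIFF" or form != b"WAVE":
            raise ValueError("not a RIFF/WAVE file")
        while True:
            header = f.read(8)
            if len(header) < 8:
                break  # end of file
            chunk_id, size = struct.unpack("<4sI", header)
            chunks.append((chunk_id.decode("ascii", "replace"), size))
            # Skip the chunk body; bodies are padded to an even number of bytes.
            f.seek(size + (size & 1), 1)
    return chunks
```

Calling list_riff_chunks("test2.wav") on a minimal file should show only a "fmt " chunk followed by a "data" chunk. Anything else in the list would be a candidate for the extra chunks guessed at above, though that alone would not prove it is the cause of the error.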

Also, the reason your audio sounded like "ants talking" is the discrepancy between the metadata your wave file contains and the metadata you wrote to your new wave file. Your wave file is mono, which means one channel, but you wrote two. Your file also has a sample rate of 16 kHz, but you wrote 44.1 kHz.
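To make the mismatch concrete, here is a small sketch using only the standard wave module. The function names rewrite_wav and playback_seconds are invented for this example. The first copies the frames to a new file while reusing the parameters read from the source, instead of hard-coding new ones. The second shows how long a given number of bytes plays for under a given set of parameters. Whether a rewritten file is accepted by SpeechRecognition is not something this answer can confirm.

```python
import wave


def rewrite_wav(src_path, dst_path):
    """Copy the audio frames to a new WAV file, keeping the source parameters."""
    with wave.open(src_path, "rb") as src:
        # nchannels, sampwidth, framerate, nframes, comptype, compname
        params = src.getparams()
        frames = src.readframes(params.nframes)
    with wave.open(dst_path, "wb") as dst:
        # Reuse the values that describe the data; do not invent new ones.
        dst.setparams(params)
        dst.writeframes(frames)
    return params


def playback_seconds(n_bytes, nchannels, sampwidth, framerate):
    """Duration of n_bytes of sample data when interpreted with these parameters."""
    return n_bytes / (nchannels * sampwidth * framerate)
```

One second of your audio (mono, 2 bytes per sample, 16000 Hz) is 32000 bytes. playback_seconds(32000, 1, 2, 16000) gives 1.0, while playback_seconds(32000, 2, 2, 44100) gives roughly 0.18, so the same bytes labelled as 44.1 kHz stereo play back more than five times too fast. The parameters only describe the data; changing them does not convert it.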
