
How to send chunked audio data for speech recognition in wit.ai?

I have a large MP3 file (about 1.8 GB) that I have to transcribe using wit.ai. Since I work with WAV files a lot, I converted it to a WAV file.

But since wit.ai's speech API can't take audio longer than 10 s, I am planning to stream the file in chunks. Somehow, though, I only get a 400 (Bad Request) response, and I can't work out what I am sending wrong. The details follow:

headers = {'authorization': 'Bearer ' + wit_access_token,
         'Content-Type': 'audio/wav','Transfer-encoding': 'chunked'}
with open('meeting-record.wav', 'rb') as f:
    audio = f.read(2048)  # taken it any number
resp = requests.post(API_ENDPOINT, headers = headers,
                 data = audio)
print(resp) 
data = json.loads(resp.content)
text = data['_text']
print(text)
f.close()

I am getting the following output:

<Response [400]>
Traceback (most recent call last):
  File ".\sound-record.py", line 61, in <module>
    text = data['_text']
KeyError: '_text'

Can someone give me some pointers on where it's going wrong?

I haven't used the wit.ai API before, but the Bing Speech API appears to require the data in a similar fashion. I'm not sure whether the error comes from your code, but in order to properly chunk and stream the file, you could add another function like this:

def stream_audio_file(speech_file, chunk_size=1024):
    # Yield the audio file in chunk_size-byte pieces
    with open(speech_file, 'rb') as f:
        while True:
            data = f.read(chunk_size)
            if not data:  # an empty read means end of file
                break
            yield data
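
As an aside, the chunking logic can be sanity-checked without touching the network by running it against an in-memory buffer. This is only a sketch: `iter_chunks` is a hypothetical variant that takes an open binary file object instead of a path, and the 2560-byte payload is made-up test data.

```python
import io

def iter_chunks(fileobj, chunk_size=1024):
    """Yield successive chunk_size-byte blocks from a binary file object."""
    while True:
        data = fileobj.read(chunk_size)
        if not data:  # an empty read means end of file
            break
        yield data

payload = bytes(range(256)) * 10  # 2560 bytes of dummy data
chunks = list(iter_chunks(io.BytesIO(payload), chunk_size=1024))
print([len(c) for c in chunks])  # [1024, 1024, 512]
```

The last chunk is shorter than the rest, and joining the chunks gives back the original bytes, which is what a streaming upload relies on.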

Now, as long as you have `stream_audio_file` somewhere in your file to stream and chunk the data for you, you can go back to your initial method:

import json
import requests

headers = {
    'Accept': 'application/json',
    'Transfer-Encoding': 'chunked',
    'Content-Type': 'audio/wav',
    'Authorization': 'Bearer {0}'.format(YOUR_AUTH_TOKEN)
}

# Passing a generator as `data` makes requests send the body with chunked transfer encoding
data = stream_audio_file(YOUR_AUDIO_FILE)

r = requests.post(url, headers=headers, data=data)

results = json.loads(r.content)

print(results)
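
The `KeyError` in the question comes from reading `_text` out of an error response, so it may help to check the status code before parsing. The helper below is a minimal sketch, not part of any wit.ai client: `extract_text` is a hypothetical name, and the `_text` key is taken from the question's code rather than from the API documentation.

```python
import json

def extract_text(status_code, body):
    """Return the transcript from a response body, or raise with the server's message."""
    if status_code != 200:
        # Surface the server's explanation instead of failing later with a KeyError
        raise RuntimeError(
            'request failed with HTTP {0}: {1}'.format(status_code, body[:200]))
    data = json.loads(body)
    # '_text' is the key the question's code expects; it may be absent
    return data.get('_text')
```

With the request above, this would be called as `extract_text(r.status_code, r.text)`. For a 400 response, the body in the error message usually says what the server objected to.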

Side note: You mentioned you wanted something on your own server. There's a nice module called pocketsphinx, which is free, runs locally on your machine, and is usable from Python. It pairs really well with the SpeechRecognition module, which provides a decent layer on top so you don't have to spend as much time formatting your requests.

Wit.ai is not meant for transcribing long files; it is a system for recognizing short commands. You'd be better off using a proper transcription service:

And many others.
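
If you stay with an API that only accepts short clips, note that an arbitrary byte slice of a WAV file is not itself a valid WAV file, because only the first slice carries the header. One way around this is to cut the recording into short, self-contained WAV segments with the standard-library `wave` module. The sketch below assumes an uncompressed PCM WAV input; `split_wav` is a hypothetical helper, the 10-second default is the limit quoted in the question, and whether wit.ai accepts the resulting segments has not been verified here.

```python
import io
import wave

def split_wav(src, segment_seconds=10):
    """Yield complete WAV files (as bytes), each at most segment_seconds long.

    src may be a file path or a binary file object.
    """
    with wave.open(src, 'rb') as reader:
        params = reader.getparams()
        frames_per_segment = int(reader.getframerate() * segment_seconds)
        while True:
            frames = reader.readframes(frames_per_segment)
            if not frames:  # no frames left
                break
            buf = io.BytesIO()
            with wave.open(buf, 'wb') as writer:
                # Copy channels, sample width and rate; the frame count in the
                # header is corrected to the real segment length on write
                writer.setparams(params)
                writer.writeframes(frames)
            yield buf.getvalue()
```

Each yielded value could then be posted as the `data` of its own request, for example `for segment in split_wav('meeting-record.wav'): ...`. Cutting at fixed intervals can split a word in half, so the transcript may be rough at segment boundaries.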


 