How can I transcribe a speech file with the Bing Speech API in Python?
My speech file is longer than 15 seconds.
I'm aware that one may use the Bing Speech REST API in Python. https://gist.github.com/jellis505/973ea6de12508c7c720da4a074e7d065 gives an example in Python 2:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Python 2 example (uses httplib and print statements).
import requests
import httplib
import uuid
import json


class Microsoft_ASR(object):
    def __init__(self):
        self.sub_key = 'YourKeyHere'
        self.token = None

    def get_speech_token(self):
        # Exchange the subscription key for an access token
        FetchTokenURI = "/sts/v1.0/issueToken"
        header = {'Ocp-Apim-Subscription-Key': self.sub_key}
        conn = httplib.HTTPSConnection('api.cognitive.microsoft.com')
        body = ""
        conn.request("POST", FetchTokenURI, body, header)
        response = conn.getresponse()
        str_data = response.read()
        conn.close()
        self.token = str_data
        print "Got Token: ", self.token
        return True

    def transcribe(self, speech_file):
        # Grab the token if we need it
        if self.token is None:
            print "No Token... Getting one"
            self.get_speech_token()

        endpoint = 'https://speech.platform.bing.com/recognize'
        request_id = uuid.uuid4()
        # Params from Microsoft Example
        params = {'scenarios': 'ulm',
                  'appid': 'D4D52672-91D7-4C74-8AD8-42B1D98141A5',
                  'locale': 'en-US',
                  'version': '3.0',
                  'format': 'json',
                  'instanceid': '565D69FF-E928-4B7E-87DA-9A750B96D9E3',
                  'requestid': request_id,
                  'device.os': 'linux'}
        content_type = "audio/wav; codec=audio/pcm; samplerate=16000"

        def stream_audio_file(speech_file, chunk_size=1024):
            # Yield the file in chunks so that requests streams the upload
            with open(speech_file, 'rb') as f:
                while True:
                    data = f.read(chunk_size)
                    if not data:
                        break
                    yield data

        headers = {'Authorization': 'Bearer ' + self.token,
                   'Content-Type': content_type}
        resp = requests.post(endpoint,
                             params=params,
                             data=stream_audio_file(speech_file),
                             headers=headers)
        val = json.loads(resp.text)
        # Return the top recognition result and its confidence score
        return val["results"][0]["name"], val["results"][0]["confidence"]


if __name__ == "__main__":
    ms_asr = Microsoft_ASR()
    ms_asr.get_speech_token()
    text, confidence = ms_asr.transcribe('Your Wav File Here')
    print "Text: ", text
    print "Confidence: ", confidence
However, according to https://docs.microsoft.com/en-us/azure/cognitive-services/speech/home, the Bing Speech REST API cannot convert audio files longer than 15 seconds.
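As a rough way to tell whether a given recording is over that limit, the sketch below reads the duration of a WAV file with Python's standard `wave` module. The function names and the `REST_LIMIT_SECONDS` constant are my own, not part of any Microsoft API, and the code assumes an uncompressed PCM WAV file like the one used above.

```python
import wave

# The 15-second limit mentioned above; the constant name is hypothetical.
REST_LIMIT_SECONDS = 15


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    wav = wave.open(path, 'rb')
    try:
        # Duration = number of audio frames / frames per second
        return wav.getnframes() / float(wav.getframerate())
    finally:
        wav.close()


def exceeds_rest_limit(path, limit=REST_LIMIT_SECONDS):
    """Return True if the file is longer than the given limit."""
    return wav_duration_seconds(path) > limit
```

For a file where `exceeds_rest_limit` returns True, the REST call above is not expected to work on the whole recording.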
You can convert large files of up to 10 minutes using Bing Speech, but you need to build a WebSocket client for it, as that is the other alternative within Bing Speech for large audio files. Here is the GitHub repo: bing speech
You can also use batch transcription for larger files:
https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/batch-transcription
The files need to be stored on Azure Blob Storage first.
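A minimal sketch of what creating a batch transcription job might look like, once the audio is in Blob Storage. It only builds the request and does not send it. It assumes version 3.0 of the Speech-to-text REST API (the endpoint path and the `contentUrls`, `locale` and `displayName` fields); the function name, the region and the blob URL are placeholders of mine, so check the linked documentation for the version you actually use.

```python
import json


def build_batch_transcription_request(region, subscription_key, blob_url,
                                      locale='en-US',
                                      display_name='long audio'):
    """Build (url, headers, body) for a request that creates a batch job.

    Assumption: v3.0 of the Speech-to-text REST API. blob_url is a URL
    (for example a SAS URL) that lets the service read the audio file.
    """
    url = ('https://%s.api.cognitive.microsoft.com'
           '/speechtotext/v3.0/transcriptions' % region)
    headers = {
        'Ocp-Apim-Subscription-Key': subscription_key,
        'Content-Type': 'application/json',
    }
    body = json.dumps({
        'contentUrls': [blob_url],
        'locale': locale,
        'displayName': display_name,
    })
    return url, headers, body
```

The result could then be sent with `requests.post(url, headers=headers, data=body)`. The job runs asynchronously, so the transcript has to be fetched later rather than read from this first response.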