How can I transcribe a speech file with the Bing Speech API in Python?

Question

How can I transcribe a speech file with the Bing Speech API in Python? My speech file is longer than 15 seconds.

I'm aware that one may use the Bing Speech REST API in Python. https://gist.github.com/jellis505/973ea6de12508c7c720da4a074e7d065 gives an example in Python 2:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import requests
import httplib
import uuid
import json

class Microsoft_ASR():
    def __init__(self):
        self.sub_key = 'YourKeyHere'
        self.token = None
        pass

    def get_speech_token(self):
        FetchTokenURI = "/sts/v1.0/issueToken"
        header = {'Ocp-Apim-Subscription-Key': self.sub_key}
        conn = httplib.HTTPSConnection('api.cognitive.microsoft.com')
        body = ""
        conn.request("POST", FetchTokenURI, body, header)
        response = conn.getresponse()
        str_data = response.read()
        conn.close()
        self.token = str_data
        print "Got Token: ", self.token
        return True

    def transcribe(self,speech_file):

        # Grab the token if we need it
        if self.token is None:
            print "No Token... Getting one"
            self.get_speech_token()

        endpoint = 'https://speech.platform.bing.com/recognize'
        request_id = uuid.uuid4()
        # Params from the Microsoft example
        params = {'scenarios': 'ulm',
                  'appid': 'D4D52672-91D7-4C74-8AD8-42B1D98141A5',
                  'locale': 'en-US',
                  'version': '3.0',
                  'format': 'json',
                  'instanceid': '565D69FF-E928-4B7E-87DA-9A750B96D9E3',
                  'requestid': str(request_id),
                  'device.os': 'linux'}
        content_type = 'audio/wav; codec="audio/pcm"; samplerate=16000'

        def stream_audio_file(speech_file, chunk_size=1024):
            with open(speech_file, 'rb') as f:
                while True:
                    # Honor the chunk_size argument instead of a hard-coded 1024
                    data = f.read(chunk_size)
                    if not data:
                        break
                    yield data

        headers = {'Authorization': 'Bearer ' + self.token, 
                   'Content-Type': content_type}
        resp = requests.post(endpoint, 
                            params=params, 
                            data=stream_audio_file(speech_file), 
                            headers=headers)
        val = json.loads(resp.text)
        return val["results"][0]["name"], val["results"][0]["confidence"]

if __name__ == "__main__":
    ms_asr = Microsoft_ASR()
    ms_asr.get_speech_token()
    text, confidence = ms_asr.transcribe('Your Wav File Here')
    print "Text: ", text
    print "Confidence: ", confidence
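
The example above passes a generator to requests.post as the data argument, which makes requests send the body with chunked transfer encoding instead of loading the whole file into memory. A minimal sketch of that generator on its own, written so that it runs on both Python 2 and Python 3, might look like this. The demo_chunks helper is hypothetical and only exists to show the chunking on a temporary file of dummy bytes.

```python
import os
import tempfile


def stream_audio_file(speech_file, chunk_size=1024):
    """Yield the file's bytes in pieces of at most chunk_size bytes."""
    with open(speech_file, 'rb') as f:
        while True:
            data = f.read(chunk_size)
            if not data:
                break
            yield data


def demo_chunks():
    """Hypothetical demo: write 2500 dummy bytes and return the chunk sizes."""
    fd, path = tempfile.mkstemp(suffix='.wav')
    try:
        with os.fdopen(fd, 'wb') as f:
            f.write(b'\x00' * 2500)
        return [len(chunk) for chunk in stream_audio_file(path)]
    finally:
        os.remove(path)
```

With the default chunk size, a 2500-byte file is yielded as two full chunks of 1024 bytes followed by one of 452 bytes.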

However, according to https://docs.microsoft.com/en-us/azure/cognitive-services/speech/home, the Bing Speech REST API cannot convert audio files longer than 15 seconds.
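
To tell in advance whether a given recording falls under that limit, one could measure its duration with the standard-library wave module. This is only a sketch: it assumes an uncompressed PCM WAV file, and the names MAX_REST_SECONDS, wav_duration_seconds, fits_rest_api and make_silent_wav are illustrative, not part of any Microsoft API.

```python
import contextlib
import wave

MAX_REST_SECONDS = 15.0  # limit mentioned in the question, not queried from the service


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with contextlib.closing(wave.open(path, 'rb')) as wav:
        return wav.getnframes() / float(wav.getframerate())


def fits_rest_api(path, limit=MAX_REST_SECONDS):
    """True if the file is short enough for a single REST request."""
    return wav_duration_seconds(path) <= limit


def make_silent_wav(path, seconds, rate=16000):
    """Hypothetical helper: write a mono 16-bit silent WAV for experimenting."""
    with contextlib.closing(wave.open(path, 'wb')) as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b'\x00\x00' * int(seconds * rate))
```

A file for which fits_rest_api returns False needs one of the other approaches, such as those in the answers.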

Answer 1

You can convert large files of up to 10 minutes using Bing Speech, but you need to use a WebSocket connection for it, as that is the alternative Bing Speech offers for long audio files. Here is the GitHub repo: bing speech

Answer 2

You can also use batch transcription with larger files.

https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/batch-transcription

The files need to be stored on Azure Blob Storage first.
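
As a rough sketch of what such a request could look like once the audio is in Blob Storage, the function below only assembles the URL, headers and JSON body without sending anything. The endpoint path and the field names (contentUrls, locale, displayName) are assumptions based on the v3.1 Speech-to-text REST API and may differ for other API versions; build_batch_request and its parameters are hypothetical names, so check the linked documentation before relying on them.

```python
import json


def build_batch_request(region, subscription_key, audio_sas_url, locale='en-US'):
    """Assemble (but do not send) a batch transcription request.

    audio_sas_url is assumed to be a blob URL the service can read,
    for example one carrying a SAS token.
    """
    # Assumed v3.1 endpoint layout; verify against the current docs
    url = ('https://{0}.api.cognitive.microsoft.com'
           '/speechtotext/v3.1/transcriptions').format(region)
    headers = {'Ocp-Apim-Subscription-Key': subscription_key,
               'Content-Type': 'application/json'}
    # Assumed v3.1 field names
    body = {'contentUrls': [audio_sas_url],
            'locale': locale,
            'displayName': 'Long audio transcription'}
    return url, headers, json.dumps(body)
```

The three return values could then be passed to requests.post(url, headers=headers, data=body). Batch transcription is asynchronous, so the result has to be fetched later rather than read from that first response.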
