簡體   English   中英

如何使用Python中的Bing Speech API轉錄語音文件?

[英]How can I transcribe a speech file with the Bing Speech API in Python?

如何使用Python中的Bing Speech API轉錄語音文件? 我的語音文件超過15秒。


我知道可以在Python中使用Bing Speech REST API。 https://gist.github.com/jellis505/973ea6de12508c7c720da4a074e7d065在Python 2中給出了一個示例:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import requests
import httplib
import uuid
import json

class Microsoft_ASR():
    def __init__(self):
        self.sub_key = 'YourKeyHere'
        self.token = None
        pass

    def get_speech_token(self):
        FetchTokenURI = "/sts/v1.0/issueToken"
        header = {'Ocp-Apim-Subscription-Key': self.sub_key}
        conn = httplib.HTTPSConnection('api.cognitive.microsoft.com')
        body = ""
        conn.request("POST", FetchTokenURI, body, header)
        response = conn.getresponse()
        str_data = response.read()
        conn.close()
        self.token = str_data
        print "Got Token: ", self.token
        return True

    def transcribe(self,speech_file):

        # Grab the token if we need it
        if self.token is None:
            print "No Token... Getting one"
            self.get_speech_token()

        endpoint = 'https://speech.platform.bing.com/recognize'
        request_id = uuid.uuid4()
        # Params form Microsoft Example 
        params = {'scenarios': 'ulm',
                  'appid': 'D4D52672-91D7-4C74-8AD8-42B1D98141A5',
                  'locale': 'en-US',
                  'version': '3.0',
                  'format': 'json',
                  'instanceid': '565D69FF-E928-4B7E-87DA-9A750B96D9E3',
                  'requestid': uuid.uuid4(),
                  'device.os': 'linux'}
        content_type = "audio/wav; codec=""audio/pcm""; samplerate=16000"

        def stream_audio_file(speech_file, chunk_size=1024):
            with open(speech_file, 'rb') as f:
                while 1:
                    data = f.read(1024)
                    if not data:
                        break
                    yield data

        headers = {'Authorization': 'Bearer ' + self.token, 
                   'Content-Type': content_type}
        resp = requests.post(endpoint, 
                            params=params, 
                            data=stream_audio_file(speech_file), 
                            headers=headers)
        val = json.loads(resp.text)
        return val["results"][0]["name"], val["results"][0]["confidence"]

if __name__ == "__main__":
    ms_asr = Microsoft_ASR()
    ms_asr.get_speech_token()
    text, confidence = ms_asr.transcribe('Your Wav File Here')
    print "Text: ", text
    print "Confidence: ", confidence

但是,根據https://docs.microsoft.com/en-us/azure/cognitive-services/speech/home,Bing Speech REST API無法轉換超過15秒的音頻文件:

在此輸入圖像描述

您可以使用bing語音將大文件轉換為10分鍾,但是您需要為它構建一個websocket,因為它是bing中用於大型音頻文件的另一種選擇。 這是github repo bing演講

您還可以使用較大文件的批量轉錄。

https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/batch-transcription

這些文件首先需要存儲在Azure Blob存儲中。

暫無
暫無

聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.

 
粵ICP備18138465號  © 2020-2024 STACKOOM.COM