
How can I transcribe a speech file with the Bing Speech API in Python?

How can I transcribe a speech file with the Bing Speech API in Python? My speech file is longer than 15 seconds.


I'm aware that one may use the Bing Speech REST API in Python. https://gist.github.com/jellis505/973ea6de12508c7c720da4a074e7d065 gives an example in Python 2:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import requests
import httplib
import uuid
import json

class Microsoft_ASR():
    def __init__(self):
        self.sub_key = 'YourKeyHere'
        self.token = None

    def get_speech_token(self):
        FetchTokenURI = "/sts/v1.0/issueToken"
        header = {'Ocp-Apim-Subscription-Key': self.sub_key}
        conn = httplib.HTTPSConnection('api.cognitive.microsoft.com')
        body = ""
        conn.request("POST", FetchTokenURI, body, header)
        response = conn.getresponse()
        str_data = response.read()
        conn.close()
        self.token = str_data
        print "Got Token: ", self.token
        return True

    def transcribe(self,speech_file):

        # Grab the token if we need it
        if self.token is None:
            print "No Token... Getting one"
            self.get_speech_token()

        endpoint = 'https://speech.platform.bing.com/recognize'
        request_id = uuid.uuid4()
        # Params from Microsoft example
        params = {'scenarios': 'ulm',
                  'appid': 'D4D52672-91D7-4C74-8AD8-42B1D98141A5',
                  'locale': 'en-US',
                  'version': '3.0',
                  'format': 'json',
                  'instanceid': '565D69FF-E928-4B7E-87DA-9A750B96D9E3',
                  'requestid': str(request_id),
                  'device.os': 'linux'}
        content_type = "audio/wav; codec=audio/pcm; samplerate=16000"

        def stream_audio_file(speech_file, chunk_size=1024):
            # Yield the file in chunks so that requests streams the upload
            with open(speech_file, 'rb') as f:
                while True:
                    data = f.read(chunk_size)
                    if not data:
                        break
                    yield data

        headers = {'Authorization': 'Bearer ' + self.token, 
                   'Content-Type': content_type}
        resp = requests.post(endpoint, 
                            params=params, 
                            data=stream_audio_file(speech_file), 
                            headers=headers)
        resp.raise_for_status()  # surface HTTP errors instead of failing while parsing JSON
        val = resp.json()
        return val["results"][0]["name"], val["results"][0]["confidence"]

if __name__ == "__main__":
    ms_asr = Microsoft_ASR()
    ms_asr.get_speech_token()
    text, confidence = ms_asr.transcribe('Your Wav File Here')
    print "Text: ", text
    print "Confidence: ", confidence
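
For anyone on Python 3, where httplib no longer exists, the token step from the gist might look roughly like the sketch below. It reuses the host, path and header name from the gist; the names build_token_request and fetch_token are my own, and I have not checked whether that endpoint is still live, so treat it as an illustration only.

```python
import urllib.request

# Host and path copied from the gist above; assumed, not verified to be live.
TOKEN_URL = "https://api.cognitive.microsoft.com/sts/v1.0/issueToken"


def build_token_request(subscription_key):
    """Build (but do not send) the POST request that asks for a token."""
    return urllib.request.Request(
        TOKEN_URL,
        data=b"",  # empty body, as in the gist
        headers={"Ocp-Apim-Subscription-Key": subscription_key},
        method="POST",
    )


def fetch_token(subscription_key, timeout=10):
    """Send the request and return the token as text (hypothetical helper)."""
    request = build_token_request(subscription_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling fetch_token('YourKeyHere') would then take the place of get_speech_token() in the class above.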

However, the Bing Speech REST API cannot convert audio files longer than 15 seconds, according to https://docs.microsoft.com/en-us/azure/cognitive-services/speech/home.
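
To tell in advance whether a given file falls under that limit, the duration of a WAV file can be read with the standard wave module. This is only a sketch: the helper names are made up, it assumes an uncompressed PCM WAV file, and whether a file of exactly 15 seconds is accepted is a guess on my part.

```python
import wave

REST_LIMIT_SECONDS = 15  # limit quoted above for the REST API


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / float(wav.getframerate())


def fits_rest_api(path, limit=REST_LIMIT_SECONDS):
    """True if the file is short enough for the REST API (assumes <= limit is allowed)."""
    return wav_duration_seconds(path) <= limit
```

If fits_rest_api returns False, the file needs one of the other routes mentioned in the answers.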


You can convert large files of up to 10 minutes with Bing Speech, but you need to build a WebSocket client for it, as that is the other option within Bing for large audio files. Here is the GitHub repo: bing speech

You can also use batch transcription for larger files.

https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/batch-transcription

The files need to be stored on Azure Blob Storage first.
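
As a rough idea of what a batch request could look like once the audio is in Blob Storage, here is a sketch that only builds the URL, headers and JSON body without sending anything. The URL path, the API version and the body field names (contentUrls, locale, displayName) are assumptions based on my recollection of one version of the batch transcription REST API, and build_batch_request is a made-up helper, so check all of it against the documentation linked above.

```python
import json


def build_batch_request(region, subscription_key, audio_urls,
                        locale="en-US", api_version="v3.0"):
    """Return (url, headers, body) for a batch transcription request.

    The path and the field names are assumed, not confirmed by this thread.
    """
    url = "https://{}.api.cognitive.microsoft.com/speechtotext/{}/transcriptions".format(
        region, api_version)
    headers = {
        "Ocp-Apim-Subscription-Key": subscription_key,
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "contentUrls": list(audio_urls),  # URLs of the audio blobs, e.g. with a SAS token
        "locale": locale,
        "displayName": "Batch transcription sketch",
    })
    return url, headers, body
```

The three values could then be passed to something like requests.post(url, headers=headers, data=body), and the service would be polled for the result afterwards.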
