
MP3 to FLAC for Google's Speech API

I'm trying to find a simple way to send an MP3 to Google for speech recognition. Currently, I use a subprocess to call SoX, which converts the MP3 to WAV. SpeechRecognition then converts the WAV again, this time to FLAC. Ideally, I'd like a more portable (not OS-specific) way to decode the MP3 and send it without saving intermediate files.

Here's what I have currently:

import speech_recognition as sr
import subprocess
import requests

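# Download the MP3 and save it to a temporary file
response = requests.get('http://somesite.com/some.mp3')

with open('/tmp/audio.mp3', 'wb') as file:
    file.write(response.content)

# Convert MP3 to WAV with SoX; check=True raises if the conversion fails
subprocess.run(['sox', '/tmp/audio.mp3', '/tmp/audio.wav'], check=True)

r = sr.Recognizer()
# AudioFile is the current name of the older WavFile class
with sr.AudioFile('/tmp/audio.wav') as source:
    audio = r.record(source)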

result = r.recognize_google(audio)
del r

I've tried using the FLAC binaries included in SpeechRecognition directly, but the output was just static. I'm not too keen on distributing binaries on Git, but I will if that is the only way.
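The post does not say how the FLAC binary was invoked, so the cause of the static is unknown. One thing worth ruling out (an assumption, not a diagnosis) is that the encoder was handed MP3 bytes: the flac command-line encoder reads PCM input such as WAV, AIFF, or headerless raw samples, and does not decode MP3. Below is a minimal sketch of a check on the leading bytes of the data. The helper name `sniff_audio_format` is hypothetical, and the MP3 test is only a heuristic.

```python
def sniff_audio_format(data: bytes) -> str:
    """Guess the container of an audio byte string from its leading bytes."""
    # WAV: RIFF header with a WAVE form type at offset 8
    if data[:4] == b'RIFF' and data[8:12] == b'WAVE':
        return 'wav'
    # Native FLAC streams begin with the marker "fLaC"
    if data[:4] == b'fLaC':
        return 'flac'
    # MP3 files usually start with an ID3v2 tag ...
    if data[:3] == b'ID3':
        return 'mp3'
    # ... or directly with an MPEG frame sync (eleven set bits); heuristic only
    if len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0:
        return 'mp3'
    return 'unknown'
```

Calling this on the downloaded bytes before passing them to an encoder shows whether a decoding step is still needed.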

Some important links:

SR's code for speech recognition

SR's code for WAV to FLAC

Edit

I'm considering distributing SoX the same way the FLAC binaries are distributed, one for each OS, if SoX's license allows it...

On second thought, software licenses are confusing and I don't want to mess with that.

I decided to go with this:

import subprocess
import requests
import shutil
import glob
import json

audio = requests.get('http://somesite.com/some.mp3')
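# Find SoX on PATH, falling back to the usual Windows install folders
sox = shutil.which('sox') or glob.glob(r'C:\Program Files*\sox*\sox.exe')[0]
# Read MP3 from stdin, resample to 16 kHz, and write FLAC to stdout.
# An argument list (no shell) keeps paths containing spaces intact.
p = subprocess.Popen([sox, '-t', 'mp3', '-', '-t', 'flac', '-', 'rate', '16k'], stdin=subprocess.PIPE, stdout=subprocess.PIPE)
stdout, stderr = p.communicate(audio.content)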
url = 'http://www.google.com/speech-api/v2/recognize?client=chromium&lang=en-US&key=AIzaSyBOti4mM-6x9WDnZIjIeyEU21OpBXqWBgw'
headers = {'Content-Type': 'audio/x-flac; rate=16000'}
response = requests.post(url, data = stdout, headers = headers).text

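# Take the transcript from the first response line that contains one
result = None
for line in response.split('\n'):
    try:
        result = json.loads(line)['result'][0]['alternative'][0]['transcript']
        break
    except (ValueError, KeyError, IndexError):
        # Blank line, invalid JSON, or a line with an empty result
        pass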

I suppose this is more of a middle ground, borrowing some things from the SR module. It requires the user to install SoX, but it should work on any OS and doesn't need intermediate files. I have only tested it on Linux, however.
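The script above can also be arranged as small functions, so the SoX call and the response parsing can be checked separately. This is a sketch under assumptions: all function names are hypothetical, the download and the HTTP POST are left out, and the response layout (one JSON object per line, possibly with an empty `result` first) is inferred from the parsing loop above rather than from documented API behavior.

```python
import glob
import json
import shutil
import subprocess


def find_sox():
    """Return a path to the SoX executable, or None if it cannot be found."""
    found = shutil.which('sox')
    if found:
        return found
    # Fallback taken from the script above: the usual Windows install folders
    matches = glob.glob(r'C:\Program Files*\sox*\sox.exe')
    return matches[0] if matches else None


def build_sox_command(sox_path):
    """Build the argument list: MP3 on stdin, 16 kHz FLAC on stdout."""
    return [sox_path, '-t', 'mp3', '-', '-t', 'flac', '-', 'rate', '16k']


def mp3_to_flac(mp3_bytes, sox_path):
    """Pipe MP3 bytes through SoX and return FLAC bytes, without temporary files."""
    completed = subprocess.run(
        build_sox_command(sox_path),
        input=mp3_bytes,
        stdout=subprocess.PIPE,
        check=True,  # raise CalledProcessError if SoX exits with an error
    )
    return completed.stdout


def extract_transcript(response_text):
    """Return the first transcript in a line-delimited JSON response, or None."""
    for line in response_text.splitlines():
        try:
            return json.loads(line)['result'][0]['alternative'][0]['transcript']
        except (ValueError, KeyError, IndexError, TypeError):
            # Blank line, invalid JSON, or a line without a transcript
            continue
    return None
```

With an illustrative two-line response such as `'{"result":[]}\n{"result":[{"alternative":[{"transcript":"hello"}]}]}'`, `extract_transcript` skips the empty first line and returns `'hello'`. Returning `None` from `find_sox` lets the caller report a missing install instead of failing with an `IndexError`. SoX can only decode MP3 when its build includes MP3 support, which is worth checking on each platform.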

