
'Audio data must be audio data' error with Google speech recognition in Python

I am trying to load an audio file in Python and process it with Google speech recognition.

The problem is that, unlike C++, Python doesn't show data types or classes, or give you access to memory so that you can convert between one data type and another by creating a new object and repacking the data.

I don't understand how to convert from one data type to another in Python.

The code in question is below:

import speech_recognition as spr 
import librosa

audio, sr = librosa.load('sample_data/metal.mp3')

# create a speech recognition object 
r = spr.Recognizer() 

r.recognize_google(audio)

The error is:

audio_data must be audio data

How do I convert the audio object so that it can be used with Google speech recognition?

librosa returns a NumPy array of floating-point samples; you need to convert it back to 16-bit PCM bytes, the sample format used in WAV files. Something like this:

import numpy as np
raw_audio = np.int16(audio / np.max(np.abs(audio)) * 32767).tobytes()  # peak-normalized 16-bit PCM
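
The conversion above yields raw bytes, while `recognize_google` checks that its argument is an `AudioData` instance (that check is what raises "audio_data must be audio data"). Here is a minimal sketch of the remaining step, assuming the samples are mono floats in [-1.0, 1.0], which is what `librosa.load` returns by default. The helper names `float_to_pcm16` and `to_audio_data` are made up for illustration, and this version clips instead of peak-normalizing, so it is a variation on the one-liner rather than the same thing.

```python
import numpy as np


def float_to_pcm16(samples):
    """Convert float samples in [-1.0, 1.0] to little-endian 16-bit PCM bytes."""
    samples = np.asarray(samples, dtype=np.float32)
    # Clip rather than rescale, so the original loudness is preserved.
    clipped = np.clip(samples, -1.0, 1.0)
    return (clipped * 32767).astype("<i2").tobytes()


def to_audio_data(samples, sample_rate):
    """Wrap float samples in a speech_recognition.AudioData object."""
    # Imported lazily so the PCM helper works without the library installed.
    import speech_recognition as spr

    # The third argument is the sample width in bytes: 2 bytes = 16 bits.
    return spr.AudioData(float_to_pcm16(samples), sample_rate, 2)
```

With the question's variables this would be used as `r.recognize_google(to_audio_data(audio, sr))`, where `audio, sr` come from `librosa.load`.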

You are probably better off loading the mp3 with an ffmpeg wrapper instead of librosa, since librosa alters the audio (normalizes it, etc.). It's better to work with the raw data.

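Try this with the speech recognizer:

import speech_recognition as spr

# Create the recognizer first; it is needed to record from the source
r = spr.Recognizer()

# AudioFile reads WAV, AIFF, or FLAC (not mp3), so convert the mp3 beforehand
with spr.AudioFile('sample_data/metal.wav') as source:
    audio = r.record(source)  # returns an AudioData object

r.recognize_google(audio)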

@Mich, I hope you have found a solution by now. If not, please try the following.

First, convert the .mp3 file to .wav format with another tool as a pre-processing step.
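
One possible way to do that pre-processing step is to call ffmpeg from Python. The sketch below assumes the `ffmpeg` executable is installed and on the PATH. The function names are hypothetical, and the 16 kHz mono output is just a common choice for speech, not something the recognizer is known to require.

```python
import subprocess


def build_ffmpeg_command(src, dst, sample_rate=16000):
    """Build an ffmpeg command that decodes src to a mono 16-bit PCM WAV at dst."""
    return [
        "ffmpeg",
        "-y",                     # overwrite dst if it already exists
        "-i", src,                # input file (any format ffmpeg can decode)
        "-ac", "1",               # downmix to one channel
        "-ar", str(sample_rate),  # resample to the requested rate
        "-c:a", "pcm_s16le",      # encode as 16-bit little-endian PCM
        dst,
    ]


def convert_to_wav(src, dst, sample_rate=16000):
    """Run ffmpeg; raises subprocess.CalledProcessError if the conversion fails."""
    subprocess.run(build_ffmpeg_command(src, dst, sample_rate), check=True)
    return dst
```

For the file in the question, this would be `convert_to_wav('sample_data/metal.mp3', 'sample_data/metal.wav')`.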

import speech_recognition as sr

# Create an instance of the Recognizer class
recognizer = sr.Recognizer()

# Create audio file instance from the original file
audio_ex = sr.AudioFile('sample_data/metal.wav')
type(audio_ex)

# Create audio data
with audio_ex as source:
    audiodata = recognizer.record(source)
type(audiodata)

# Extract text
text = recognizer.recognize_google(audio_data=audiodata, language='en-US')

print(text)

You can select the speech language from the list at https://cloud.google.com/speech-to-text/docs/languages

Additionally, you can set the minimum energy (loudness) threshold that the recognizer treats as speech, using the attribute below.

recognizer.energy_threshold = 300 # min threshold set to 300
