"audio_data must be audio data" error with Google speech recognition in Python

Question

I am trying to load an audio file in Python and process it with Google speech recognition.

The problem is that, unlike C++, Python doesn't show data types or classes, or give you access to memory so that you can convert between data types by creating a new object and repacking the data.

I don't understand how to convert from one data type to another in Python.

The code in question is below:

import speech_recognition as spr 
import librosa

audio, sr = librosa.load('sample_data/metal.mp3')

# create a speech recognition object 
r = spr.Recognizer() 

r.recognize_google(audio)

The error is:

audio_data must be audio data

How do I convert the audio object so that it can be used with Google speech recognition?

Answer 1

Librosa returns a NumPy array; you need to convert it back to WAV data. Something like this:

import numpy as np
raw_audio = np.int16(audio / np.max(np.abs(audio)) * 32767).tobytes()  # float samples -> 16-bit PCM bytes
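Raw bytes on their own will still trigger the same error, because `recognize_google` checks that its argument is an `AudioData` instance. The following is a minimal sketch of the full conversion, assuming the array is mono float audio as `librosa.load` returns it. The helper names `float_to_pcm16` and `to_audio_data` are made up for illustration; `AudioData(frame_data, sample_rate, sample_width)` is the SpeechRecognition constructor, and the sample width of 2 corresponds to 16-bit samples.

```python
import numpy as np


def float_to_pcm16(samples):
    """Convert float samples to little-endian 16-bit PCM bytes (peak-normalised)."""
    samples = np.asarray(samples, dtype=np.float32)
    peak = np.max(np.abs(samples)) if samples.size else 0.0
    if peak > 0:
        samples = samples / peak  # same scaling as the one-liner above
    return (samples * 32767).astype("<i2").tobytes()


def to_audio_data(samples, sample_rate):
    """Wrap a float array in the AudioData object that recognize_google expects."""
    # Imported here so the conversion helper works without the library installed.
    import speech_recognition as spr

    # sample_width=2 because each sample is 2 bytes (int16)
    return spr.AudioData(float_to_pcm16(samples), sample_rate, 2)
```

With the question's variables this would be used as `r.recognize_google(to_audio_data(audio, sr))`, where `sr` is the sample rate returned by `librosa.load`.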

You are probably better off loading the MP3 with an ffmpeg wrapper instead of librosa. Librosa alters the audio on load (by default it resamples to 22,050 Hz, mixes down to mono, and rescales the samples to floating point). It's better to work with the raw data.
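One way to follow that advice is to call the `ffmpeg` command-line tool directly and let it write a PCM WAV file. This is only a sketch: it assumes `ffmpeg` is installed and on the PATH, the function names are hypothetical, and the 16 kHz mono target is an assumption chosen as a common speech-recognition format, not something the answer specifies.

```python
import subprocess


def build_ffmpeg_command(src, dst, sample_rate=16000):
    """Build an ffmpeg command that decodes src to mono 16-bit PCM WAV."""
    return [
        "ffmpeg",
        "-y",                  # overwrite dst if it already exists
        "-i", src,             # input file (e.g. an MP3)
        "-ac", "1",            # one audio channel (mono)
        "-ar", str(sample_rate),  # output sample rate in Hz
        "-c:a", "pcm_s16le",   # 16-bit little-endian PCM
        dst,
    ]


def convert_to_wav(src, dst, sample_rate=16000):
    """Run ffmpeg and return the output path; raises CalledProcessError on failure."""
    subprocess.run(build_ffmpeg_command(src, dst, sample_rate), check=True)
    return dst
```

For the file in the question, `convert_to_wav('sample_data/metal.mp3', 'sample_data/metal.wav')` would produce a file that the library's file reader can open.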

Answer 2

Try this with the speech recognizer:

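import speech_recognition as spr

# Create the recognizer before it is used below
r = spr.Recognizer()

# AudioFile (WavFile is its older alias) reads WAV, AIFF, or FLAC, so convert the MP3 first
with spr.AudioFile('sample_data/metal.wav') as source:
    audio = r.record(source)  # read the whole file into an AudioData object

r.recognize_google(audio)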

Answer 3

@Mich, I hope you have found a solution by now. If not, please try the following.

First, convert the .mp3 file to .wav format using another tool as a pre-processing step.

import speech_recognition as sr

# Create an instance of the Recognizer class
recognizer = sr.Recognizer()

# Create audio file instance from the converted file
audio_ex = sr.AudioFile('sample_data/metal.wav')
print(type(audio_ex))

# Create audio data by reading from the opened source
with audio_ex as source:
    audiodata = recognizer.record(source)
print(type(audiodata))

# Extract text
text = recognizer.recognize_google(audio_data=audiodata, language='en-US')

print(text)
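If the converted file still fails to load, it can help to confirm that it really is PCM WAV before handing it to `sr.AudioFile`. The sketch below uses only the standard-library `wave` module; `describe_wav` is a hypothetical helper name, and `wave.open` raises `wave.Error` for files it cannot parse as PCM WAV (an MP3 that was merely renamed, for example).

```python
import wave


def describe_wav(path):
    """Return basic format facts for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),  # 2 means 16-bit samples
            "sample_rate": rate,
            "duration_seconds": frames / rate if rate else 0.0,
        }
```

Calling `describe_wav('sample_data/metal.wav')` returns a dictionary with the channel count, sample width, sample rate, and duration of the file.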

You can select the speech language from https://cloud.google.com/speech-to-text/docs/languages

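Additionally, you can set the minimum energy (loudness) level that the recognizer treats as speech. This setting is used when listening for phrases with listen(), not when reading a whole file with record().

recognizer.energy_threshold = 300  # minimum energy threshold set to 300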


Answer 1: 0, 2020-03-27 07:46:15
Answer 2: 0, 2021-04-04 14:06:17
Answer 3: 0, 2021-09-01 12:42:53
