Google Cloud Speech-to-Text gives no output for OGG and MP3 files

Question

I am trying to run speech-to-text on a batch of audio files that are each over 10 minutes long. I don't want to waste storage in the cloud bucket by uploading WAV files directly, so I am using ffmpeg to convert the files to either OGG or MP3, like this:

ffmpeg -y -i audio.wav -ar 12000 -r 16000 audio.mp3

ffmpeg -y -i audio.wav -ar 12000 -r 16000 audio.ogg

For testing purposes I ran the speech-to-text service on a dummy WAV file and it worked: I got the text as expected. But for some reason it doesn't detect any speech when I use the OGG or MP3 file. I could not get AMR files to work either.

My code:

from google.cloud import speech


def transcribe_gcs(gcs_uri):
    client = speech.SpeechClient()

    audio = speech.RecognitionAudio(uri=gcs_uri)
    config = speech.RecognitionConfig(
        encoding="OGG_OPUS", #replace with "LINEAR16" for wav, "OGG_OPUS" for ogg, "AMR" for amr
        sample_rate_hertz=16000,
        language_code="en-US",
    )
    print("starting operation")
    operation = client.long_running_recognize(config=config, audio=audio)
    response = operation.result()
    print(response)

I have set up authentication properly, so that is not the problem.

When I run the speech-to-text service on the same audio in OGG or MP3 format (for MP3 I just comment out the encoding setting in the config), it gives no response; it just prints a line break and finishes.

What can I do to fix this?

Answer 1

Use Opus or FLAC

Vorbis (the default audio format for the OGG container) is not supported. See Google Cloud Speech-to-Text: Supported Audio Encodings.
MP3 encoding is a Beta feature and is only available in v1p1beta1. See the RecognitionConfig reference documentation for details.
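
To make the encoding rules above concrete, here is a minimal sketch that picks the `encoding` value from the file extension instead of editing it by hand. The `ENCODINGS` table and the `pick_encoding` helper are hypothetical names introduced for illustration, and the sketch assumes that `.ogg` files contain Opus audio (not Vorbis) and that the sample rate passed in matches the one used when the file was encoded.

```python
from pathlib import PurePosixPath

# Extension -> (AudioEncoding name, needs the v1p1beta1 API).
# Assumption: ".ogg" files hold Opus audio, not Vorbis.
ENCODINGS = {
    ".wav": ("LINEAR16", False),
    ".flac": ("FLAC", False),
    ".ogg": ("OGG_OPUS", False),
    ".mp3": ("MP3", True),
}


def pick_encoding(gcs_uri):
    """Return (encoding_name, needs_beta) based on the URI's file extension."""
    suffix = PurePosixPath(gcs_uri).suffix.lower()
    if suffix not in ENCODINGS:
        raise ValueError(f"no known encoding for {suffix!r}")
    return ENCODINGS[suffix]


def transcribe_gcs(gcs_uri, sample_rate_hertz=16000):
    encoding_name, needs_beta = pick_encoding(gcs_uri)
    # Imported lazily so pick_encoding works without the client library.
    if needs_beta:
        from google.cloud import speech_v1p1beta1 as speech
    else:
        from google.cloud import speech

    client = speech.SpeechClient()
    audio = speech.RecognitionAudio(uri=gcs_uri)
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding[encoding_name],
        # Assumed to match the sample rate the file was encoded with.
        sample_rate_hertz=sample_rate_hertz,
        language_code="en-US",
    )
    operation = client.long_running_recognize(config=config, audio=audio)
    return operation.result()
```

An empty response usually means the declared encoding or sample rate does not match the file, so it may be worth checking both before anything else.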

FLAC

FLAC is compressed but lossless, so it will give the best speech-to-text results.

ffmpeg -i input.wav -vn output.flac

Opus

If file size is very important, use Opus in OGG. It produces small files with excellent quality.

ffmpeg -i input.wav -vn -c:a libopus output.ogg
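
For a whole batch of files, the two commands above could be wrapped in a small Python helper. This is a sketch only: `build_ffmpeg_cmd` and `convert` are hypothetical names, and the mono downmix (`-ac 1`) and 16000 Hz sample rate (`-ar 16000`) are assumptions chosen to line up with `sample_rate_hertz=16000` in the question's config. Note that in ffmpeg `-ar` sets the audio sample rate, while `-r` sets the video frame rate, so the `-ar 12000 -r 16000` pair in the question most likely produced 12000 Hz audio.

```python
import subprocess


def build_ffmpeg_cmd(src, dst, codec, sample_rate=16000):
    """Build an ffmpeg command that re-encodes src to dst as mono audio."""
    if codec not in ("flac", "libopus"):
        raise ValueError(f"unsupported codec: {codec!r}")
    return [
        "ffmpeg", "-y",
        "-i", src,
        "-vn",                    # drop any video stream
        "-ac", "1",               # assumption: downmix to mono
        "-ar", str(sample_rate),  # audio sample rate in Hz
        "-c:a", codec,
        dst,
    ]


def convert(src, dst, codec, sample_rate=16000):
    """Run ffmpeg; raises CalledProcessError if the conversion fails."""
    subprocess.run(build_ffmpeg_cmd(src, dst, codec, sample_rate), check=True)


if __name__ == "__main__":
    convert("input.wav", "output.flac", "flac")
    convert("input.wav", "output.ogg", "libopus")
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.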

Accepted answer, 2021-04-27 17:00:40