
Google Cloud Speech-to-Text not giving output for OGG & MP3 files

I am trying to run speech-to-text on a batch of audio files that are each over 10 minutes long. I don't want to waste storage in the cloud bucket by uploading WAV files directly, so I am using ffmpeg to convert the files to either OGG or MP3, like this:

ffmpeg -y -i audio.wav -ar 12000 -r 16000 audio.mp3

ffmpeg -y -i audio.wav -ar 12000 -r 16000 audio.ogg

For testing purposes I ran the speech-to-text service on a dummy WAV file and it worked: I got the text as expected. But for some reason it does not detect any speech when I use the OGG or MP3 file. I could not get AMR files to work either.

My code:

from google.cloud import speech


def transcribe_gcs(gcs_uri):
    client = speech.SpeechClient()

    audio = speech.RecognitionAudio(uri=gcs_uri)
    config = speech.RecognitionConfig(
        # Use "LINEAR16" for wav, "OGG_OPUS" for ogg, "AMR" for amr
        encoding="OGG_OPUS",
        sample_rate_hertz=16000,
        language_code="en-US",
    )
    print("starting operation")
    # Long-running recognition is used because the files are over 10 minutes long
    operation = client.long_running_recognize(config=config, audio=audio)
    response = operation.result()
    print(response)

I have set up authentication properly, so that is not the problem.

When I run the speech-to-text service on the same audio in OGG or MP3 format (for MP3 I just comment out the encoding setting in the config), it returns nothing: it just prints a line break and finishes.

What can I do to fix this?

Use Opus or FLAC

FLAC

FLAC is compressed but lossless, so it will give the best speech-to-text results.

ffmpeg -i input.wav -vn output.flac
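
As a rough sketch of how the FLAC file could then be transcribed, assuming the same google-cloud-speech Python client as in the question: FLAC carries its sample rate in the file header, so `sample_rate_hertz` is left out here instead of being hard-coded. The names `transcribe_flac` and `collect_transcripts` and the one-hour `timeout` are illustrative choices of mine, not part of the original code.

```python
def collect_transcripts(response):
    """Return the top transcript of each result (empty list if nothing was recognised)."""
    return [
        result.alternatives[0].transcript
        for result in response.results
        if result.alternatives
    ]


def transcribe_flac(gcs_uri, language_code="en-US", timeout=3600):
    """Transcribe a FLAC file stored in a bucket (hypothetical helper)."""
    # Imported here so the helper above can be used without the client library installed.
    from google.cloud import speech

    client = speech.SpeechClient()
    audio = speech.RecognitionAudio(uri=gcs_uri)
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.FLAC,
        # sample_rate_hertz is omitted on purpose: it is read from the FLAC header.
        language_code=language_code,
    )
    operation = client.long_running_recognize(config=config, audio=audio)
    # timeout (in seconds) is an assumed value; adjust it to the audio length.
    response = operation.result(timeout=timeout)
    return collect_transcripts(response)
```

If this returns an empty list, the service most likely recognised no speech at all, which would match the blank output described in the question.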

Opus

If file size is very important, use Opus in an OGG container. It produces small files with excellent quality.

ffmpeg -i input.wav -vn -c:a libopus output.ogg
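
One thing to watch with Opus is that the sample rate in the recognition config should agree with the file. The command in the question resamples to 12000 Hz while the config declares 16000 Hz, and as far as I know `-r` is a video frame-rate option that has no use for audio. A minimal sketch that keeps the two in sync is below; the function names are hypothetical, and the list of rates is the set I understand `OGG_OPUS` to accept.

```python
import subprocess

# Sample rates accepted for Opus (assumption based on the Speech-to-Text docs).
OPUS_RATES = (8000, 12000, 16000, 24000, 48000)


def opus_ffmpeg_args(src, dst, sample_rate=16000):
    """Build the ffmpeg command that encodes src to Opus in an OGG container."""
    if sample_rate not in OPUS_RATES:
        raise ValueError(f"unsupported Opus sample rate: {sample_rate}")
    return [
        "ffmpeg", "-y", "-i", src,
        "-vn",                     # drop any video stream
        "-c:a", "libopus",         # force Opus instead of the default OGG codec
        "-ar", str(sample_rate),   # resample to the rate declared in the config
        dst,
    ]


def opus_config_kwargs(sample_rate=16000, language_code="en-US"):
    """Keyword arguments for speech.RecognitionConfig matching the file above."""
    return {
        "encoding": "OGG_OPUS",
        "sample_rate_hertz": sample_rate,
        "language_code": language_code,
    }


def convert_to_opus(src, dst, sample_rate=16000):
    """Run ffmpeg; raises CalledProcessError if the conversion fails."""
    subprocess.run(opus_ffmpeg_args(src, dst, sample_rate), check=True)
```

After `convert_to_opus("audio.wav", "audio.ogg")` and uploading the result, the config from the question could be built as `speech.RecognitionConfig(**opus_config_kwargs())`.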
