
Google Cloud Speech: word start time

2017-02-10 08:41:25 | Tags: audio, speech-to-text, google-speech-api, google-cloud-speech

I'm looking at using Google Cloud Speech to transcribe long-form narrated audio files, and I need to know the start time of each phrase in the audio file. Is there a way to do this with Google Cloud Speech? I'm currently working with transcribe_async.py. Thanks.

2 answers

This is not possible with Google Cloud Speech. If that information is important to you, you may need to look at other ASR systems. I know that offline, non-hosted ASR systems like Kaldi and CMU Sphinx will give you this information. I don't know whether any hosted ASR systems provide it, or which ones.

You can get approximate start and end times for each word, measured from the beginning of the audio track, by setting the enableWordTimeOffsets option to True: https://cloud.google.com/speech/docs/async-time-offsets .
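As a rough sketch of how that option might be used, the Python below builds a JSON request body for the REST `speech:longrunningrecognize` method with `enableWordTimeOffsets` turned on, and reads the word offsets out of a response. The bucket URI, the FLAC encoding, the helper names, and the sample response are all assumptions made for illustration; the sample is hand-written, not real API output.

```python
def build_request(gcs_uri, language_code="en-US"):
    """Build a JSON body for the REST speech:longrunningrecognize method."""
    return {
        "config": {
            "encoding": "FLAC",             # assumption: the audio is a FLAC file
            "languageCode": language_code,
            "enableWordTimeOffsets": True,  # ask for per-word start/end times
        },
        "audio": {"uri": gcs_uri},
    }


def parse_offset(value):
    """Convert a JSON Duration string such as '1.300s' to float seconds."""
    return float(value.rstrip("s"))


def word_timings(response):
    """Yield (word, start_seconds, end_seconds) from each result's top alternative."""
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        if not alternatives:
            continue
        for info in alternatives[0].get("words", []):
            yield (
                info["word"],
                parse_offset(info["startTime"]),
                parse_offset(info["endTime"]),
            )


# Hand-written sample in the shape of a recognition response (illustrative only).
SAMPLE_RESPONSE = {
    "results": [
        {
            "alternatives": [
                {
                    "transcript": "hello world",
                    "words": [
                        {"startTime": "0s", "endTime": "0.400s", "word": "hello"},
                        {"startTime": "0.400s", "endTime": "1.100s", "word": "world"},
                    ],
                }
            ]
        }
    ]
}

if __name__ == "__main__":
    # Hypothetical bucket path; replace with your own audio location.
    print(build_request("gs://example-bucket/narration.flac"))
    for word, start, end in word_timings(SAMPLE_RESPONSE):
        print(f"{word}: {start:.3f}s to {end:.3f}s")
```

Sending the request and polling the resulting operation are left out here; the official client libraries wrap both steps.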

Beware that the start time of the first word of the transcript is always 0 and that, as far as I know, each word's start time corresponds to the previous word's end time, even when there is a pause between them.
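One way to check that caveat against your own output, and to get a rough start time per phrase, might look like the sketch below. It assumes the JSON response shape of the REST API (a `results` list whose top alternative carries `words` with `startTime` and `endTime` strings) and treats the first word of each result as the start of a segment; whether a result lines up with a "phrase" in your narration is an assumption you would need to verify. The function names and the sample data are hypothetical.

```python
def seconds(value):
    """Convert a JSON Duration string such as '1.100s' to float seconds."""
    return float(value.rstrip("s"))


def segment_starts(response):
    """Return (start_seconds, transcript) for each result, using its first word."""
    starts = []
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        if not alternatives or not alternatives[0].get("words"):
            continue
        top = alternatives[0]
        first_word = top["words"][0]
        starts.append((seconds(first_word["startTime"]), top.get("transcript", "").strip()))
    return starts


def largest_gap(response):
    """Return the largest gap, in seconds, between one word's end and the next word's start."""
    words = [
        word
        for result in response.get("results", [])
        for alternative in result.get("alternatives", [])[:1]
        for word in alternative.get("words", [])
    ]
    gaps = [
        seconds(nxt["startTime"]) - seconds(prev["endTime"])
        for prev, nxt in zip(words, words[1:])
    ]
    return max(gaps, default=0.0)


# Hand-written sample (illustrative only) in which every word starts
# exactly where the previous one ends, as the answer describes.
SAMPLE_RESPONSE = {
    "results": [
        {"alternatives": [{"transcript": "hello world", "words": [
            {"startTime": "0s", "endTime": "0.400s", "word": "hello"},
            {"startTime": "0.400s", "endTime": "1.100s", "word": "world"},
        ]}]},
        {"alternatives": [{"transcript": " again", "words": [
            {"startTime": "1.100s", "endTime": "1.600s", "word": "again"},
        ]}]},
    ]
}

if __name__ == "__main__":
    print(segment_starts(SAMPLE_RESPONSE))
    print(largest_gap(SAMPLE_RESPONSE))
```

If `largest_gap` comes back as zero on real output, the offsets alone will not reveal pauses, and silence detection on the audio itself would be needed to find phrase boundaries.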