Google Speech-to-Text with Java and Twilio
I am having a problem converting an audio file to text with Google Speech-to-Text. I can download the file from Twilio, but when I pass that audio file to Google Speech it returns a zero-length response. If I first convert the downloaded file with VLC media player and then pass it to Google Speech, I get the correct output. Please help; I have been stuck on this for about a week.
After getting the response from Twilio, I save it to a file with a .wav extension:
InputStream in = new URL(jsonObject.get("redirect_to").toString()).openStream();
Files.copy(in, Paths.get("src/main/resources/mp.wav"), StandardCopyOption.REPLACE_EXISTING);
Below is the Google Speech-to-Text code:
Path path = Paths.get("src/main/resources/mp.wav");
byte[] content = Files.readAllBytes(path);
ByteString audioBytes = ByteString.copyFrom(content);
try (SpeechClient speech = SpeechClient.create()) {
    RecognitionConfig recConfig =
        RecognitionConfig.newBuilder()
            .setEncoding(RecognitionConfig.AudioEncoding.LINEAR16)
            .setLanguageCode("en-US")
            .setSampleRateHertz(44100)
            .setModel("default")
            .setAudioChannelCount(2)
            .build();
    RecognitionAudio recognitionAudio = RecognitionAudio.newBuilder().setContent(audioBytes).build();
    OperationFuture<LongRunningRecognizeResponse, LongRunningRecognizeMetadata> response =
        speech.longRunningRecognizeAsync(recConfig, recognitionAudio);
    while (!response.isDone()) {
        System.out.println("Waiting for response...");
        Thread.sleep(10000);
    }
    List<SpeechRecognitionResult> results = response.get().getResultsList();
    for (SpeechRecognitionResult result : results) {
        // There can be several alternative transcripts for a given chunk of speech. Just use the
        // first (most likely) one here.
        SpeechRecognitionAlternative alternative = result.getAlternativesList().get(0);
        System.out.printf("Transcription: %s%n", alternative.getTranscript());
    }
} catch (InterruptedException | ExecutionException e) {
    e.printStackTrace();
}
As @philnash suggested, appending a .mp3 extension to the recording URL downloads the MP3 version of the recording from Twilio. The same applies to the .wav extension.
InputStream in = new URL(jsonObject.get("redirect_to").toString() + ".mp3").openStream(); // or ".wav"
Files.copy(in, Paths.get("src/main/resources/mp.wav"), StandardCopyOption.REPLACE_EXISTING);
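To keep the requested format and the saved file name in step, the download can be wrapped in a small helper. This is only a minimal sketch under assumptions: the class RecordingDownloader and its methods are hypothetical names, and it assumes the base recording URL carries no extension or query string, so the extension can simply be appended.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of any Twilio or Google library.
public class RecordingDownloader {

    /** Appends the extension ("wav" or "mp3") to the base recording URL. */
    public static String withExtension(String baseUrl, String extension) {
        return baseUrl + "." + extension;
    }

    /**
     * Downloads the recording in the requested format into targetDir and
     * returns the path of the saved file, named recording.<extension>.
     */
    public static Path download(String baseUrl, String extension, Path targetDir) throws IOException {
        Path target = targetDir.resolve("recording." + extension);
        // try-with-resources closes the stream even if the copy fails
        try (InputStream in = URI.create(withExtension(baseUrl, extension)).toURL().openStream()) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
        return target;
    }
}
```

Called as RecordingDownloader.download(jsonObject.get("redirect_to").toString(), "wav", Paths.get("src/main/resources")), this would save the file as recording.wav, so the extension on disk always reflects the format that was requested.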
I tested this with a sample Twilio recording; the ffprobe results are below.
Downloaded .wav file:
Input #0, **wav**, from 'from-twilio-change-extension.wav':
Duration: 00:00:14.60, bitrate: 128 kb/s
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 8000 Hz, 1 channels, s16, 128 kb/s
Downloaded .mp3 file:
Input #0, **mp3**, from 'from-twilio-change-extension.mp3':
Duration: 00:00:14.68, start: 0.000000, bitrate: 32 kb/s
Stream #0:0: Audio: mp3, 22050 Hz, mono, fltp, 32 kb/s
As for the audio encodings supported by the Speech-to-Text API, both WAV and MP3 are supported, but MP3 is a Beta feature available only in version v1p1beta1. So the client library imports will look like com.google.cloud.speech.v1p1beta1.Packages...
The audio encoding in RecognitionConfig has to match the encoding of the audio file used. For a .wav file, use RecognitionConfig.AudioEncoding.LINEAR16; for a .mp3 file, use RecognitionConfig.AudioEncoding.MP3.
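The same matching concern applies to the sample rate and channel count: the ffprobe output above reports 8000 Hz and 1 channel for the Twilio .wav, while the question's config declares 44100 Hz and 2 channels. One way to avoid hard-coding these values is to read them from the WAV header with the JDK's javax.sound.sampled package. This is a minimal sketch, assuming the file is a PCM WAV the JDK can parse; the class WavInspector and its method describe are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.Path;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;

// Hypothetical helper for illustration only.
public class WavInspector {

    /** Returns {sampleRateHertz, channelCount} as declared in the WAV header. */
    public static int[] describe(Path wav) throws IOException, UnsupportedAudioFileException {
        try (AudioInputStream audio = AudioSystem.getAudioInputStream(wav.toFile())) {
            AudioFormat format = audio.getFormat();
            return new int[] {Math.round(format.getSampleRate()), format.getChannels()};
        }
    }
}
```

The two values can then be passed to setSampleRateHertz and setAudioChannelCount when building the RecognitionConfig, and for the downloaded Twilio .wav they should agree with what ffprobe reports.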
An alternative is to use the FFMPEG tool to convert audio files into one of the codecs recognized by Speech-to-Text. More information about usage of the tool can be found here. In your scenario, the .mka to .wav / .mp3 conversion can be done from Java code as shown below.
String[] ffmpegCommand = {"ffmpeg", "-i", "/full/path/to/inputFile.mka", "/full/path/to/outputFile.wav"};
ProcessBuilder pb = new ProcessBuilder(ffmpegCommand);
pb.inheritIO();
pb.start();
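Note that the snippet above starts ffmpeg and returns immediately, so code that reads the output file right afterwards may see an incomplete file. A variant that waits for the process and checks its exit code could look like the sketch below. It assumes ffmpeg is installed and on the PATH; the class FfmpegConverter and its methods are hypothetical names, and the -y flag (overwrite the output file without asking) is an addition not present in the original command.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical wrapper around the ffmpeg command-line tool.
public class FfmpegConverter {

    /** Builds the ffmpeg argument list; -y overwrites an existing output file. */
    public static List<String> buildCommand(String input, String output) {
        return Arrays.asList("ffmpeg", "-y", "-i", input, output);
    }

    /** Runs ffmpeg and blocks until it exits; throws if the exit code is non-zero. */
    public static void convert(String input, String output) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(input, output))
                .inheritIO() // show ffmpeg's own log output in the console
                .start();
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("ffmpeg exited with code " + exitCode);
        }
    }
}
```

Calling FfmpegConverter.convert("/full/path/to/inputFile.mka", "/full/path/to/outputFile.wav") before reading the .wav ensures the conversion has finished, and a failed conversion surfaces as an exception instead of an empty transcription later on.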