Google Speech-to-Text with Java and Twilio

I am having a problem converting an audio file to text with Google Speech-to-Text. I can download the recording from Twilio, but when I pass that audio file to Google Speech-to-Text, it returns an empty (zero-length) response. If I first convert the downloaded file with VLC media player and then pass it to Google Speech-to-Text, I get the correct output. Please help; I have been stuck on this for about a week.

After getting the response from Twilio, I save it to a file with a .wav extension:

InputStream in = new URL(jsonObject.get("redirect_to").toString()).openStream();
Files.copy(in, Paths.get("src/main/resources/mp.wav"), StandardCopyOption.REPLACE_EXISTING);

Below is the Google Speech-to-Text code:

Path path = Paths.get("src/main/resources/mp.wav");
        byte[] content = Files.readAllBytes(path);
        ByteString audioBytes = ByteString.copyFrom(content);

        try (SpeechClient speech = SpeechClient.create()) {
            RecognitionConfig recConfig =
                    RecognitionConfig.newBuilder()
                            .setEncoding(RecognitionConfig.AudioEncoding.LINEAR16)
                            .setLanguageCode("en-US")
                            .setSampleRateHertz(44100)
                            .setModel("default")
                            .setAudioChannelCount(2)
                            .build();


            RecognitionAudio recognitionAudio = RecognitionAudio.newBuilder().setContent(audioBytes).build();

            OperationFuture<LongRunningRecognizeResponse, LongRunningRecognizeMetadata> response =
                    speech.longRunningRecognizeAsync(recConfig, recognitionAudio);

            while (!response.isDone()) {
                System.out.println("Waiting for response...");
                Thread.sleep(10000);
            }

            List<SpeechRecognitionResult> results = response.get().getResultsList();

            for (SpeechRecognitionResult result : results) {
                // There can be several alternative transcripts for a given chunk of speech. Just use the
                // first (most likely) one here.
                SpeechRecognitionAlternative alternative = result.getAlternativesList().get(0);
                System.out.printf("Transcription: %s%n", alternative.getTranscript());
            }

        } catch (InterruptedException | ExecutionException e) {
            e.printStackTrace();
        }

As @philnash suggested, appending a .mp3 extension to the recording URL downloads the MP3 version of the recording from Twilio. The same applies to the .wav extension.

InputStream in = new URL(jsonObject.get("redirect_to").toString() + ".mp3").openStream(); // or ".wav"
Files.copy(in, Paths.get("src/main/resources/mp.wav"), StandardCopyOption.REPLACE_EXISTING);
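The download step can also be wrapped in a small helper that closes the stream when it is done. This is only a sketch: the class name RecordingDownloader, the method download, and the target file name are hypothetical, and it assumes the recording URL can be fetched without extra authentication.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class RecordingDownloader {

    /**
     * Appends the extension (".wav" or ".mp3") to the recording URL and
     * copies the response body to target. Returns the number of bytes written.
     */
    static long download(String recordingUrl, String extension, Path target) throws IOException {
        URI source = URI.create(recordingUrl + extension);
        // try-with-resources closes the stream even if the copy fails
        try (InputStream in = source.toURL().openStream()) {
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: RecordingDownloader <recording-url>");
            return;
        }
        // Hypothetical target path; keep the file extension in line with the format requested
        long bytes = download(args[0], ".wav", Paths.get("mp.wav"));
        System.out.println("Saved " + bytes + " bytes");
    }
}
```

A zero-byte or very small result here would suggest the problem lies in the download rather than in the recognition step.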

I tested this with a sample Twilio recording; the ffprobe results are below.

Downloaded .wav file:

Input #0, wav, from 'from-twilio-change-extension.wav':
  Duration: 00:00:14.60, bitrate: 128 kb/s
    Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 8000 Hz, 1 channels, s16, 128 kb/s

Downloaded .mp3 file:

Input #0, mp3, from 'from-twilio-change-extension.mp3':
  Duration: 00:00:14.68, start: 0.000000, bitrate: 32 kb/s
    Stream #0:0: Audio: mp3, 22050 Hz, mono, fltp, 32 kb/s

As for audio encodings supported by the Speech-to-Text API, both WAV and MP3 are supported, but MP3 is a Beta feature available only in version v1p1beta1, so the client library imports will look like com.google.cloud.speech.v1p1beta1.Packages... . The audio encoding in RecognitionConfig has to match the encoding of the audio file used: RecognitionConfig.AudioEncoding.LINEAR16 for a .wav file, and RecognitionConfig.AudioEncoding.MP3 for a .mp3 file.
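Note that the ffprobe output above reports 8000 Hz and 1 channel for the .wav download, while the config in the question declares 44100 Hz and 2 channels. Assuming your recording has the same format as my sample, those values would not describe the file. A minimal sketch for reading the real values from the WAV header with the standard javax.sound.sampled API follows; the class name WavInspector and its method are hypothetical, and the file path is the one from the question.

```java
import java.io.File;
import java.io.IOException;
import javax.sound.sampled.AudioFileFormat;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;

public class WavInspector {

    /** Returns {sampleRateHertz, channelCount} as read from the audio file header. */
    static int[] sampleRateAndChannels(File wav)
            throws IOException, UnsupportedAudioFileException {
        AudioFileFormat fileFormat = AudioSystem.getAudioFileFormat(wav);
        AudioFormat format = fileFormat.getFormat();
        return new int[] {(int) format.getSampleRate(), format.getChannels()};
    }

    public static void main(String[] args) throws Exception {
        // Path taken from the question; pass another path as the first argument if needed
        File wav = new File(args.length > 0 ? args[0] : "src/main/resources/mp.wav");
        int[] info = sampleRateAndChannels(wav);
        System.out.printf("sampleRateHertz=%d, audioChannelCount=%d%n", info[0], info[1]);
    }
}
```

The two numbers could then be passed to setSampleRateHertz and setAudioChannelCount instead of hard-coded values.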


An alternative is to use the FFmpeg tool to convert audio files into one of the codecs recognized by Speech-to-Text. More information about usage of the tool can be found in the FFmpeg documentation. In your scenario, the .mka to .wav / .mp3 conversion can be done from Java code as shown below.

String[] ffmpegCommand = {"ffmpeg", "-i", "/full/path/to/inputFile.mka", "/full/path/to/outputFile.wav"};

ProcessBuilder pb = new ProcessBuilder(ffmpegCommand);
pb.inheritIO();
pb.start();
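If the converted file is read right after the conversion, it may help to wait for ffmpeg to finish and to check its exit code. The following is a sketch under assumptions: ffmpeg is on the PATH, the class name FfmpegConvert and its methods are hypothetical, and the explicit output settings (16-bit PCM, 8000 Hz, mono) are just one choice that matches the LINEAR16 example above.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

public class FfmpegConvert {

    /** Builds an ffmpeg command line that writes 16-bit PCM audio to the output file. */
    static List<String> buildCommand(String input, String output, int sampleRateHertz, int channels) {
        return Arrays.asList(
                "ffmpeg", "-y",                           // overwrite the output file if it exists
                "-i", input,
                "-acodec", "pcm_s16le",                   // 16-bit little-endian PCM (LINEAR16)
                "-ar", String.valueOf(sampleRateHertz),   // output sample rate
                "-ac", String.valueOf(channels),          // output channel count
                output);
    }

    /** Runs ffmpeg and blocks until it exits; returns the process exit code (0 on success). */
    static int convert(String input, String output, int sampleRateHertz, int channels)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(buildCommand(input, output, sampleRateHertz, channels));
        pb.inheritIO(); // show ffmpeg's own log output in this process's console
        return pb.start().waitFor();
    }

    public static void main(String[] args) throws Exception {
        int exitCode = convert("/full/path/to/inputFile.mka", "/full/path/to/outputFile.wav", 8000, 1);
        if (exitCode != 0) {
            System.err.println("ffmpeg failed with exit code " + exitCode);
        }
    }
}
```

With this variant, the sample rate and channel count given to ffmpeg are the same values to set in RecognitionConfig.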
