
Google Speech API v2 result is blank

I want to use Google Speech API in my current project.

I got my information on how to access the API from here.

As described on GitHub, you send a POST web request to the server and get the result back as JSON.

I also got some source code written for the v1 API from here.

Setting up the request is not that hard:

WebRequest request = WebRequest.Create(Constants.GoogleRequestString);
request.Method = "POST";
request.ContentType = "audio/x-flac; rate=" + sampleRate;
request.ContentLength = bytes.Length;

In my example, Constants.GoogleRequestString is https://www.google.com/speech-api/v2/recognize?output=json&lang=en-us&key=AIzaSyCnl6MRydhw_5fLXIdASxkLJzcJh5iX0M4

I downloaded the .flac files from the GitHub link and wrote a small C# program that loads the bytes of a FLAC file and sends them to the server using a slightly modified version of the method GoogleRequest(byte[] bytes, int sampleRate).

I open the stream as shown in the method and send all the bytes to the server. I get a response, but the JSON string it contains is: {"result":[]}

I have no idea why it is not working. Either the file or the spoken text in it is not correct (but when I play it in VLC I clearly hear the spoken text), or my program still has bugs.

Have you ever gotten no result from the Speech API? Shouldn't it say something like "couldn't understand what was spoken" or return some other error message?

I just tried out the .wav file. This worked for me.

Your code is fine assuming it resembles this:

using System;
using System.Diagnostics;
using System.IO;
using System.Net;

var uriBuilder = new UriBuilder(
    "https",
    "www.google.com",
    443,
    "speech-api/v2/recognize",
    "?output=json&lang=en-us&key=YOURAPIKEY");
int sampleRate = 44100; // must match the sample rate of the FLAC file

using (var stream = File.Open("c:\\tmp\\g2.flac", FileMode.Open))
{
    HttpWebRequest request = (HttpWebRequest) WebRequest.Create(uriBuilder.Uri);
    request.Method = "POST";
    request.ContentType = "audio/x-flac; rate=" + sampleRate;
    request.AutomaticDecompression = DecompressionMethods.GZip;

    // Dispose the request stream so the upload is closed before reading the response.
    using (var requestStream = request.GetRequestStream())
    {
        stream.CopyTo(requestStream);
    }
    try
    {
        using (var resp = request.GetResponse().GetResponseStream())
        using (var sr = new StreamReader(resp))
        {
            Debug.WriteLine(sr.ReadToEnd());
        }
    }
    // Only handle errors that carry an HTTP response (e.g. 403); others propagate.
    catch (WebException ee) when (ee.Response != null)
    {
        using (var er = new StreamReader(ee.Response.GetResponseStream()))
        {
            Debug.WriteLine(er.ReadToEnd());
        }
    }
}

What is important though is the exact format of the FLAC file. I used Audacity to control how my audio track would be saved.

After recording I changed the track settings to:

  • Mono
  • Sample Format: 16-Bit PCM
  • Rate: 44100 Hz

The following screenshot shows those settings:

[Screenshot: Audacity settings]

With the default stereo track and 32-bit float sample format, I couldn't get the Speech API to produce any result other than the empty JSON payload you also got.
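If you want to check a file against the settings listed above before uploading it, the values can be read from the FLAC header itself. The following is only a minimal sketch, not part of the original code: the class and method names (FlacHeader, Read) are made up for illustration, and it assumes the file starts directly with the "fLaC" marker followed by the STREAMINFO block, with no ID3 tag or other data prepended.

```csharp
using System;
using System.IO;

// Hypothetical helper, named for illustration only.
public static class FlacHeader
{
    // Parses the first 22 bytes of a FLAC file: the 4-byte "fLaC" marker,
    // a 4-byte metadata block header, and the start of the STREAMINFO block.
    public static (int SampleRate, int Channels, int BitsPerSample) Read(byte[] b)
    {
        if (b.Length < 22 || b[0] != 'f' || b[1] != 'L' || b[2] != 'a' || b[3] != 'C')
            throw new InvalidDataException("Not a FLAC stream.");

        // The low 7 bits of the block header's first byte hold the block type; 0 is STREAMINFO.
        if ((b[4] & 0x7F) != 0)
            throw new InvalidDataException("First metadata block is not STREAMINFO.");

        // STREAMINFO starts at offset 8. After 10 bytes of block and frame sizes come:
        // sample rate (20 bits), channels - 1 (3 bits), bits per sample - 1 (5 bits).
        int sampleRate = (b[18] << 12) | (b[19] << 4) | (b[20] >> 4);
        int channels = ((b[20] >> 1) & 0x07) + 1;
        int bitsPerSample = (((b[20] & 0x01) << 4) | (b[21] >> 4)) + 1;
        return (sampleRate, channels, bitsPerSample);
    }
}
```

Calling FlacHeader.Read(File.ReadAllBytes("c:\\tmp\\g2.flac")) on a file exported with the settings above should report 44100 Hz, 1 channel and 16 bits per sample. The returned SampleRate could also be used for the rate= part of the Content-Type header instead of a hard-coded value.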

With the above settings my result is:

{
    "result" : []
}
{
    "result" : [{
            "alternative" : [{
                    "transcript" : "translate this",
                    "confidence" : 0.92849225
                }, {
                    "transcript" : "translate days"
                }, {
                    "transcript" : "translate dish"
                }, {
                    "transcript" : "translate fish"
                }, {
                    "transcript" : "translate these"
                }
            ],
            "final" : true
        }
    ],
    "result_index" : 0
}

My English pronunciation isn't very good as Google thinks I want to translate fish ...
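Note that the response shown above contains two JSON objects, and the first one has an empty result array, so code that parses only the first object would see no transcript. A rough sketch of reading the first transcript is given below. It is not part of the original code: the names SpeechResponse and FirstTranscript are hypothetical, and it assumes the raw response body carries each JSON object on its own line (the output above is pretty-printed) and that System.Text.Json is available.

```csharp
using System;
using System.Text.Json;

// Hypothetical helper, named for illustration only.
public static class SpeechResponse
{
    // Returns the first transcript in the body, or null if every
    // object in the response has an empty "result" array.
    public static string? FirstTranscript(string body)
    {
        // Assumption: one JSON object per line in the raw response.
        foreach (var rawLine in body.Split('\n'))
        {
            var line = rawLine.Trim();
            if (line.Length == 0) continue;

            using var doc = JsonDocument.Parse(line);
            if (!doc.RootElement.TryGetProperty("result", out var results)) continue;

            foreach (var result in results.EnumerateArray())
            {
                if (!result.TryGetProperty("alternative", out var alternatives)) continue;
                foreach (var alternative in alternatives.EnumerateArray())
                {
                    // Alternatives are taken in the order the server returns them.
                    if (alternative.TryGetProperty("transcript", out var transcript))
                        return transcript.GetString();
                }
            }
        }
        return null;
    }
}
```

In the request code, this would replace the Debug.WriteLine(sr.ReadToEnd()) call with something like SpeechResponse.FirstTranscript(sr.ReadToEnd()); a null return then corresponds to the empty result the question describes.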

If you get an HTTP error (like 403 Forbidden), the exception handler tries to read the full response from the HTTP body. If your authentication key is incorrect, the response will tell you that.

To get your API keys to work with the Speech API, follow the instructions here.

Make sure you are a member of chromium-dev@chromium.org (you can just subscribe to chromium-dev and choose not to receive mail).

After that you can create a server key.
