
Building my own conversational voice AI with Dialogflow & Google Cloud Speech API in web apps

I would like to integrate an HTML5 microphone into my web application, stream the audio to a Node.js back-end, use the Dialogflow API for audio streaming together with the Google Cloud Speech API, and return synthesized audio (Text to Speech) to the client so that it can be played in the browser.

I found a GitHub project that does exactly what I want: https://github.com/dialogflow/selfservicekiosk-audio-streaming

This is Ms. Lee Boonstra's Medium blog post ( https://medium.com/google-cloud/building-your-own-conversational-voice-ai-with-dialogflow-speech-to-text-in-web-apps-part-i-b92770bd8b47 ). She developed this project and explains it very precisely. (Thank you very much, Ms. Boonstra.)

First, I tried the demo web application that Ms. Boonstra deployed on App Engine Flex. I accessed it ( https://selfservicedesk.appspot.com/ ) and it worked perfectly.

Next, I cloned the project and tried to run it locally. I followed the README.md, skipping the "Deploy with AppEngine" steps: https://github.com/dialogflow/selfservicekiosk-audio-streaming/blob/master/README.md

However, it didn't work: the web app gave me no response. I am using Windows 10 with Windows Subsystem for Linux (Debian 10.3) and the Google Chrome browser.

This is Chrome's console.

[Screenshot: kiosk_chrome]

This is the terminal. (I didn't get any error message, which puzzles me.)

[Screenshot: kiosk_terminal]

Could you give me any advice? Thank you in advance.

Example 3 and the SelfServiceKiosk app use the same TTS code, which is probably why both are failing.

I've tested it myself on my Windows 10 machine with Chrome, and I got it working. However, I realized that in a fresh GitHub clone, env.txt had some variables that are actually used commented out. (Windows also handles .env files differently when there are comments on the same line as a value.) I've updated the file on GitHub, but please make sure your .env file looks like this:

PROJECT_ID=selfservicedesk
LANGUAGE_CODE=en-US
ENCODING=AUDIO_ENCODING_LINEAR_16
SAMPLE_RATE_HERZ=16000
SINGLE_UTTERANCE=false
BASE_LANG=nl-NL
SSML_GENDER=NEUTRAL
SPEECH_ENCODING=LINEAR16
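
If you want to rule out stray comments, trailing spaces, or Windows line endings in your .env file, one option is to normalize the file yourself and inspect the result. The sketch below assumes a hypothetical helper named parseEnv; it is not part of the project or of any .env loading package, and it only illustrates the kind of cleanup meant above.

```javascript
// Hypothetical helper, not part of the SelfServiceKiosk project.
// Parses KEY=VALUE lines while ignoring blank lines, full-line comments,
// inline "# ..." comments, trailing whitespace, and Windows CRLF endings.
function parseEnv(text) {
  const result = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    // Skip blank lines and lines that are entirely a comment.
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq === -1) continue; // not a KEY=VALUE line
    const key = line.slice(0, eq).trim();
    // Drop an inline comment (whitespace followed by '#') and surrounding spaces.
    const value = line.slice(eq + 1).replace(/\s+#.*$/, "").trim();
    result[key] = value;
  }
  return result;
}

// Sample input with a trailing space, an inline comment, and a commented-out line.
const sample =
  "LANGUAGE_CODE=en-US \r\n" +
  "SSML_GENDER=NEUTRAL # voice gender\r\n" +
  "# BASE_LANG=nl-NL\r\n";

const parsed = parseEnv(sample);
console.log(parsed);
```

In this sample, BASE_LANG ends up missing from the parsed result because its line is commented out, which is the situation described for the fresh clone.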

You may already have fixed that, since you didn't get the same error message I had, and your Dialogflow response does contain an AudioBuffer that simply isn't played in the browser. In that case, the problem might be due to your system setup.

If that's the case, I can give you a few more pointers:

  • When you run the SelfServiceKiosk and record your voice, you should see the utterance written out. If that works, the Speech to Text API is working fine (and the service account is set up correctly).

  • For that to work, you have to accept the browser popup that asks for microphone access (at least once).

  • When Dialogflow detects the intent (a matched intent or the fallback), an AudioBuffer is created, like the one you already have. It is returned by the Text to Speech API. Once the browser receives the audio, you should see this in the Developer Tools:

    (index):58 (3) [{…}, null, null] 0: {alternativeQueryResults: Array(1)...

If you do see that object but Chrome still doesn't play the audio, can you double-check https://myaccount.google.com/activitycontrols? Web & App Activity and Voice & Audio should be enabled.

  • I am not sure whether a firewall is blocking any ports. The app should work when running from http://localhost:8080 (or another port you specify). When served from any other URL or from the cloud, it only works over HTTPS.
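
The localhost-or-HTTPS point in the last bullet follows from browsers exposing the microphone only in secure contexts. As a rough sketch, the check below approximates that rule for a given URL. The function name isLikelySecureOrigin is hypothetical, and the logic is a simplification of what the browser actually does.

```javascript
// Hypothetical helper: a simplified approximation of the browser's
// secure-context rule, which decides whether microphone access is available.
function isLikelySecureOrigin(urlString) {
  const url = new URL(urlString);
  // Any HTTPS origin qualifies.
  if (url.protocol === "https:") return true;
  // Plain HTTP qualifies only for loopback hosts.
  const loopbackHosts = ["localhost", "127.0.0.1", "[::1]"];
  return url.protocol === "http:" && loopbackHosts.includes(url.hostname);
}

console.log(isLikelySecureOrigin("http://localhost:8080"));
console.log(isLikelySecureOrigin("http://192.168.0.10:8080"));
console.log(isLikelySecureOrigin("https://selfservicedesk.appspot.com/"));
```

Inside the page itself you can read window.isSecureContext in the Developer Tools console; if it is false, the microphone will not be available regardless of the server setup.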

Hope this helps!

Thanks for your kind words!

Hmmm, I have to say that I haven't tested the final solution on my Windows machine. The audio recorder seems to work fine; the problem is that the socket.io server and your client don't connect. If everything works, your server logs should show this after starting:

Running server on port 8080
Client connected [id=vBaT3NTow2VsyUB4AAAA]

Can you check whether the simple examples in the example folder work for you?

This might be related: "Socket.io local.network not connecting".

Let me know if changing the firewall settings worked; then I will update the GitHub README.

Cheers, Lee
