
Amazon Alexa - capture full transcript

I am building an Alexa Skill using AWS Lambda and NodeJS. I have two questions:

1) Is it possible for me to retrieve the full transcript of the speaker?

In my Alexa phone app, I'm able to read exactly what I've spoken, but I'd like to collect this data so I can possibly analyze how people are speaking to my Skill.

This is possible with speech-to-text tools like the Google Speech APIs (demo here, spec here), using callbacks such as recognition.onresult:

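var final_transcript = '';
recognition.onresult = function(event) {
    var interim_transcript = '';
    for (var i = event.resultIndex; i < event.results.length; ++i) {
      if (event.results[i].isFinal) {
        final_transcript += event.results[i][0].transcript;   // finalized text
      } else {
        interim_transcript += event.results[i][0].transcript; // still being recognized
      }
    }
};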

In my Alexa app, you can see here it captured when I asked "sing happy birthday":

[Screenshot from the Alexa app]

How can I programmatically capture this? I'd like to know when a user asks for things that I haven't thought of, collect these failures and common speech requests, and improve the skill based on it.


2) Does Alexa support multiple voices and multiple languages (input and output)?

Again, looking at the Google Speech APIs, you can see they allow many adjustments to speech input and output, including multiple languages and even speech rate:

    var utterance = new SpeechSynthesisUtterance();
    utterance.rate = 0.7;
    utterance.lang = "zh-CN";

Does Alexa offer this suite of controls?

Question 1:

Not currently. According to the request syntax, the audio clip is not provided to your service's endpoint; the request carries only the resolved intent and its slot values. Alternatively, if you were providing the hardware and using the Alexa Voice Service, you would be capturing the audio yourself.
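Since the intent and slot values are what does reach the endpoint, those can at least be logged for later analysis. Here is a minimal sketch, assuming the standard request envelope fields (request.type, request.intent.name, request.intent.slots). The helper name summarizeRequest and the sample event are made up for illustration:

```javascript
// Hypothetical helper: reduce an Alexa request envelope to the fields worth logging.
// The raw audio and the full transcript are not part of the envelope.
function summarizeRequest(event) {
    const request = (event && event.request) || {};
    const intent = request.intent || {};
    const slots = intent.slots || {};
    const slotValues = {};
    // Keep only slots that were actually filled.
    Object.keys(slots).forEach((name) => {
        const value = slots[name] && slots[name].value;
        if (value !== undefined && value !== null && value !== '') {
            slotValues[name] = value;
        }
    });
    return {
        type: request.type || null,
        intent: intent.name || null,
        slots: slotValues,
    };
}

// Hand-written example envelope (not real traffic).
const sampleEvent = {
    request: {
        type: 'IntentRequest',
        intent: {
            name: 'OzIntent',
            slots: {
                WordI: { name: 'WordI', value: 'sing' },
                WordII: { name: 'WordII', value: 'happy' },
                WordIII: { name: 'WordIII' },
            },
        },
    },
};
console.log(JSON.stringify(summarizeRequest(sampleEvent)));
```

In a Lambda function, anything written with console.log ends up in CloudWatch Logs, so calling this at the top of the handler is one way to collect what users asked for.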

Question 2:

Not currently. Alexa seems to support only English.

To Capture Multiple Sentences:

Use this hack created by my colleague Bryan Colligan.

How it works

The hack defines a custom slot type, CONTENT_LIST, whose only value is "all", and relies on it to capture any word. By creating sample utterances that chain several of these catch-all slots, for example "{WordI} {WordII} {WordIII} {WordIV} {WordV} {WordVI} {WordVII} {WordVIII} {WordIX}", you can capture sentences of varying length with relative ease.

Note: In my experience Amazon's "Search Query" is limited to 5-6 words.

Warning: Amazon's transcriptions are pretty bad, so don't be surprised if what you capture is somewhat unreadable. This shortcoming is likely one reason Amazon does not reveal its transcripts. Google is much further ahead in Voice to Text. I'm sure in the future Amazon will release the transcripts when they feel more comfortable with their technology.

The code

The following code concatenates the values of all filled slots into one sentence. It can be placed in your Lambda function.

// Slot names must match the interaction model exactly (including "WordIXX" and "WordIXXX").
const wordSlots = ["WordI", "WordII", "WordIII", "WordIV", "WordV", "WordVI", "WordVII", "WordVIII", "WordIX", "WordX", "WordXI", "WordXII", "WordXIII", "WordXIV", "WordXV", "WordXVI", "WordXVII", "WordXVIII", "WordIXX", "WordXX", "WordXXI", "WordXXII", "WordXXIII", "WordXXIV", "WordXXV", "WordXXVI", "WordXXVII", "WordXXVIII", "WordIXXX", "WordXXX"];
// Guard against requests that carry no slots at all.
const slots = this.event.request.intent.slots || {};
// Collect the filled slots in order, skipping empty and '?' values, and join them with spaces.
const querySentence = wordSlots
    .map((name) => slots[name] && slots[name].value)
    .filter((value) => value && value !== '?')
    .join(' ');

The following Interaction Model uses CONTENT_LIST and "value": "all" to capture any word.

{
    "interactionModel": {
        "languageModel": {
            "invocationName": "alpha voice",
            "intents": [
                {
                    "name": "AMAZON.CancelIntent",
                    "samples": [
                        "cancel"
                    ]
                },
                {
                    "name": "AMAZON.HelpIntent",
                    "samples": [
                        "help"
                    ]
                },
                {
                    "name": "AMAZON.StopIntent",
                    "samples": [
                        "stop"
                    ]
                },
                {
                    "name": "OzIntent",
                    "slots": [
                        {
                            "name": "Query",
                            "type": "AMAZONSearchQuery"
                        },
                        {
                            "name": "WordI",
                            "type": "CONTENT_LIST"
                        },
                        {
                            "name": "WordII",
                            "type": "CONTENT_LIST"
                        },
                        {
                            "name": "WordIII",
                            "type": "CONTENT_LIST"
                        },
                        {
                            "name": "WordIV",
                            "type": "CONTENT_LIST"
                        },
                        {
                            "name": "WordV",
                            "type": "CONTENT_LIST"
                        },
                        {
                            "name": "WordVI",
                            "type": "CONTENT_LIST"
                        },
                        {
                            "name": "WordVII",
                            "type": "CONTENT_LIST"
                        },
                        {
                            "name": "WordVIII",
                            "type": "CONTENT_LIST"
                        },
                        {
                            "name": "WordIX",
                            "type": "CONTENT_LIST"
                        },
                        {
                            "name": "WordX",
                            "type": "CONTENT_LIST"
                        },
                        {
                            "name": "WordXI",
                            "type": "CONTENT_LIST"
                        },
                        {
                            "name": "WordXII",
                            "type": "CONTENT_LIST"
                        },
                        {
                            "name": "WordXIII",
                            "type": "CONTENT_LIST"
                        },
                        {
                            "name": "WordXIV",
                            "type": "CONTENT_LIST"
                        },
                        {
                            "name": "WordXV",
                            "type": "CONTENT_LIST"
                        },
                        {
                            "name": "WordXVI",
                            "type": "CONTENT_LIST"
                        },
                        {
                            "name": "WordXVII",
                            "type": "CONTENT_LIST"
                        },
                        {
                            "name": "WordXVIII",
                            "type": "CONTENT_LIST"
                        },
                        {
                            "name": "WordIXX",
                            "type": "CONTENT_LIST"
                        },
                        {
                            "name": "WordXX",
                            "type": "CONTENT_LIST"
                        },
                        {
                            "name": "WordXXI",
                            "type": "CONTENT_LIST"
                        },
                        {
                            "name": "WordXXII",
                            "type": "CONTENT_LIST"
                        },
                        {
                            "name": "WordXXIII",
                            "type": "CONTENT_LIST"
                        },
                        {
                            "name": "WordXXIV",
                            "type": "CONTENT_LIST"
                        },
                        {
                            "name": "WordXXV",
                            "type": "CONTENT_LIST"
                        },
                        {
                            "name": "WordXXVI",
                            "type": "CONTENT_LIST"
                        },
                        {
                            "name": "WordXXVII",
                            "type": "CONTENT_LIST"
                        },
                        {
                            "name": "WordXXVIII",
                            "type": "CONTENT_LIST"
                        },
                        {
                            "name": "WordIXXX",
                            "type": "CONTENT_LIST"
                        },
                        {
                            "name": "WordXXX",
                            "type": "CONTENT_LIST"
                        }
                    ],
                    "samples": [
                        "{WordI}",
                        "{WordI} {WordII}",
                        "{WordI} {WordII} {WordIII}",
                        "{WordI} {WordII} {WordIII} {WordIV}",
                        "{WordI} {WordII} {WordIII} {WordIV} {WordV}",
                        "{WordI} {WordII} {WordIII} {WordIV} {WordV} {WordVI}",
                        "{WordI} {WordII} {WordIII} {WordIV} {WordV} {WordVI} {WordVII}",
                        "{WordI} {WordII} {WordIII} {WordIV} {WordV} {WordVI} {WordVII} {WordVIII}",
                        "{WordI} {WordII} {WordIII} {WordIV} {WordV} {WordVI} {WordVII} {WordVIII} {WordIX}",
                        "{WordI} {WordII} {WordIII} {WordIV} {WordV} {WordVI} {WordVII} {WordVIII} {WordIX} {WordX}",
                        "{WordI} {WordII} {WordIII} {WordIV} {WordV} {WordVI} {WordVII} {WordVIII} {WordIX} {WordX} {WordXI}",
                        "{WordI} {WordII} {WordIII} {WordIV} {WordV} {WordVI} {WordVII} {WordVIII} {WordIX} {WordX} {WordXI} {WordXII}",
                        "{WordI} {WordII} {WordIII} {WordIV} {WordV} {WordVI} {WordVII} {WordVIII} {WordIX} {WordX} {WordXI} {WordXII} {WordXIII}",
                        "{WordI} {WordII} {WordIII} {WordIV} {WordV} {WordVI} {WordVII} {WordVIII} {WordIX} {WordX} {WordXI} {WordXII} {WordXIII} {WordXIV}",
                        "{WordI} {WordII} {WordIII} {WordIV} {WordV} {WordVI} {WordVII} {WordVIII} {WordIX} {WordX} {WordXI} {WordXII} {WordXIII} {WordXIV} {WordXV}",
                        "{WordI} {WordII} {WordIII} {WordIV} {WordV} {WordVI} {WordVII} {WordVIII} {WordIX} {WordX} {WordXI} {WordXII} {WordXIII} {WordXIV} {WordXV} {WordXVI}",
                        "{WordI} {WordII} {WordIII} {WordIV} {WordV} {WordVI} {WordVII} {WordVIII} {WordIX} {WordX} {WordXI} {WordXII} {WordXIII} {WordXIV} {WordXV} {WordXVI} {WordXVII}",
                        "{WordI} {WordII} {WordIII} {WordIV} {WordV} {WordVI} {WordVII} {WordVIII} {WordIX} {WordX} {WordXI} {WordXII} {WordXIII} {WordXIV} {WordXV} {WordXVI} {WordXVII} {WordXVIII}",
                        "{WordI} {WordII} {WordIII} {WordIV} {WordV} {WordVI} {WordVII} {WordVIII} {WordIX} {WordX} {WordXI} {WordXII} {WordXIII} {WordXIV} {WordXV} {WordXVI} {WordXVII} {WordXVIII} {WordIXX}",
                        "{WordI} {WordII} {WordIII} {WordIV} {WordV} {WordVI} {WordVII} {WordVIII} {WordIX} {WordX} {WordXI} {WordXII} {WordXIII} {WordXIV} {WordXV} {WordXVI} {WordXVII} {WordXVIII} {WordIXX} {WordXX}",
                        "{WordI} {WordII} {WordIII} {WordIV} {WordV} {WordVI} {WordVII} {WordVIII} {WordIX} {WordX} {WordXI} {WordXII} {WordXIII} {WordXIV} {WordXV} {WordXVI} {WordXVII} {WordXVIII} {WordIXX} {WordXX} {WordXXI}",
                        "{WordI} {WordII} {WordIII} {WordIV} {WordV} {WordVI} {WordVII} {WordVIII} {WordIX} {WordX} {WordXI} {WordXII} {WordXIII} {WordXIV} {WordXV} {WordXVI} {WordXVII} {WordXVIII} {WordIXX} {WordXX} {WordXXI} {WordXXII}",
                        "{WordI} {WordII} {WordIII} {WordIV} {WordV} {WordVI} {WordVII} {WordVIII} {WordIX} {WordX} {WordXI} {WordXII} {WordXIII} {WordXIV} {WordXV} {WordXVI} {WordXVII} {WordXVIII} {WordIXX} {WordXX} {WordXXI} {WordXXII} {WordXXIII}",
                        "{WordI} {WordII} {WordIII} {WordIV} {WordV} {WordVI} {WordVII} {WordVIII} {WordIX} {WordX} {WordXI} {WordXII} {WordXIII} {WordXIV} {WordXV} {WordXVI} {WordXVII} {WordXVIII} {WordIXX} {WordXX} {WordXXI} {WordXXII} {WordXXIII} {WordXXIV}",
                        "{WordI} {WordII} {WordIII} {WordIV} {WordV} {WordVI} {WordVII} {WordVIII} {WordIX} {WordX} {WordXI} {WordXII} {WordXIII} {WordXIV} {WordXV} {WordXVI} {WordXVII} {WordXVIII} {WordIXX} {WordXX} {WordXXI} {WordXXII} {WordXXIII} {WordXXIV} {WordXXV}",
                        "{WordI} {WordII} {WordIII} {WordIV} {WordV} {WordVI} {WordVII} {WordVIII} {WordIX} {WordX} {WordXI} {WordXII} {WordXIII} {WordXIV} {WordXV} {WordXVI} {WordXVII} {WordXVIII} {WordIXX} {WordXX} {WordXXI} {WordXXII} {WordXXIII} {WordXXIV} {WordXXV} {WordXXVI}",
                        "{WordI} {WordII} {WordIII} {WordIV} {WordV} {WordVI} {WordVII} {WordVIII} {WordIX} {WordX} {WordXI} {WordXII} {WordXIII} {WordXIV} {WordXV} {WordXVI} {WordXVII} {WordXVIII} {WordIXX} {WordXX} {WordXXI} {WordXXII} {WordXXIII} {WordXXIV} {WordXXV} {WordXXVI} {WordXXVII}",
                        "{WordI} {WordII} {WordIII} {WordIV} {WordV} {WordVI} {WordVII} {WordVIII} {WordIX} {WordX} {WordXI} {WordXII} {WordXIII} {WordXIV} {WordXV} {WordXVI} {WordXVII} {WordXVIII} {WordIXX} {WordXX} {WordXXI} {WordXXII} {WordXXIII} {WordXXIV} {WordXXV} {WordXXVI} {WordXXVII} {WordXXVIII}",
                        "{WordI} {WordII} {WordIII} {WordIV} {WordV} {WordVI} {WordVII} {WordVIII} {WordIX} {WordX} {WordXI} {WordXII} {WordXIII} {WordXIV} {WordXV} {WordXVI} {WordXVII} {WordXVIII} {WordIXX} {WordXX} {WordXXI} {WordXXII} {WordXXIII} {WordXXIV} {WordXXV} {WordXXVI} {WordXXVII} {WordXXVIII} {WordIXXX}",
                        "{WordI} {WordII} {WordIII} {WordIV} {WordV} {WordVI} {WordVII} {WordVIII} {WordIX} {WordX} {WordXI} {WordXII} {WordXIII} {WordXIV} {WordXV} {WordXVI} {WordXVII} {WordXVIII} {WordIXX} {WordXX} {WordXXI} {WordXXII} {WordXXIII} {WordXXIV} {WordXXV} {WordXXVI} {WordXXVII} {WordXXVIII} {WordIXXX} {WordXXX}"
                    ]
                },
                {
                    "name": "AMAZON.NavigateHomeIntent",
                    "samples": [
                        "navigate home"
                    ]
                }
            ],
            "types": [
                {
                    "name": "AMAZONSearchQuery",
                    "values": [
                        {
                            "name": {
                                "value": "all"
                            }
                        }
                    ]
                },
                {
                    "name": "CONTENT_LIST",
                    "values": [
                        {
                            "name": {
                                "value": "all"
                            }
                        }
                    ]
                }
            ]
        }
    }
}
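The slots and samples arrays in the model above are very repetitive, so they could be generated rather than typed by hand. A rough sketch of one way to do that; the function name buildCatchAllIntent is made up, and it leaves out the extra "Query" slot from the model above:

```javascript
// Hypothetical generator for the repetitive parts of the catch-all intent.
function buildCatchAllIntent(intentName, slotNames, slotType) {
    // One slot definition per name, all of the same catch-all type.
    const slots = slotNames.map((name) => ({ name, type: slotType }));
    // Sample n contains the first n slots: "{A}", "{A} {B}", "{A} {B} {C}", ...
    const samples = slotNames.map((_, i) =>
        slotNames.slice(0, i + 1).map((name) => `{${name}}`).join(' ')
    );
    return { name: intentName, slots, samples };
}

// Shortened to three slots here; pass the full list of slot names for the real model.
const intent = buildCatchAllIntent('OzIntent', ['WordI', 'WordII', 'WordIII'], 'CONTENT_LIST');
console.log(JSON.stringify(intent, null, 4));
```

The printed object can be pasted into the "intents" array of the interaction model. Use the same slot name list in the Lambda code so the two stay in sync.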

Note: I use this intent as a catch-all for my skill; it is the only intent. If you want other intents alongside it, so that this one only catches utterances that fall through, I'd recommend experimenting: create an intent with defined utterances and see whether Amazon picks it before falling back on this free-form capture.

Please comment below if you have success and I'll update the answer.

An updated answer:

Q1: It is still not possible to get the audio. But you can use a built-in slot type like AMAZON.SearchQuery to capture values you haven't specified.
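Reading such a slot in the handler could look roughly like this. It assumes the intent declares a slot named Query of type AMAZON.SearchQuery; the slot name, the intent name SearchIntent, and the helper getSlotValue are all made up for the example. As far as I know, sample utterances for this slot type also need a carrier phrase around the slot, something like "search for {Query}".

```javascript
// Hypothetical helper: return the free-form phrase from a slot, or null if it is absent.
function getSlotValue(event, slotName) {
    const intent = event && event.request && event.request.intent;
    const slot = intent && intent.slots && intent.slots[slotName];
    const value = slot && slot.value;
    return typeof value === 'string' && value.trim() !== '' ? value.trim() : null;
}

// Hand-written example envelope (not real traffic).
const searchEvent = {
    request: {
        type: 'IntentRequest',
        intent: {
            name: 'SearchIntent',
            slots: { Query: { name: 'Query', value: 'sing happy birthday' } },
        },
    },
};
console.log(getSlotValue(searchEvent, 'Query'));
```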

Q2: You can now use different voices in your skill with the voice tag in SSML, like this:

<voice name="Kendra"><lang xml:lang="en-US">I want to tell you a secret.</lang></voice><voice name="Brian"><lang xml:lang="en-GB">Your secret is safe with me!</lang></voice>

The following voices are supported for their respective languages:

English, American (en-US): Ivy, Joanna, Joey, Justin, Kendra, Kimberly, Matthew, Salli

English, Australian (en-AU): Nicole, Russell

English, British (en-GB): Amy, Brian, Emma

English, Indian (en-IN): Aditi, Raveena

German (de-DE): Hans, Marlene, Vicki

Spanish, Castilian (es-ES): Conchita, Enrique

Italian (it-IT): Carla, Giorgio

Japanese (ja-JP): Mizuki, Takumi

French (fr-FR): Celine, Lea, Mathieu
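If the text comes from variables, it may be easier to build the SSML with a small helper than to write it by hand. A minimal sketch; escapeXml and sayAs are made-up helper names, not part of any Alexa SDK:

```javascript
// Escape characters that are special in XML so arbitrary text cannot break the SSML.
function escapeXml(text) {
    return String(text)
        .replace(/&/g, '&amp;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;')
        .replace(/"/g, '&quot;')
        .replace(/'/g, '&apos;');
}

// Hypothetical helper: wrap text in <voice> and <lang> tags for one of the voices listed above.
function sayAs(voiceName, locale, text) {
    return `<voice name="${escapeXml(voiceName)}"><lang xml:lang="${escapeXml(locale)}">${escapeXml(text)}</lang></voice>`;
}

const ssml = '<speak>'
    + sayAs('Kendra', 'en-US', 'I want to tell you a secret.')
    + sayAs('Brian', 'en-GB', 'Your secret is safe with me!')
    + '</speak>';
console.log(ssml);
```

A raw SSML response expects the surrounding speak element. Some SDK helpers add it for you, in which case pass only the inner markup.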


 