Google Cloud Speech-to-Text audio timeout error when used with the Twilio "Stream" verb and WebSockets

Question

I'm trying to build a system that transcribes a phone call in real time and displays the conversation in my command line. To do this, I'm using a Twilio phone number that sends an HTTP request when it is called. I use Flask to run my server code, ngrok to make my local port public, and WebSockets to transfer the data. The TwiML verb "Stream" is used to stream the audio data, via my server, to the Google Cloud Speech-to-Text API. So far I have used Twilio's Python demo on GitHub (https://github.com/twilio/media-streams/tree/master/python/realtime-transcriptions).

My server code:

from flask import Flask, render_template
from flask_sockets import Sockets

from SpeechClientBridge import SpeechClientBridge
from google.cloud.speech_v1 import enums
from google.cloud.speech_v1 import types

import json
import base64
import os

os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "./<KEY>.json"
HTTP_SERVER_PORT = 8080

config = types.RecognitionConfig(
    encoding=enums.RecognitionConfig.AudioEncoding.MULAW,
    sample_rate_hertz=8000,
    language_code='en-US')
streaming_config = types.StreamingRecognitionConfig(
    config=config,
    interim_results=True)

app = Flask(__name__)
sockets = Sockets(app)

@app.route('/home')
def home():
    return render_template("index.html")

@app.route('/twiml', methods=['POST'])
def return_twiml():
    print("POST TwiML")
    return render_template('streams.xml')

def on_transcription_response(response):
    if not response.results:
        return

    result = response.results[0]
    if not result.alternatives:
        return

    transcription = result.alternatives[0].transcript
    print("Transcription: " + transcription)

@sockets.route('/')
def transcript(ws):
    print("WS connection opened")
    bridge = SpeechClientBridge(
        streaming_config, 
        on_transcription_response
    )
    while not ws.closed:
        message = ws.receive()
        if message is None:
            bridge.terminate()
            break

        data = json.loads(message)
        if data["event"] in ("connected", "start"):
            print(f"Media WS: Received event '{data['event']}': {message}")
            continue
        if data["event"] == "media":
            media = data["media"]
            chunk = base64.b64decode(media["payload"])
            bridge.add_request(chunk)
        if data["event"] == "stop":
            print(f"Media WS: Received event 'stop': {message}")
            print("Stopping...")
            break

    bridge.terminate()
    print("WS connection closed")

if __name__ == '__main__':
    from gevent import pywsgi
    from geventwebsocket.handler import WebSocketHandler

    server = pywsgi.WSGIServer(('', HTTP_SERVER_PORT), app, handler_class=WebSocketHandler)
    print("Server listening on: http://localhost:" + str(HTTP_SERVER_PORT))
    server.serve_forever()

streams.xml:

<?xml version="1.0" encoding="UTF-8"?>
<Response>
     <Say> Thanks for calling!</Say>
     <Start>
        <Stream url="wss://<ngrok-URL/.ngrok.io/"/>
     </Start>
     <Pause length="40"/>
</Response>

Twilio webhook:

http://<ngrok-URL>.ngrok.io/twiml

I get the following error when I run the server code and then call the Twilio number:

C:\Users\Max\Python\Twilio>python server.py
Server listening on: http://localhost:8080
POST TwiML
WS connection opened
Media WS: Received event 'connected': {"event":"connected","protocol":"Call","version":"0.2.0"}
Media WS: Received event 'start': {"event":"start","sequenceNumber":"1","start":{"accountSid":"AC8abc5aa74496a227d3eb489","streamSid":"MZe6245f23e2385aa2ea7b397","callSid":"CA5864313b4992607d3fe46","tracks":["inbound"],"mediaFormat":{"encoding":"audio/x-mulaw","sampleRate":8000,"channels":1}},"streamSid":"MZe6245f2397c1285aa2ea7b397"}
Exception in thread Thread-4:
Traceback (most recent call last):
  File "C:\Users\Max\AppData\Local\Programs\Python\Python37\lib\site-packages\google\api_core\grpc_helpers.py", line 96, in next
    return six.next(self._wrapped)
  File "C:\Users\Max\AppData\Local\Programs\Python\Python37\lib\site-packages\grpc\_channel.py", line 416, in __next__
    return self._next()
  File "C:\Users\Max\AppData\Local\Programs\Python\Python37\lib\site-packages\grpc\_channel.py", line 689, in _next
    raise self
grpc._channel._MultiThreadedRendezvous: <_MultiThreadedRendezvous of RPC that terminated with:
        status = StatusCode.OUT_OF_RANGE
        details = "Audio Timeout Error: Long duration elapsed without audio. Audio should be sent close to real time."
        debug_error_string = "{"created":"@1591738676.565000000","description":"Error received from peer ipv6:[2a00:1450:4009:807::200a]:443","file":"src/core/lib/surface/call.cc","file_line":1056,"grpc_message":"Audio Timeout Error: Long duration elapsed without audio. Audio should be sent close to real time.","grpc_status":11}"
>

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\Max\AppData\Local\Programs\Python\Python37\lib\threading.py", line 917, in _bootstrap_inner
    self.run()
  File "C:\Users\Max\AppData\Local\Programs\Python\Python37\lib\threading.py", line 865, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\Max\Python\Twilio\SpeechClientBridge.py", line 37, in process_responses_loop
    for response in responses:
  File "C:\Users\Max\AppData\Local\Programs\Python\Python37\lib\site-packages\google\api_core\grpc_helpers.py", line 99, in next
    six.raise_from(exceptions.from_grpc_error(exc), exc)
  File "<string>", line 3, in raise_from
google.api_core.exceptions.OutOfRange: 400 Audio Timeout Error: Long duration elapsed without audio. Audio should be sent close to real time.

Media WS: Received event 'stop': {"event":"stop","sequenceNumber":"752","streamSid":"MZe6245f2397c125aa2ea7b397","stop":{"accountSid":"AC8abc5aa74496a60227d3eb489","callSid":"CA5842bc6431314d502607d3fe46"}}
Stopping...
WS connection closed

I can't work out why I'm getting the audio timeout error. Is it a firewall issue between Twilio and Google? An encoding issue?

Any help would be greatly appreciated.

System: Windows 10, Python 3.7.1, ngrok 2.3.35, Flask 1.1.2

Answer 1

Your streams.xml returns the socket URL "wss://<ngrok-URL/.ngrok.io/", so please make sure it matches your WebSocket routing (e.g. @sockets.route('/')).

If your socket route starts with '/', you should rewrite streams.xml accordingly; see the example below.

<?xml version="1.0" encoding="UTF-8"?>
<Response>
     <Say> Thanks for calling!</Say>
     <Start>
        <Stream url="wss://YOUR_NGROK_ID.ngrok.io/"/>
     </Start>
     <Pause length="40"/>
</Response>
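
A quick way to check this before placing a call is to parse the Stream URL and compare its path with the route. The following is only a minimal sketch using the standard library; the helper name stream_url_matches_route is made up for illustration and is not part of Twilio's or Flask's API.

```python
from urllib.parse import urlparse


def stream_url_matches_route(stream_url: str, route: str) -> bool:
    """Return True if a wss:// Stream URL points at the given WebSocket route.

    Hypothetical helper for illustration only.
    """
    parsed = urlparse(stream_url)
    # Twilio Media Streams expects a secure WebSocket URL with a host.
    if parsed.scheme != "wss" or not parsed.hostname:
        return False
    # An empty path is treated as "/" for routing purposes.
    path = parsed.path or "/"
    return path == route


print(stream_url_matches_route("wss://YOUR_NGROK_ID.ngrok.io/", "/"))       # True
print(stream_url_matches_route("wss://YOUR_NGROK_ID.ngrok.io/media", "/"))  # False
```

With the literal placeholder from the question ("wss://<ngrok-URL/.ngrok.io/") the check returns False, because the "/" right after "ngrok-URL" makes ".ngrok.io/" part of the path rather than the host.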

Answer 2

I ran some tests on this to try to establish what was happening. I put a timer around the

bridge = SpeechClientBridge(streaming_config, on_transcription_response)

section of code and found that it was taking ~10.9 s to initialize. I believe the Google API has a timeout of 10 s. I tried running this on my Google Cloud instance, which has more oomph than my laptop, and it works perfectly well. Either that is the explanation, or the GCP instance has different versions of libraries/code etc. installed, which I need to check.
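
For anyone who wants to repeat the measurement, a timer of this kind can be sketched as below. This is not the exact code I ran; the timed helper and slow_constructor are hypothetical stand-ins, and in server.py you would wrap the real SpeechClientBridge(...) call instead.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results=None):
    """Print (and optionally record) how long the wrapped block takes."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        if results is not None:
            results[label] = elapsed
        print(f"{label} took {elapsed:.2f} s")


def slow_constructor():
    # Hypothetical stand-in for
    # SpeechClientBridge(streaming_config, on_transcription_response)
    time.sleep(0.05)
    return object()


timings = {}
with timed("bridge init", timings):
    bridge = slow_constructor()
```

If the recorded time is close to or above the timeout, the Speech stream may already have expired before the first audio chunk is forwarded.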

Answer 3

This is related to a conflict between gevent (used by flask_sockets) and grpc (used by Google Cloud Speech), described in this issue: https://github.com/grpc/grpc/issues/4629. The solution is to add the following code:

import grpc.experimental.gevent as grpc_gevent
grpc_gevent.init_gevent()
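
As for where to put it: as far as I know, the call needs to run before any gRPC objects are created, so the top of server.py (before SpeechClientBridge is constructed) is a reasonable place. The sketch below wraps the same two lines in a hypothetical helper, init_grpc_for_gevent, that reports whether the patch was applied; the wrapper and its name are my own assumption, not part of grpc.

```python
def init_grpc_for_gevent() -> bool:
    """Patch gRPC to cooperate with gevent.

    Hypothetical wrapper: returns False if grpc or gevent is not installed.
    """
    try:
        import grpc.experimental.gevent as grpc_gevent

        grpc_gevent.init_gevent()
    except ImportError:
        return False
    return True


# Call once at import time, before SpeechClientBridge (and therefore any
# gRPC client) is created.
patched = init_grpc_for_gevent()
print("gRPC patched for gevent:", patched)
```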
