How can I send audio to Nexmo Voice through websocket

I am trying to implement Nexmo's Voice API, with websockets, in a .NET Core 2 web API.

This API needs to:
  • receive audio from a phone call, through Nexmo
  • use the Microsoft Cognitive speech-to-text API
  • send the text to a bot
  • use Microsoft Cognitive text-to-speech on the bot's reply
  • send the speech back to Nexmo, through their Voice API websocket

For now, I'm bypassing the bot steps, as I am first trying to connect to the websocket. When I try an echo method (sending the received audio back to the websocket), it works without any issue. But when I try to send the speech from Microsoft text-to-speech, the phone call ends.

I can't find any documentation that implements anything other than a plain echo.

The TextToSpeech and SpeechToText methods work as expected when used outside of the websocket.

Here's the websocket with the speech-to-text:

public static async Task Echo(HttpContext context, WebSocket webSocket)
    {
        var buffer = new byte[1024 * 4];
        WebSocketReceiveResult result = await webSocket.ReceiveAsync(new ArraySegment<byte>(buffer), CancellationToken.None);
        while (!result.CloseStatus.HasValue)
        {
            var ttsAudio = await TextToSpeech.TransformTextToSpeechAsync("Hello, this is a test", "en-US");
            await webSocket.SendAsync(new ArraySegment<byte>(ttsAudio, 0, ttsAudio.Length), WebSocketMessageType.Binary, true, CancellationToken.None);

            result = await webSocket.ReceiveAsync(new ArraySegment<byte>(buffer), CancellationToken.None);
        }
        await webSocket.CloseAsync(result.CloseStatus.Value, result.CloseStatusDescription, CancellationToken.None);
    }

And here's the websocket with the text-to-speech:

public static async Task Echo(HttpContext context, WebSocket webSocket)
{
    var buffer = new byte[1024 * 4];
    WebSocketReceiveResult result = await webSocket.ReceiveAsync(new ArraySegment<byte>(buffer), CancellationToken.None);
    while (!result.CloseStatus.HasValue)
    {
        var ttsAudio = await TextToSpeech.TransformTextToSpeechAsync("Hello, this is a test", "en-US");
        await webSocket.SendAsync(new ArraySegment<byte>(ttsAudio, 0, ttsAudio.Length), WebSocketMessageType.Binary, true, CancellationToken.None);

        result = await webSocket.ReceiveAsync(new ArraySegment<byte>(buffer), CancellationToken.None);
    }
    await webSocket.CloseAsync(result.CloseStatus.Value, result.CloseStatusDescription, CancellationToken.None);
}

Update March 1st 2019

In reply to Sam Machin's comment, I tried splitting the array into chunks of 640 bytes each (I'm using a 16 kHz sample rate), but Nexmo still hangs up the call, and I still don't hear anything.

public static async Task NexmoTextToSpeech(HttpContext context, WebSocket webSocket)
{
    var ttsAudio = await TextToSpeech.TransformTextToSpeechAsync("This is a test", "en-US");
    var buffer = new byte[1024 * 4];
    WebSocketReceiveResult result = await webSocket.ReceiveAsync(new ArraySegment<byte>(buffer), CancellationToken.None);
    while (!result.CloseStatus.HasValue)
    {
        await SendSpeech(context, webSocket, ttsAudio);
        result = await webSocket.ReceiveAsync(new ArraySegment<byte>(buffer), CancellationToken.None);
    }
    await webSocket.CloseAsync(WebSocketCloseStatus.NormalClosure, "Closing Socket", CancellationToken.None);
}

private static async Task SendSpeech(HttpContext context, WebSocket webSocket, byte[] ttsAudio)
{
    const int chunkSize = 640;
    var chunkCount = 1;
    var offset = 0;
    var lastFullChunck = ttsAudio.Length < (offset + chunkSize);
    try
    {
        while (!lastFullChunck)
        {
            await webSocket.SendAsync(new ArraySegment<byte>(ttsAudio, offset, chunkSize), WebSocketMessageType.Binary, false, CancellationToken.None);
            offset = chunkSize * chunkCount;
            lastFullChunck = ttsAudio.Length < (offset + chunkSize);
            chunkCount++;
        }
        var lastMessageSize = ttsAudio.Length - offset;
        await webSocket.SendAsync(new ArraySegment<byte>(ttsAudio, offset, lastMessageSize), WebSocketMessageType.Binary, true, CancellationToken.None);
    }
    catch (Exception ex)
    {
    }
}

Here's the exception that sometimes appears in the logs:

System.Net.WebSockets.WebSocketException (0x80004005): The remote party closed the WebSocket connection without completing the close handshake.

It looks like you're writing the whole audio clip to the websocket. The Nexmo interface requires the audio to be sent in 20 ms frames, one per message. This means that you need to break your clip up into 320- or 640-byte chunks (depending on whether you're using 8 kHz or 16 kHz) and write each one to the socket. If you try to write too large a file to the socket, it will close, as you are seeing.
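As a rough illustration of that chunking, the sketch below computes the size of a 20 ms frame and splits an audio buffer into equal frames. It assumes the audio is raw 16-bit mono linear PCM, which is where the two sizes come from: 8000 or 16000 samples per second × 0.02 s × 2 bytes per sample. `AudioFramer`, `FrameSizeBytes`, and `SplitIntoFrames` are hypothetical names, and padding a short final frame with zeros (silence) is an assumption, not something stated above.

```csharp
using System;
using System.Collections.Generic;

public static class AudioFramer
{
    // Bytes in one 20 ms frame: samples per frame times bytes per sample.
    // Assumption: 16-bit (2-byte) mono linear PCM.
    public static int FrameSizeBytes(int sampleRateHz)
    {
        const int frameMs = 20;
        const int bytesPerSample = 2;
        return sampleRateHz * frameMs / 1000 * bytesPerSample;
    }

    // Splits a PCM buffer into frames of exactly frameSize bytes.
    public static List<byte[]> SplitIntoFrames(byte[] pcm, int frameSize)
    {
        if (pcm == null) throw new ArgumentNullException(nameof(pcm));
        if (frameSize <= 0) throw new ArgumentOutOfRangeException(nameof(frameSize));

        var frames = new List<byte[]>();
        for (var offset = 0; offset < pcm.Length; offset += frameSize)
        {
            // New arrays are zero-filled, so a short tail ends up padded with silence.
            var frame = new byte[frameSize];
            var count = Math.Min(frameSize, pcm.Length - offset);
            Buffer.BlockCopy(pcm, offset, frame, 0, count);
            frames.Add(frame);
        }
        return frames;
    }
}
```

With these helpers, `AudioFramer.FrameSizeBytes(16000)` gives 640, and a 1500-byte buffer split at that size yields three 640-byte frames, the last one padded.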

See https://developer.nexmo.com/voice/voice-api/guides/websockets#writing-audio-to-the-websocket for the details.
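One detail worth checking in the chunked attempt from the question: in the .NET `WebSocket` API, passing `endOfMessage: false` to `SendAsync` marks the data as a fragment of a larger message, so the 640-byte chunks there still form a single message. A minimal sketch of sending one frame per message follows. `FramedSender` and `SendFramesAsync` are hypothetical names; the sketch assumes the buffer is already raw PCM at the negotiated sample rate, pads the final frame with zeros, and takes the actual send as a delegate so it can be tried without a live socket. The optional pacing delay is a conservative assumption, not a documented requirement.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class FramedSender
{
    // Sends pcm as consecutive frames of frameSize bytes, one call to sendFrame per frame.
    // Returns the number of frames sent.
    public static async Task<int> SendFramesAsync(
        byte[] pcm,
        int frameSize,
        Func<ArraySegment<byte>, CancellationToken, Task> sendFrame,
        TimeSpan pacing,
        CancellationToken cancellationToken = default(CancellationToken))
    {
        if (pcm == null) throw new ArgumentNullException(nameof(pcm));
        if (sendFrame == null) throw new ArgumentNullException(nameof(sendFrame));
        if (frameSize <= 0) throw new ArgumentOutOfRangeException(nameof(frameSize));

        var sent = 0;
        for (var offset = 0; offset < pcm.Length; offset += frameSize)
        {
            cancellationToken.ThrowIfCancellationRequested();

            var remaining = pcm.Length - offset;
            ArraySegment<byte> frame;
            if (remaining >= frameSize)
            {
                // Full frame: reference the slice in place, no copy needed.
                frame = new ArraySegment<byte>(pcm, offset, frameSize);
            }
            else
            {
                // Short tail: copy into a zero-filled buffer (assumed padding with silence).
                var padded = new byte[frameSize];
                Buffer.BlockCopy(pcm, offset, padded, 0, remaining);
                frame = new ArraySegment<byte>(padded);
            }

            await sendFrame(frame, cancellationToken);
            sent++;

            // Optional pacing between frames; pass TimeSpan.Zero to disable.
            if (pacing > TimeSpan.Zero)
            {
                await Task.Delay(pacing, cancellationToken);
            }
        }
        return sent;
    }
}
```

Inside the handler from the question, the delegate would wrap the socket, for example `(frame, ct) => webSocket.SendAsync(frame, WebSocketMessageType.Binary, true, ct)`, with `frameSize` set to 640 for 16 kHz audio. Note the `true` for `endOfMessage` on every frame.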
