How to gracefully end a Google Speech-to-Text streamingRecognize stream and get back the pending text results?

Question

I'd like to be able to end a Google Speech-to-Text stream (created with streamingRecognize) and get back the pending SR (speech recognition) results.

In short, the relevant Node.js code:

// create SR stream
const stream = speechClient.streamingRecognize(request);

// observe data event
const dataPromise = new Promise(resolve => stream.on('data', resolve));

// observe error event
const errorPromise = new Promise((resolve, reject) => stream.on('error', reject));

// observe finish event
const finishPromise = new Promise(resolve => stream.on('finish', resolve));

// send the audio
stream.write(audioChunk);

// for testing purposes only, give the SR stream 2 seconds to absorb the audio
await new Promise(resolve => setTimeout(resolve, 2000));

// end the SR stream gracefully, by observing the completion callback
const endPromise = util.promisify(callback => stream.end(callback))();

// a 5 seconds test timeout
const timeoutPromise = new Promise(resolve => setTimeout(resolve, 5000)); 

// finishPromise wins the race here
await Promise.race([
  dataPromise, errorPromise, finishPromise, endPromise, timeoutPromise]);

// endPromise wins the race here
await Promise.race([
  dataPromise, errorPromise, endPromise, timeoutPromise]);

// timeoutPromise wins the race here
await Promise.race([dataPromise, errorPromise, timeoutPromise]);

// I don't see any data or error events, dataPromise and errorPromise don't get settled

What I observe is that the SR stream ends successfully, but I don't receive any data or error events: neither dataPromise nor errorPromise is ever resolved or rejected.

How can I signal the end of the audio, close the SR stream, and still get the pending SR results?
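A general Node.js streams pattern that avoids missing late results, sketched under assumptions: attach the 'data', 'end' and 'error' listeners before calling end(), and treat the readable side's 'end' (not the writable side's 'finish', which only means the audio has been flushed) as the signal that no more results will arrive. `endAndCollect` and `FakeRecognizer` are hypothetical names; `FakeRecognizer` is a local stand-in that merely mimics a server flushing its final result after the client half-closes, not the real Google client.

```javascript
const { Duplex } = require('stream');

// Hypothetical helper: end the writable side, then keep collecting 'data'
// events until the readable side ends or closes, an error occurs, or the
// timeout fires. Listeners are attached *before* end(), so nothing is missed.
function endAndCollect(stream, timeoutMs = 5000) {
  return new Promise((resolve, reject) => {
    const results = [];
    const onData = data => results.push(data);
    const cleanup = () => {
      clearTimeout(timer);
      stream.off('data', onData);
      stream.off('end', onDone);
      stream.off('close', onDone);
      stream.off('error', onError);
    };
    const onDone = () => { cleanup(); resolve(results); };
    const onError = err => { cleanup(); reject(err); };
    const timer = setTimeout(onDone, timeoutMs); // resolve with what we have
    stream.on('data', onData);
    stream.once('end', onDone);
    stream.once('close', onDone);
    stream.once('error', onError);
    stream.end(); // half-close: no more audio will be written
  });
}

// Hypothetical stand-in for the SR duplex stream: it counts the audio bytes
// it receives and pushes one final result only after the writable side ends.
class FakeRecognizer extends Duplex {
  constructor() {
    super({ readableObjectMode: true });
    this.bytes = 0;
  }
  _write(chunk, _encoding, callback) {
    this.bytes += chunk.length;
    callback();
  }
  _final(callback) {
    callback(); // 'finish' fires here, before any result is available
    setTimeout(() => {
      this.push({ results: [{ isFinal: true, alternatives: [{ transcript: `received ${this.bytes} bytes` }] }] });
      this.push(null); // readable side ends, which triggers 'end'
    }, 10);
  }
  _read() {}
}

// Usage: write some audio, then end and collect whatever is still pending.
const rec = new FakeRecognizer();
rec.write(Buffer.alloc(3200));
endAndCollect(rec, 1000).then(results =>
  console.log(results.map(r => r.results[0].alternatives[0].transcript)));
```

The timeout resolves rather than rejects, on the assumption that a caller ending the stream would rather keep whatever partial results arrived than lose them.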

I need to stick with the streamingRecognize API because the audio I'm streaming is live, and it may stop abruptly.

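To clarify: as long as I keep streaming audio, it works and I do receive real-time SR results. However, when I send the last audio chunk and end the stream as shown above, I don't get the final results I expect.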

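To get the final results, I actually have to keep streaming silence for a few more seconds, which may increase the Speech-to-Text bill. I feel there must be a better way to get them.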
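For the silence-padding workaround, the amount of audio involved is easy to compute. Assuming LINEAR16 (16-bit signed PCM) at 16 kHz mono, as in the sample config later in this thread, one second is 16,000 samples × 2 bytes = 32,000 bytes, and digital silence is simply a zero-filled buffer. The helper names below are hypothetical.

```javascript
// Assumes LINEAR16: 16-bit signed little-endian PCM, 2 bytes per sample.
const BYTES_PER_SAMPLE = 2;

// Number of bytes that `ms` milliseconds of audio occupy.
function bytesForMs(ms, sampleRateHertz = 16000, channels = 1) {
  return Math.round((sampleRateHertz * ms) / 1000) * BYTES_PER_SAMPLE * channels;
}

// Duration in milliseconds of a LINEAR16 buffer of `bytes` bytes.
function msForBytes(bytes, sampleRateHertz = 16000, channels = 1) {
  return (bytes * 1000) / (BYTES_PER_SAMPLE * channels * sampleRateHertz);
}

// Digital silence: in signed 16-bit PCM, all-zero samples are silent.
function silence(ms, sampleRateHertz = 16000, channels = 1) {
  return Buffer.alloc(bytesForMs(ms, sampleRateHertz, channels)); // zero-filled
}
```

By the same arithmetic, each 4096-byte chunk used in the sample further down holds 2048 samples, i.e. 128 ms of audio, and every second of padding is one more second of audio sent to the service.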

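Update: it looks like the only appropriate time to end a streamingRecognize stream is on a data event where StreamingRecognitionResult.is_final is true. Likewise, it seems we're expected to keep streaming audio until a data event fires in order to get any result at all, final or interim.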
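The "end only on is_final" approach from this update can be sketched as a promise that settles on the first final result (the Node client exposes the field as camelCase `isFinal`, as the sample further down uses) and rejects on an error or timeout. `waitForFinal` is a hypothetical helper, and the plain `EventEmitter` stands in for the real SR stream.

```javascript
const { EventEmitter } = require('events');

// Hypothetical helper: resolve with the first response whose first result
// has isFinal === true; reject on 'error' or when the timeout expires.
// Listeners and the timer are removed in every case.
function waitForFinal(stream, timeoutMs = 10000) {
  return new Promise((resolve, reject) => {
    const onData = data => {
      const result = data.results && data.results[0];
      if (result && result.isFinal) finish(null, data); // ignore interim results
    };
    const onError = err => finish(err);
    const timer = setTimeout(
      () => finish(new Error('no final result before timeout')), timeoutMs);
    function finish(err, data) {
      clearTimeout(timer);
      stream.off('data', onData);
      stream.off('error', onError);
      if (err) reject(err);
      else resolve(data);
    }
    stream.on('data', onData);
    stream.on('error', onError);
  });
}

// Usage with a stand-in emitter: the interim result is skipped.
const fake = new EventEmitter();
waitForFinal(fake, 1000).then(data =>
  console.log(data.results[0].alternatives[0].transcript));
fake.emit('data', { results: [{ isFinal: false, alternatives: [{ transcript: 'this is' }] }] });
fake.emit('data', { results: [{ isFinal: true, alternatives: [{ transcript: 'this is a test' }] }] });
```

With a real stream, the caller would keep writing audio (or silence) until this promise resolves and only then call `stream.end()`.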

This looks like a bug to me, so I've filed an issue.

Update: this now seems to have been confirmed as a bug. Until it's fixed, I'm looking for a potential workaround.

Update: for future reference, here is a list of current and previously tracked issues involving streamingRecognize.

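I'd have expected this to be a common problem for anyone using streamingRecognize, and I'm surprised it hasn't been reported before. I've also filed it as a bug at issuetracker.google.com.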

Answer 1

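Since this is a bug, I don't know whether this will work for you, but I've used this.recognizeStream.end(); several times in different situations and it worked. My code is a bit different, though...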

This thread might help you: https://groups.google.com/g/cloud-speech-discuss/c/lPaTGmEcZQk/m/Kl4fbHK2BQAJ

Answer 2

My bad: unsurprisingly, this turned out to be an obscure race condition in my code.

I've put together a self-contained sample that works as expected (gist). It helped me track down the issue. Hopefully it can help others and my future self:

// A simple streamingRecognize workflow,
// tested with Node v15.0.1, by @noseratio

import fs from 'fs';
import path from "path";
import url from 'url'; 
import util from "util";
import timers from 'timers/promises';
import speech from '@google-cloud/speech';

export {}

// needs 16-bit, 16 kHz raw PCM audio
const filename = path.join(path.dirname(url.fileURLToPath(import.meta.url)), "sample.raw");
const encoding = 'LINEAR16';
const sampleRateHertz = 16000;
const languageCode = 'en-US';

const request = {
  config: {
    encoding: encoding,
    sampleRateHertz: sampleRateHertz,
    languageCode: languageCode,
  },
  interimResults: false // If you want interim results, set this to true
};

// init SpeechClient
const client = new speech.v1p1beta1.SpeechClient();
await client.initialize();

// Stream the audio to the Google Cloud Speech API
const stream = client.streamingRecognize(request);

// log all data
stream.on('data', data => {
  const result = data.results[0];
  if (!result) return; // defensive: skip responses that carry no results
  console.log(`SR results, final: ${result.isFinal}, text: ${result.alternatives[0].transcript}`);
});

// log all errors
stream.on('error', error => {
  console.warn(`SR error: ${error.message}`);
});

// observe data event
const dataPromise = new Promise(resolve => stream.once('data', resolve));

// observe error event
const errorPromise = new Promise((resolve, reject) => stream.once('error', reject));

// observe finish event
const finishPromise = new Promise(resolve => stream.once('finish', resolve));

// observe close event
const closePromise = new Promise(resolve => stream.once('close', resolve));

// we could just pipe it: 
// fs.createReadStream(filename).pipe(stream);
// but we want to simulate the web socket data

// read RAW audio as Buffer
const data = await fs.promises.readFile(filename, null);

// simulate multiple audio chunks
console.log("Writting...");
const chunkSize = 4096;
for (let i = 0; i < data.length; i += chunkSize) {
  stream.write(data.slice(i, i + chunkSize));
  await timers.setTimeout(50);
}
console.log("Done writing.");

console.log("Before ending...");
await util.promisify(c => stream.end(c))();
console.log("After ending.");

// race for events
await Promise.race([
  errorPromise.catch(() => console.log("error")), 
  dataPromise.then(() => console.log("data")),
  closePromise.then(() => console.log("close")),
  finishPromise.then(() => console.log("finish"))
]);

console.log("Destroying...");
stream.destroy();
console.log("Final timeout...");
await timers.setTimeout(1000);
console.log("Exiting.");

Output:

Writting...
Done writing.
Before ending...
SR results, final: true, text:  this is a test I'm testing voice recognition This Is the End
After ending.
data
finish
Destroying...
Final timeout...
close
Exiting.

To test it, you need a 16-bit/16 kHz raw PCM audio file. An arbitrary WAV file won't work as-is, because it starts with a header containing metadata.
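Because the sample expects headerless PCM, here is a minimal sketch of extracting the raw samples from a standard RIFF/WAVE file: walk its chunks, read the 'fmt ' fields, and return the bytes of the 'data' chunk. `wavToRawPcm` is a hypothetical helper; it assumes a plain PCM WAV (format tag 1) and does no resampling or channel conversion, so the file must already be 16-bit, 16 kHz and (assumed) mono to match the sample's LINEAR16 config.

```javascript
// Hypothetical helper: return the 'fmt ' fields and the PCM payload of a
// RIFF/WAVE buffer by walking its chunks until the 'data' chunk is found.
function wavToRawPcm(buf) {
  if (buf.toString('ascii', 0, 4) !== 'RIFF' || buf.toString('ascii', 8, 12) !== 'WAVE') {
    throw new Error('not a RIFF/WAVE file');
  }
  let offset = 12; // first chunk follows the 12-byte RIFF header
  let fmt = null;
  while (offset + 8 <= buf.length) {
    const id = buf.toString('ascii', offset, offset + 4);
    const size = buf.readUInt32LE(offset + 4);
    const body = offset + 8;
    if (id === 'fmt ') {
      fmt = {
        audioFormat: buf.readUInt16LE(body), // 1 = uncompressed PCM
        channels: buf.readUInt16LE(body + 2),
        sampleRate: buf.readUInt32LE(body + 4),
        bitsPerSample: buf.readUInt16LE(body + 14),
      };
    } else if (id === 'data') {
      // clamp in case the header overstates the data length
      return { fmt, pcm: buf.subarray(body, Math.min(body + size, buf.length)) };
    }
    offset = body + size + (size % 2); // chunks are padded to an even length
  }
  throw new Error('no data chunk found');
}
```

For example, after `const { fmt, pcm } = wavToRawPcm(fs.readFileSync('sample.wav'))` (file name hypothetical), one could check that `fmt.sampleRate === 16000`, `fmt.bitsPerSample === 16` and `fmt.channels === 1` before writing `pcm` to the stream in place of the raw file.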

Answer 3

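Regarding "I'm looking for a potential workaround": have you considered using SpeechClient as a base class? I don't have the credentials to test this, but you could extend SpeechClient with your own class and then call its internal close() method when needed. The close() method closes the SpeechClient and resolves the outstanding Promise.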

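Alternatively, you could proxy SpeechClient() and intercept or respond to calls as needed. But since your intent is to shut it down, the option below may serve as your workaround.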

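const speech = require('@google-cloud/speech');

class ClientProxy extends speech.SpeechClient {
  constructor(options) {
    super(options);
  }
  // close() returns a Promise, so await it to surface any failure
  async myCustomFunction() {
    await this.close();
  }
}

const clientProxy = new ClientProxy();
// a try/catch around an un-awaited call would miss an async rejection,
// so handle the returned Promise instead
clientProxy.myCustomFunction().catch(err => {
  console.log("myCustomFunction generated error: ", err);
});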
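The proxy alternative mentioned above can be sketched with JavaScript's built-in `Proxy`: intercept `streamingRecognize()` so every stream it returns is remembered and can be ended before the client is closed. `trackStreams` and `FakeClient` are hypothetical; `FakeClient` replaces SpeechClient only so the sketch runs without credentials.

```javascript
// Hypothetical stand-in for SpeechClient so the sketch runs without credentials.
class FakeClient {
  streamingRecognize(request) {
    return { request, ended: false, end() { this.ended = true; } };
  }
  close() {
    return Promise.resolve();
  }
}

// Wrap a client in a Proxy that records every stream returned by
// streamingRecognize(), so all of them can be ended before close().
function trackStreams(client) {
  const streams = new Set();
  const proxy = new Proxy(client, {
    get(target, prop, receiver) {
      const value = Reflect.get(target, prop, receiver);
      if (prop === 'streamingRecognize') {
        return (...args) => {
          const stream = value.apply(target, args);
          streams.add(stream);
          return stream;
        };
      }
      // bind other methods so `this` is the real client, not the proxy
      return typeof value === 'function' ? value.bind(target) : value;
    },
  });
  const endAll = () => streams.forEach(stream => stream.end());
  return { proxy, streams, endAll };
}

// Usage: create a stream through the proxy, end it, then close the client.
const { proxy, streams, endAll } = trackStreams(new FakeClient());
proxy.streamingRecognize({ config: {} });
endAll();
proxy.close().then(() => console.log(`ended ${streams.size} stream(s), client closed`));
```

Whether ending the streams this way yields their pending results depends on the service, which is exactly what this thread is about; the proxy only makes the streams reachable.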


Question description

3 solutions

Solution 1
Score 2, 2020-11-02 09:38:36

Solution 2
Score 2, accepted, 2020-11-03 02:31:46

Solution 3
Score 1, 2020-11-02 01:55:26
