
Unable to recognize text content from a GCS URI using @google-cloud/speech

Recognition works when I load the file from a local buffer, but the response is null when I load the same file via a GCS URI.

    // Imports the Google Cloud client library
    const speech = require('@google-cloud/speech');
    const fs = require('fs');

    // Creates a client
    const client = new speech.SpeechClient();

    // The name of the local audio file to transcribe
    const fileName = './audio.wav';

    // Reads a local audio file and converts it to base64
    const file = fs.readFileSync(fileName);
    const audioBytes = file.toString('base64');

    // The audio can be passed either as a GCS URI or as inline base64 content
    const audio = {
        uri: 'gs://bucket-name/path-to-audio/audio.wav',
        // content: audioBytes
    };
    const config = {
        audioChannelCount: 1,
        encoding: 'LINEAR16',
        sampleRateHertz: 16000,
        languageCode: 'ta-IN',
    };
    const request = {
        audio: audio,
        config: config,
    };

    // Detects speech in the audio file
    const [operation] = await client.longRunningRecognize(request);
    console.info('OPERATION STATUS', operation.name);

When I try to load the audio using the GCS URI, the response is null, whereas sending the same file as an inline buffer returns the proper response.
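The snippet above only logs the operation name; the OPERATION COMPLETE STATUS lines below presumably come from awaiting the operation's result. A minimal sketch of how that is typically done with this client (an assumption about the logging code, not the exact code from the question):

    // Sketch (assumed logging code): operation.promise() resolves once the
    // long-running recognition finishes, with the decoded response first.
    const [response] = await operation.promise();
    const firstResult = response.results && response.results[0];
    console.info(
        'OPERATION COMPLETE STATUS',
        operation.name,
        firstResult ? firstResult.alternatives[0].transcript : null,
        firstResult
    );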

# from GCS
TRANSLATION STATUS true
OPERATION COMPLETE STATUS  3489419937829075659 null undefined

# from local file
TRANSLATION STATUS true
OPERATION COMPLETE STATUS  390578141483807025 வணக்கம் வணக்கம் வணக்கம் SpeechRecognitionResult {
  alternatives: [
    SpeechRecognitionAlternative {
      words: [],
      transcript: 'வணக்கம் வணக்கம் வணக்கம்',
      confidence: 0.8997038006782532
    }
  ]
}

When I console.log the status of operation 3489419937829075659 (the GCS run), I get:

STATUS DATA Operation {
  _events: [Object: null prototype] {
    newListener: [Function],
    removeListener: [Function]
  },
  _eventsCount: 2,
  _maxListeners: undefined,
  completeListeners: 0,
  hasActiveListeners: false,
  latestResponse: {
    name: '3489419937829075659',
    metadata: {
      type_url: 'type.googleapis.com/google.cloud.speech.v1.LongRunningRecognizeMetadata',
      value: <Buffer 08 64 12 0c 08 d7 9e b7 fa 05 10 90 d6 a3 a5 02 1a 0c 08 dc 9e b7 fa 05 10 e0 b7 f0 c8 02 22 40 67 73 3a 2f 2f 73 74 61 67 69 6e 67 2e 63 65 72 74 69 ... 46 more bytes>
    },
    done: true,
    response: {
      type_url: 'type.googleapis.com/google.cloud.speech.v1.LongRunningRecognizeResponse',
      value: <Buffer >
    },
    result: 'response'
  },
  name: '3489419937829075659',
  done: true,
  error: undefined,
  longrunningDescriptor: LongRunningDescriptor {
    operationsClient: OperationsClient {
      auth: [GoogleAuth],
      innerApiCalls: [Object],
      descriptor: [Object]
    },
    responseDecoder: [Function: bound decode_setup],
    metadataDecoder: [Function: bound decode_setup]
  },
  result: LongRunningRecognizeResponse { results: [] },
  metadata: LongRunningRecognizeMetadata {
    progressPercent: 100,
    startTime: Timestamp { seconds: [Long], nanos: 615050000 },
    lastUpdateTime: Timestamp { seconds: [Long], nanos: 689708000 }
  },
  backoffSettings: {
    initialRetryDelayMillis: 100,
    retryDelayMultiplier: 1.3,
    maxRetryDelayMillis: 60000,
    initialRpcTimeoutMillis: null,
    rpcTimeoutMultiplier: null,
    maxRpcTimeoutMillis: null,
    totalTimeoutMillis: null
  },
  response: {
    type_url: 'type.googleapis.com/google.cloud.speech.v1.LongRunningRecognizeResponse',
    value: <Buffer >
  },
  _callOptions: undefined,
  [Symbol(kCapture)]: false
}
STATUS DATA undefined

When I console.log the whole operation object for the local-file run (390578141483807025), I get:

STATUS DATA Operation {
  _events: [Object: null prototype] {
    newListener: [Function],
    removeListener: [Function]
  },
  _eventsCount: 2,
  _maxListeners: undefined,
  completeListeners: 0,
  hasActiveListeners: false,
  latestResponse: {
    name: '390578141483807025',
    metadata: {
      type_url: 'type.googleapis.com/google.cloud.speech.v1.LongRunningRecognizeMetadata',
      value: <Buffer 08 64 12 0c 08 f4 a3 b7 fa 05 10 88 e0 d4 ab 02 1a 0c 08 f7 a3 b7 fa 05 10 88 b0 ef d4 01>
    },
    done: true,
    response: {
      type_url: 'type.googleapis.com/google.cloud.speech.v1.LongRunningRecognizeResponse',
      value: <Buffer 12 4a 0a 48 0a 41 e0 ae b5 e0 ae a3 e0 ae 95 e0 af 8d e0 ae 95 e0 ae ae e0 af 8d 20 e0 ae b5 e0 ae a3 e0 ae 95 e0 af 8d e0 ae 95 e0 ae ae e0 af 8d 20 ... 26 more bytes>
    },
    result: 'response'
  },
  name: '390578141483807025',
  done: true,
  error: undefined,
  longrunningDescriptor: LongRunningDescriptor {
    operationsClient: OperationsClient {
      auth: [GoogleAuth],
      innerApiCalls: [Object],
      descriptor: [Object]
    },
    responseDecoder: [Function: bound decode_setup],
    metadataDecoder: [Function: bound decode_setup]
  },
  result: LongRunningRecognizeResponse {
    results: [ [SpeechRecognitionResult] ]
  },
  metadata: LongRunningRecognizeMetadata {
    progressPercent: 100,
    startTime: Timestamp { seconds: [Long], nanos: 628437000 },
    lastUpdateTime: Timestamp { seconds: [Long], nanos: 446421000 }
  },
  backoffSettings: {
    initialRetryDelayMillis: 100,
    retryDelayMultiplier: 1.3,
    maxRetryDelayMillis: 60000,
    initialRpcTimeoutMillis: null,
    rpcTimeoutMultiplier: null,
    maxRpcTimeoutMillis: null,
    totalTimeoutMillis: null
  },
  response: {
    type_url: 'type.googleapis.com/google.cloud.speech.v1.LongRunningRecognizeResponse',
    value: <Buffer 12 4a 0a 48 0a 41 e0 ae b5 e0 ae a3 e0 ae 95 e0 af 8d e0 ae 95 e0 ae ae e0 af 8d 20 e0 ae b5 e0 ae a3 e0 ae 95 e0 af 8d e0 ae 95 e0 ae ae e0 af 8d 20 ... 26 more bytes>
  },
  _callOptions: undefined,
  [Symbol(kCapture)]: false
}
STATUS DATA SpeechRecognitionResult {
  alternatives: [
    SpeechRecognitionAlternative {
      words: [],
      transcript: 'வணக்கம் வணக்கம் வணக்கம்',
      confidence: 0.8997038006782532
    }
  ]
}
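The operation dumps above look like the output of re-checking the stored operation name. A minimal sketch of one way to do that, assuming a recent @google-cloud/speech release that exposes checkLongRunningRecognizeProgress on the client:

    // Sketch: re-check a stored long-running operation by name.
    // checkLongRunningRecognizeProgress is exposed by recent @google-cloud/speech
    // releases; the name below is the one returned for the GCS run.
    const operationName = '3489419937829075659';
    const decodedOperation = await client.checkLongRunningRecognizeProgress(operationName);
    console.log('STATUS DATA', decodedOperation);
    // decodedOperation.result is the decoded LongRunningRecognizeResponse; for the
    // GCS run its results array is empty, so the line below logs undefined.
    console.log('STATUS DATA', decodedOperation.result.results[0]);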

This code works perfectly for both a local file and a GCS URI:

    async function main() {
      // Imports the Google Cloud client library
      const speech = require('@google-cloud/speech');
      const fs = require('fs');
    
      // Creates a client
      const client = new speech.SpeechClient();
    
      // The name of the audio file to transcribe
      const fileName = './audio.wav';
    
      // Reads a local audio file and converts it to base64
      const file = fs.readFileSync(fileName);
      const audioBytes = file.toString('base64');
    
      // The audio can be provided either as a GCS URI or as inline base64 content
      const audio = {
        uri: 'gs://BUCKET_NAME/audio.wav',
        // content: audioBytes,
      };
      // The audio file's encoding, sample rate in hertz, and BCP-47 language code;
      // these values may differ from one audio file to another
      const config = {
        audioChannelCount: 1,
        encoding: 'LINEAR16',
        sampleRateHertz: 8000,
        languageCode: 'en-US',
      };
      const request = {
        audio: audio,
        config: config,
      };
    
      // Detects speech in the audio file
      const [operation] = await client.longRunningRecognize(request);
      const [response] = await operation.promise();
      const transcription = response.results
        .map(result => result.alternatives[0].transcript)
        .join('\n');
      console.log(`Transcription: ${transcription}`);
    }
    
    main().catch(console.error);
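For the original file in the question, the same pattern applies with the question's own values. This is only a sketch, and it still assumes the object at the gs:// path really is 16 kHz, mono, LINEAR16 Tamil audio that the client's credentials can read:

    // Sketch: the question's original audio/config, plugged into the working flow above.
    const audio = {
      // The gs:// object must be readable by the credentials the SpeechClient uses.
      uri: 'gs://bucket-name/path-to-audio/audio.wav',
    };
    const config = {
      audioChannelCount: 1,
      encoding: 'LINEAR16',
      sampleRateHertz: 16000, // must match the WAV file's actual sample rate
      languageCode: 'ta-IN',
    };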
