
Speech to Text transcription issue with IBM Watson

I am struggling to get transcription working in my Android application using the IBM Speech to Text service. Below is the code for recording and transcribing the files.

I took the Watson example code from GitHub (link).

MediaRecorder setup

mediaRecorder = new MediaRecorder();
mediaRecorder.setMaxDuration(MAX_DURATION);
mediaRecorder.setAudioSource(MediaRecorder.AudioSource.MIC);
mediaRecorder.setOutputFormat(MediaRecorder.OutputFormat.MPEG_4);
mediaRecorder.setAudioEncoder(MediaRecorder.AudioEncoder.AMR_NB);

The recorded file has clear audio when I listen to it.

Watson code

 private void startWatson() {

    service = new SpeechToText();
    // getString() resolves a string resource to its text;
    // String.valueOf(R.string.x) only stringifies the integer resource ID.
    String userName = getString(R.string.speech_text_username);
    String password = getString(R.string.speech_text_password);
    service.setUsernameAndPassword(userName, password);
    service.setEndPoint(getString(R.string.speech_text_url));
}

I got the username, password and URL from my Bluemix account.

 private void transcribe() throws IOException {

    final InputStream inputStream = FileUtils.openInputStream(files[spnRecordingList.getSelectedItemPosition()]);
    recognizeOptions = new RecognizeOptions.Builder().contentType(HttpMediaType.AUDIO_OGG).interimResults(true).build();

    new Thread(new Runnable() {
        @Override
        public void run() {
            try {
                service.recognizeUsingWebSocket(inputStream, recognizeOptions, new playback());
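            } catch (Exception e) {
                // Log the failure instead of swallowing it silently.
                Log.e("RecordingActivity", "Watson recognition failed", e);
            }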
        }
    }).start();

} 

I chose AUDIO_OGG because the documentation says: audio/ogg (The service automatically detects the codec of the input audio.)

This could be wrong; if so, please explain why, because the examples I have found have not been much help.

playback class

  private class playback extends BaseRecognizeCallback {

    @Override
    public void onTranscription(SpeechResults speechResults) {
        if (speechResults.getResults() != null && !speechResults.getResults().isEmpty()) {
            String text = speechResults.getResults().get(0).getAlternatives().get(0).getTranscript();
            showText(text);
        }
    }

    @Override
    public void onError(Exception e) {
        showText("on error");
    }

    @Override
    public void onDisconnected() {
        showText("on disconnected");
    }

    // The WebSocket callbacks may arrive on a background thread,
    // so the view is updated on the UI thread.
    private void showText(final String message) {
        runOnUiThread(new Runnable() {
            @Override
            public void run() {
                txtbox.setText(message);
            }
        });
    }
}

The playback class is an inner class of my activity class:

 public class RecordingActivity extends AppCompatActivity implements
         RecordingListFragment.OnFragmentInteractionListener {

     // onCreate() and other lifecycle code
     // startWatson()
     // transcribe()
     // private class playback {}
 }

I took the class and the thread code from the example I found on GitHub for Watson Speech to Text.

"I chose AUDIO_OGG because the documentation says: audio/ogg (The service automatically detects the codec of the input audio.)"

The service can automatically detect whether an Ogg file contains Vorbis or Opus audio, but that detection won't work for MP4 input.

It doesn't look like MediaRecorder supports Ogg output, but you can try switching to WebM by calling mediaRecorder.setOutputFormat(MediaRecorder.OutputFormat.WEBM); and then using HttpMediaType.AUDIO_WEBM in the RecognizeOptions.

Watson also supports HttpMediaType.AUDIO_MPEG, although I don't think that's the same as MediaRecorder's MPEG_4.
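If it is unclear which container a recording actually uses, one option is to inspect the first bytes of the file before choosing a content type. The following is only a minimal sketch in plain Java: AudioContainerSniffer and its methods are hypothetical names, not part of the Watson SDK or Android, and the checks rely on the well-known signatures of Ogg ("OggS"), WebM/Matroska (the EBML header), WAV, FLAC, and the "ftyp" box used by MP4 and 3GP files. It does not look at the codec inside the container.

```java
// Hypothetical helper for illustration; not part of the Watson SDK or Android.
final class AudioContainerSniffer {

    private AudioContainerSniffer() {}

    /**
     * Maps the leading bytes of a file to a content type string, or returns
     * null when the container is not one of the formats this sketch knows
     * (for example an MP4 or 3GP recording).
     */
    static String contentTypeFor(byte[] header) {
        if (startsWith(header, 0, 'O', 'g', 'g', 'S')) {
            return "audio/ogg";
        }
        // EBML magic number, shared by WebM and Matroska files.
        if (startsWith(header, 0, 0x1A, 0x45, 0xDF, 0xA3)) {
            return "audio/webm";
        }
        if (startsWith(header, 0, 'R', 'I', 'F', 'F')
                && startsWith(header, 8, 'W', 'A', 'V', 'E')) {
            return "audio/wav";
        }
        if (startsWith(header, 0, 'f', 'L', 'a', 'C')) {
            return "audio/flac";
        }
        return null;
    }

    /** True when bytes 4..7 spell "ftyp", as in MP4 and 3GP files. */
    static boolean isIsoBaseMedia(byte[] header) {
        return startsWith(header, 4, 'f', 't', 'y', 'p');
    }

    // Compares unsigned byte values against the expected signature.
    private static boolean startsWith(byte[] data, int offset, int... magic) {
        if (data == null || data.length < offset + magic.length) {
            return false;
        }
        for (int i = 0; i < magic.length; i++) {
            if ((data[offset + i] & 0xFF) != magic[i]) {
                return false;
            }
        }
        return true;
    }
}
```

To use it, read the first 12 bytes of the recorded file and pass them to contentTypeFor. A null result together with isIsoBaseMedia returning true would suggest the recorder wrote an MP4 or 3GP file, which would not match an audio/ogg content type.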

There are also several working examples at https://github.com/watson-developer-cloud/java-sdk/tree/develop/examples/src/main/java/com/ibm/watson/developer_cloud/speech_to_text/v1
