I am struggling to get transcription working in my Android application using the IBM Speech to Text service. Below is the code for recording the files and transcribing them.
I adapted the code from a Watson example I found on GitHub.
MediaRecorder setup:
mediaRecorder = new MediaRecorder();
mediaRecorder.setAudioSource(MediaRecorder.AudioSource.MIC);
mediaRecorder.setOutputFormat(MediaRecorder.OutputFormat.MPEG_4);
mediaRecorder.setAudioEncoder(MediaRecorder.AudioEncoder.AMR_NB);
mediaRecorder.setMaxDuration(MAX_DURATION); // setMaxDuration() must be called after setOutputFormat()
The file does have clear audio when I listen to it.
Watson code
private void startWatson() {
    service = new SpeechToText();
    // getString() resolves the resource's value; String.valueOf(R.string.x)
    // would only stringify the generated int resource ID.
    String userName = getString(R.string.speech_text_username);
    String password = getString(R.string.speech_text_password);
    service.setUsernameAndPassword(userName, password);
    service.setEndPoint(getString(R.string.speech_text_url));
}
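Worth noting about the credentials lookup: `String.valueOf(R.string.x)` stringifies the generated resource ID (a plain `int`) rather than reading the string resource; `getString()` is what returns the actual value. A plain-Java illustration of why (the ID value below is a made-up example, not a real resource ID):

```java
public class ResourceIdDemo {
    // Hypothetical stand-in for a generated Android resource ID:
    // the fields on R.string are plain ints, not the string values themselves.
    static final int SPEECH_TEXT_USERNAME = 0x7f0c0042;

    public static void main(String[] args) {
        // String.valueOf on an int just formats the number, so passing
        // String.valueOf(R.string.speech_text_username) to Watson would send
        // the numeric ID as the username, not the credential stored in strings.xml.
        System.out.println(String.valueOf(SPEECH_TEXT_USERNAME)); // prints 2131492930
    }
}
```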
I got the username, password, and URL from my Bluemix account.
private void transcribe() throws IOException {
    final InputStream inputStream = FileUtils.openInputStream(files[spnRecordingList.getSelectedItemPosition()]);
    recognizeOptions = new RecognizeOptions.Builder()
            .contentType(HttpMediaType.AUDIO_OGG)
            .interimResults(true)
            .build();
    new Thread(new Runnable() {
        @Override
        public void run() {
            try {
                service.recognizeUsingWebSocket(inputStream, recognizeOptions, new playback());
            } catch (Exception e) {
                Log.e("RecordingActivity", "Recognition failed", e); // don't swallow the exception silently
            }
        }
    }).start();
}
I chose AUDIO_OGG because the documentation says: audio/ogg (The service automatically detects the codec of the input audio.)
This could be wrong, so if it is, please explain why, because the examples I have found have not been much help.
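As an aside, the container format can be sanity-checked from the file's leading magic bytes before a content type is chosen. This helper is my own illustration (not part of the Watson SDK or the question's code), using the well-known signatures `OggS` for Ogg, the EBML header `0x1A45DFA3` for WebM/Matroska, and `ftyp` at offset 4 for MP4:

```java
public class ContainerSniffer {
    // Hypothetical helper: identify a container by its leading magic bytes.
    // "OggS" -> Ogg; 0x1A 0x45 0xDF 0xA3 (EBML) -> WebM/Matroska;
    // "ftyp" at byte offset 4 -> MP4.
    public static String sniff(byte[] header) {
        if (header.length >= 4 && header[0] == 'O' && header[1] == 'g'
                && header[2] == 'g' && header[3] == 'S') {
            return "audio/ogg";
        }
        if (header.length >= 4 && (header[0] & 0xFF) == 0x1A && (header[1] & 0xFF) == 0x45
                && (header[2] & 0xFF) == 0xDF && (header[3] & 0xFF) == 0xA3) {
            return "audio/webm";
        }
        if (header.length >= 8 && header[4] == 'f' && header[5] == 't'
                && header[6] == 'y' && header[7] == 'p') {
            return "mp4"; // no matching Watson audio type; re-encode instead
        }
        return "unknown";
    }
}
```

Reading the first 8 bytes of the recorded file and passing them to `sniff` would show whether the recording is actually an Ogg container or (as the MPEG_4 output format suggests) an MP4 that the audio/ogg content type cannot describe.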
The playback class:
private class playback extends BaseRecognizeCallback {
    // WebSocket callbacks arrive on a background thread;
    // views may only be touched on the UI thread.
    private void showText(final String text) {
        runOnUiThread(new Runnable() {
            @Override
            public void run() {
                txtbox.setText(text);
            }
        });
    }

    @Override
    public void onTranscription(SpeechResults speechResults) {
        if (speechResults.getResults() != null && !speechResults.getResults().isEmpty()) {
            showText(speechResults.getResults().get(0).getAlternatives().get(0).getTranscript());
        }
    }

    @Override
    public void onError(Exception e) {
        showText("on error: " + e.getMessage());
    }

    @Override
    public void onDisconnected() {
        showText("on disconnected");
    }
}
The playback class is an inner class of my activity:
public class RecordingActivity extends AppCompatActivity implements
        RecordingListFragment.OnFragmentInteractionListener {
    // onCreate and other lifecycle code
    // startWatson()
    // transcribe()
    // playback class
}
I took the class and the thread code from the example I found on GitHub for Watson Speech to Text.
"I chose Audio_OGG because documentation says: audio/ogg (The service automatically detects the codec of the input audio.)"
The service can automatically detect whether an Ogg file contains Vorbis or Opus audio, but that won't work for MP4 input.
It doesn't look like MediaRecorder supports Ogg output, but you can try switching to WebM by doing mediaRecorder.setOutputFormat(MediaRecorder.OutputFormat.WEBM); and then using HttpMediaType.AUDIO_WEBM in the RecognizeOptions.
Watson also supports HttpMediaType.AUDIO_MPEG, although I don't think that's the same as the MediaRecorder's MPEG_4.
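Putting that suggestion together, a minimal sketch of both sides of the change (assuming Android API level 21+, where the WEBM output format and the VORBIS encoder were added, and reusing the fields from the question):

```java
// Recording side: a WebM container instead of MPEG_4.
mediaRecorder = new MediaRecorder();
mediaRecorder.setAudioSource(MediaRecorder.AudioSource.MIC);
mediaRecorder.setOutputFormat(MediaRecorder.OutputFormat.WEBM);   // instead of MPEG_4
mediaRecorder.setAudioEncoder(MediaRecorder.AudioEncoder.VORBIS); // a codec WebM supports
mediaRecorder.setMaxDuration(MAX_DURATION);

// Transcription side: a content type that matches the container.
recognizeOptions = new RecognizeOptions.Builder()
        .contentType(HttpMediaType.AUDIO_WEBM) // instead of AUDIO_OGG
        .interimResults(true)
        .build();
service.recognizeUsingWebSocket(inputStream, recognizeOptions, new playback());
```

The key point is that the content type in RecognizeOptions describes the container you actually recorded; automatic codec detection only applies within containers the service recognizes for that type.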
There are also several working examples at https://github.com/watson-developer-cloud/java-sdk/tree/develop/examples/src/main/java/com/ibm/watson/developer_cloud/speech_to_text/v1