Android speech recognition and audio recording at the same time

My application records audio using the MediaRecorder class in an AsyncTask, and also uses the Google speech-to-text API (RecognizerIntent), using the code from this question: How can I use speech recognition without the annoying dialog in android phones
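For context, the dialog-free recognition from that question boils down to driving SpeechRecognizer directly with a RecognitionListener. A minimal sketch (assuming it runs inside an Activity, e.g. in onCreate(), on the main thread):

    import android.content.Intent;
    import android.os.Bundle;
    import android.speech.RecognitionListener;
    import android.speech.RecognizerIntent;
    import android.speech.SpeechRecognizer;

    SpeechRecognizer recognizer = SpeechRecognizer.createSpeechRecognizer(this);
    recognizer.setRecognitionListener(new RecognitionListener() {
        @Override public void onResults(Bundle results) {
            // Best hypotheses, ordered by confidence
            java.util.ArrayList<String> matches =
                    results.getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION);
        }
        @Override public void onError(int error) { /* e.g. ERROR_NETWORK */ }
        // Remaining callbacks left empty for brevity
        @Override public void onReadyForSpeech(Bundle params) {}
        @Override public void onBeginningOfSpeech() {}
        @Override public void onRmsChanged(float rmsdB) {}
        @Override public void onBufferReceived(byte[] buffer) {}
        @Override public void onEndOfSpeech() {}
        @Override public void onPartialResults(Bundle partialResults) {}
        @Override public void onEvent(int eventType, Bundle params) {}
    });

    Intent intent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
    intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL,
            RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);
    recognizer.startListening(intent);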

I have also tried recording the audio in a Thread, but that is a worse solution and causes more problems. My problem is that my application works properly on the emulator. (The emulator doesn't support speech recognition, though, because it lacks voice recognition services.) On my device, my application crashes ("has stopped unexpectedly") when I start recording audio and recognizing speech at the same time. However, when I turn wifi off, the application works properly, just like on the emulator.

Recording audio requires this in AndroidManifest.xml:

<uses-permission android:name="android.permission.RECORD_AUDIO" />

and speech recognition requires:

<uses-permission android:name="android.permission.RECORD_AUDIO" />
<uses-permission android:name="android.permission.INTERNET" />

I suppose the problem is that there is only a single audio input? How can I resolve it? The Google speech recognizer has to run on the main UI thread, so I can't, for example, run it in an AsyncTask; that is why the audio recording is in an AsyncTask instead. I have no idea why this causes problems.
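For reference, the recording side looks roughly like this (a sketch, not my exact code; the output path and formats are placeholders):

    import java.io.IOException;
    import android.media.MediaRecorder;
    import android.os.AsyncTask;
    import android.util.Log;

    private class RecordTask extends AsyncTask<Void, Void, Void> {
        private MediaRecorder recorder;

        @Override protected Void doInBackground(Void... params) {
            recorder = new MediaRecorder();
            // Both this and the recognizer want the one microphone input
            recorder.setAudioSource(MediaRecorder.AudioSource.MIC);
            recorder.setOutputFormat(MediaRecorder.OutputFormat.THREE_GPP);
            recorder.setAudioEncoder(MediaRecorder.AudioEncoder.AMR_NB);
            recorder.setOutputFile("/sdcard/recording.3gp"); // placeholder path
            try {
                recorder.prepare();
                recorder.start(); // throws if the mic is already in use
            } catch (IOException e) {
                Log.e("RecordTask", "prepare() failed", e);
            } catch (IllegalStateException e) {
                Log.e("RecordTask", "start() failed", e);
            }
            return null;
        }
    }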

I connected my device to Eclipse and used USB debugging. This is the exception I get in LogCat:

08-23 14:50:03.528: ERROR/ActivityThread(12403): Activity go.android.Activity has leaked ServiceConnection android.speech.SpeechRecognizer$Connection@48181340 that was originally bound here
08-23 14:50:03.528: ERROR/ActivityThread(12403): android.app.ServiceConnectionLeaked: Activity go.android.Activity has leaked ServiceConnection android.speech.SpeechRecognizer$Connection@48181340 that was originally bound here
08-23 14:50:03.528: ERROR/ActivityThread(12403):     at android.app.ActivityThread$PackageInfo$ServiceDispatcher.<init>(ActivityThread.java:1121)
08-23 14:50:03.528: ERROR/ActivityThread(12403):     at android.app.ActivityThread$PackageInfo.getServiceDispatcher(ActivityThread.java:1016)
08-23 14:50:03.528: ERROR/ActivityThread(12403):     at android.app.ContextImpl.bindService(ContextImpl.java:951)
08-23 14:50:03.528: ERROR/ActivityThread(12403):     at android.content.ContextWrapper.bindService(ContextWrapper.java:347)
08-23 14:50:03.528: ERROR/ActivityThread(12403):     at android.speech.SpeechRecognizer.startListening(SpeechRecognizer.java:267)
08-23 14:50:03.528: ERROR/ActivityThread(12403):     at go.android.Activity.startRecordingAndAnimation(Activity.java:285)
08-23 14:50:03.528: ERROR/ActivityThread(12403):     at go.android.Activity.onResume(Activity.java:86)
08-23 14:50:03.528: ERROR/ActivityThread(12403):     at android.app.Instrumentation.callActivityOnResume(Instrumentation.java:1151)
08-23 14:50:03.528: ERROR/ActivityThread(12403):     at android.app.Activity.performResume(Activity.java:3823)
08-23 14:50:03.528: ERROR/ActivityThread(12403):     at android.app.ActivityThread.performResumeActivity(ActivityThread.java:3118)
08-23 14:50:03.528: ERROR/ActivityThread(12403):     at android.app.ActivityThread.handleResumeActivity(ActivityThread.java:3143)
08-23 14:50:03.528: ERROR/ActivityThread(12403):     at android.app.ActivityThread.handleLaunchActivity(ActivityThread.java:2684)
08-23 14:50:03.528: ERROR/ActivityThread(12403):     at android.app.ActivityThread.access$2300(ActivityThread.java:125)
08-23 14:50:03.528: ERROR/ActivityThread(12403):     at android.app.ActivityThread$H.handleMessage(ActivityThread.java:2033)
08-23 14:50:03.528: ERROR/ActivityThread(12403):     at android.os.Handler.dispatchMessage(Handler.java:99)
08-23 14:50:03.528: ERROR/ActivityThread(12403):     at android.os.Looper.loop(Looper.java:123)
08-23 14:50:03.528: ERROR/ActivityThread(12403):     at android.app.ActivityThread.main(ActivityThread.java:4627)
08-23 14:50:03.528: ERROR/ActivityThread(12403):     at java.lang.reflect.Method.invokeNative(Native Method)
08-23 14:50:03.528: ERROR/ActivityThread(12403):     at java.lang.reflect.Method.invoke(Method.java:521)
08-23 14:50:03.528: ERROR/ActivityThread(12403):     at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:858)
08-23 14:50:03.528: ERROR/ActivityThread(12403):     at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:616)
08-23 14:50:03.528: ERROR/ActivityThread(12403):     at dalvik.system.NativeStart.main(Native Method)

And after that, another exception:

08-23 14:50:08.000: ERROR/ServerConnectorImpl(12412): Failed to create session
08-23 14:50:08.000: ERROR/ServerConnectorImpl(12412): com.google.android.voicesearch.speechservice.ConnectionException: POST failed
08-23 14:50:08.000: ERROR/ServerConnectorImpl(12412):     at com.google.android.voicesearch.speechservice.SpeechServiceHttpClient.post(SpeechServiceHttpClient.java:176)
08-23 14:50:08.000: ERROR/ServerConnectorImpl(12412):     at com.google.android.voicesearch.speechservice.SpeechServiceHttpClient.post(SpeechServiceHttpClient.java:88)
08-23 14:50:08.000: ERROR/ServerConnectorImpl(12412):     at com.google.android.voicesearch.speechservice.ServerConnectorImpl.createTcpSession(ServerConnectorImpl.java:118)
08-23 14:50:08.000: ERROR/ServerConnectorImpl(12412):     at com.google.android.voicesearch.speechservice.ServerConnectorImpl.createSession(ServerConnectorImpl.java:98)
08-23 14:50:08.000: ERROR/ServerConnectorImpl(12412):     at com.google.android.voicesearch.speechservice.RecognitionController.runRecognitionMainLoop(RecognitionController.java:679)
08-23 14:50:08.000: ERROR/ServerConnectorImpl(12412):     at com.google.android.voicesearch.speechservice.RecognitionController.startRecognition(RecognitionController.java:463)
08-23 14:50:08.000: ERROR/ServerConnectorImpl(12412):     at com.google.android.voicesearch.speechservice.RecognitionController.access$200(RecognitionController.java:75)
08-23 14:50:08.000: ERROR/ServerConnectorImpl(12412):     at com.google.android.voicesearch.speechservice.RecognitionController$1.handleMessage(RecognitionController.java:300)
08-23 14:50:08.000: ERROR/ServerConnectorImpl(12412):     at android.os.Handler.dispatchMessage(Handler.java:99)
08-23 14:50:08.000: ERROR/ServerConnectorImpl(12412):     at android.os.Looper.loop(Looper.java:123)
08-23 14:50:08.000: ERROR/ServerConnectorImpl(12412):     at android.os.HandlerThread.run(HandlerThread.java:60)
08-23 14:50:08.000: ERROR/ServerConnectorImpl(12412): Caused by: java.net.SocketTimeoutException
08-23 14:50:08.000: ERROR/ServerConnectorImpl(12412):     at org.apache.harmony.luni.net.PlainSocketImpl.read(PlainSocketImpl.java:564)
08-23 14:50:08.000: ERROR/ServerConnectorImpl(12412):     at org.apache.harmony.luni.net.SocketInputStream.read(SocketInputStream.java:88)
08-23 14:50:08.000: ERROR/ServerConnectorImpl(12412):     at org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:103)
08-23 14:50:08.000: ERROR/ServerConnectorImpl(12412):     at org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(AbstractSessionInputBuffer.java:191)
08-23 14:50:08.000: ERROR/ServerConnectorImpl(12412):     at org.apache.http.impl.conn.DefaultResponseParser.parseHead(DefaultResponseParser.java:82)
08-23 14:50:08.000: ERROR/ServerConnectorImpl(12412):     at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:174)
08-23 14:50:08.000: ERROR/ServerConnectorImpl(12412):     at org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:179)
08-23 14:50:08.000: ERROR/ServerConnectorImpl(12412):     at org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:235)
08-23 14:50:08.000: ERROR/ServerConnectorImpl(12412):     at org.apache.http.impl.conn.AbstractClientConnAdapter.receiveResponseHeader(AbstractClientConnAdapter.java:259)
08-23 14:50:08.000: ERROR/ServerConnectorImpl(12412):     at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:279)
08-23 14:50:08.000: ERROR/ServerConnectorImpl(12412):     at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:121)
08-23 14:50:08.000: ERROR/ServerConnectorImpl(12412):     at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:410)
08-23 14:50:08.000: ERROR/ServerConnectorImpl(12412):     at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:555)
08-23 14:50:08.000: ERROR/ServerConnectorImpl(12412):     at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:487)
08-23 14:50:08.000: ERROR/ServerConnectorImpl(12412):     at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:465)
08-23 14:50:08.000: ERROR/ServerConnectorImpl(12412):     at android.net.http.AndroidHttpClient.execute(AndroidHttpClient.java:243)
08-23 14:50:08.000: ERROR/ServerConnectorImpl(12412):     at com.google.android.voicesearch.speechservice.SpeechServiceHttpClient.post(SpeechServiceHttpClient.java:167)
08-23 14:50:08.000: ERROR/ServerConnectorImpl(12412):     ... 10 more
08-23 14:50:08.000: ERROR/RecognitionController(12412): Ignoring error 2

I found a solution that works well for doing speech recognition and audio recording at the same time. Here is the link to a simple Android project I created to show the solution working. I also put some screenshots inside the project to illustrate the app.

I'll try to briefly explain the approach I used. I combined two features in that project: the Google Speech API and FLAC recording.

The Google Speech API is called through HTTP connections. Mike Pultz gives more details about the API:

" (...) the new [Google] API is a full-duplex streaming API. What this means, is that it actually uses two HTTP connections- one POST request to upload the content as a “live” chunked stream, and a second GET request to access the results, which makes much more sense for longer audio samples, or for streaming audio." (...)新的[Google] API是一个全双工流API。这意味着,它实际上使用了两个HTTP连接 - 一个POST请求将内容上传为”实时“分块流,以及第二个GET请求访问结果,这对于更长的音频样本或流式音频更有意义。“

However, this API needs to receive a FLAC sound file to work properly. That brings us to the second part: FLAC recording.

I implemented FLAC recording in that project by extracting and adapting some pieces of code and libraries from an open source app called AudioBoo. AudioBoo uses native code to record and play the FLAC format.

Thus, it's possible to record a FLAC sound, send it to the Google Speech API, get the text, and play the sound that was just recorded.

The project I created has the basic principles to make it work and can be improved for specific situations. In order to make it work in a different scenario, it's necessary to get a Google Speech API key, which is obtained by being part of the Google Chromium-dev group. I left one key in that project just to show it's working, but I'll remove it eventually. If someone needs more information about it, let me know, as I'm not able to put more than 2 links in this post.

Late answer, but for the first exception: you have to destroy your SpeechRecognizer once you're done with it, for example in onStop() or onDestroy(), or directly after you no longer need the SpeechRecognizer:

    // Stop, cancel, and release the recognizer so its ServiceConnection
    // is unbound before the Activity goes away
    if (yourSpeechRecognizer != null) {
        yourSpeechRecognizer.stopListening();
        yourSpeechRecognizer.cancel();
        yourSpeechRecognizer.destroy();
    }
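For example, a sketch of doing this in the Activity lifecycle (assuming yourSpeechRecognizer is the field holding your instance):

    @Override
    protected void onDestroy() {
        if (yourSpeechRecognizer != null) {
            yourSpeechRecognizer.destroy();
            yourSpeechRecognizer = null; // avoid reusing a destroyed instance
        }
        super.onDestroy();
    }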

I have successfully accomplished this with the help of the CLOUD SPEECH API. You can find its demo here: google speech.

The API recognizes over 80 languages and variants, to support your global user base. You can transcribe the text of users dictating to an application's microphone, enable command-and-control through voice, or transcribe audio files, among many other use cases. Recognize audio uploaded in the request, and integrate with your audio storage on Google Cloud Storage, by using the same technology Google uses to power its own products.

It uses an audio buffer to transcribe data with the help of the Google Speech API. I used that same buffer to store the audio recording, with the help of AudioRecorder.
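The core idea looks roughly like this (a sketch under my own assumptions: 16 kHz mono PCM16, a raw PCM output file, and a hypothetical speechClient standing in for the demo's recognition feed):

    import java.io.FileOutputStream;
    import android.media.AudioFormat;
    import android.media.AudioRecord;
    import android.media.MediaRecorder;

    int rate = 16000;
    int bufSize = AudioRecord.getMinBufferSize(rate,
            AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT);
    AudioRecord record = new AudioRecord(MediaRecorder.AudioSource.MIC,
            rate, AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT, bufSize);

    byte[] buffer = new byte[bufSize];
    FileOutputStream pcmOut = new FileOutputStream("/sdcard/recording.pcm");
    record.startRecording();
    while (recording) { // 'recording' is a volatile flag you control elsewhere
        int n = record.read(buffer, 0, buffer.length);
        if (n > 0) {
            pcmOut.write(buffer, 0, n);      // consumer 1: the audio recording
            speechClient.onAudio(buffer, n); // consumer 2: hypothetical recognizer feed
        }
    }
    record.stop();
    record.release();
    pcmOut.close();

Because one read from the microphone feeds both consumers, the recording and the transcription see exactly the same audio.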

So with this demo, we can transcribe the user's speech in parallel with the audio recording.

In the demo, speech recognition starts and stops based on voice activity. It also provides a SPEECH_TIMEOUT_MILLIS facility in VoiceRecorder.java, which works just like RecognizerIntent's EXTRA_SPEECH_INPUT_COMPLETE_SILENCE_LENGTH_MILLIS, but is user controlled.

So, all in all, you can specify a silence timeout: based on it, recognition stops after the user finishes speaking and starts again as soon as the user starts speaking.
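That voice-activity logic can be sketched like this (the amplitude threshold and the callback names are my assumptions, loosely modelled on the demo's VoiceRecorder; 'record' and 'bufSize' are as in the previous sketch):

    static final long SPEECH_TIMEOUT_MILLIS = 2000; // user-controlled silence timeout
    static final int AMPLITUDE_THRESHOLD = 1500;    // assumed "someone is talking" level

    long lastVoiceHeard = System.currentTimeMillis();
    boolean speaking = false;
    short[] pcm = new short[bufSize / 2];
    while (listening) { // 'listening' is a flag you control elsewhere
        int n = record.read(pcm, 0, pcm.length);
        boolean voiced = false;
        for (int i = 0; i < n; i++) {
            if (Math.abs(pcm[i]) > AMPLITUDE_THRESHOLD) { voiced = true; break; }
        }
        if (voiced) {
            lastVoiceHeard = System.currentTimeMillis();
            if (!speaking) { speaking = true; onVoiceStart(); } // start recognition
            onVoice(pcm, n); // stream this chunk to the recognizer
        } else if (speaking
                && System.currentTimeMillis() - lastVoiceHeard > SPEECH_TIMEOUT_MILLIS) {
            speaking = false;
            onVoiceEnd(); // stop recognition after the silence timeout
        }
    }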

Recent projects on 'google-speech' and 'android-opus' (opuslib) allow simple, concurrent recognition along with audio recording to an Opus file in Android external storage.

Looking at the VoiceRecorder in the speech project: with only a few extra lines of code after reading the microphone buffer, the buffer can also be consumed by a file sink (PCM16 to Opus codec) in addition to the current speech observer.

See a minimal merge of the two projects above in Google-speech-opus-recorder.

I haven't tested this solution yet, but maybe there is a possibility. In http://developer.android.com/reference/android/speech/RecognitionService.Callback.html there is the method void bufferReceived(byte[] buffer). A possible solution is to save this received buffer alongside the Android AudioRecord class, which has a method like read(byte[] audioData, int offsetInBytes, int sizeInBytes). So maybe it is possible to connect these two utilities that way? Problems might occur with configuring AudioRecord, and with converting the result to mp3 or wav format after recording.
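On the client side, the counterpart of that callback is RecognitionListener.onBufferReceived(byte[]). A sketch of accumulating whatever it delivers (note that not every recognition service implementation actually calls this, so treat it as best-effort):

    import java.io.ByteArrayOutputStream;
    import android.os.Bundle;
    import android.speech.RecognitionListener;

    final ByteArrayOutputStream captured = new ByteArrayOutputStream();

    RecognitionListener listener = new RecognitionListener() {
        @Override public void onBufferReceived(byte[] buffer) {
            captured.write(buffer, 0, buffer.length); // accumulate raw audio chunks
        }
        @Override public void onResults(Bundle results) {
            byte[] audio = captured.toByteArray();
            // 'audio' could now be wrapped with a WAV header and saved,
            // or handed to an mp3 encoder
        }
        @Override public void onReadyForSpeech(Bundle params) {}
        @Override public void onBeginningOfSpeech() {}
        @Override public void onRmsChanged(float rmsdB) {}
        @Override public void onEndOfSpeech() {}
        @Override public void onError(int error) {}
        @Override public void onPartialResults(Bundle partialResults) {}
        @Override public void onEvent(int eventType, Bundle params) {}
    };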
