

Streaming Audio Clips from iPhone to server

I'm wondering if there are any atomic examples out there for streaming audio FROM the iPhone to a server. I'm not interested in telephony or SIP-style solutions, just a simple socket stream to send an audio clip, in .wav format, as it is being recorded. I haven't had much luck with Google or other obvious avenues, although there seem to be many examples of doing this the other way around.
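Just to make the question concrete, this is roughly what I have in mind on the phone side. It's only a minimal sketch, assuming the newer AVAudioEngine and Network framework APIs; `example.com:9000` stands in for my server, and AVAudioSession / microphone-permission setup is left out.

```swift
import AVFoundation
import Network

// Minimal sketch: capture microphone audio with AVAudioEngine and push each
// PCM buffer over a plain TCP socket as soon as it is produced.
// "example.com" / 9000 are placeholders for the real server.
final class MicStreamer {
    private let engine = AVAudioEngine()
    private let connection = NWConnection(host: "example.com", port: 9000, using: .tcp)

    func start() throws {
        connection.start(queue: .global(qos: .userInitiated))

        let input = engine.inputNode
        let format = input.outputFormat(forBus: 0)   // device-native PCM format

        // The tap fires every few thousand frames with a fresh buffer of samples.
        input.installTap(onBus: 0, bufferSize: 4096, format: format) { [weak self] buffer, _ in
            // Channel 0 only; the built-in mic is mono.
            guard let self = self, let channel = buffer.floatChannelData?[0] else { return }
            let byteCount = Int(buffer.frameLength) * MemoryLayout<Float>.size
            let data = Data(bytes: channel, count: byteCount)
            // Raw float samples; the server needs to know the sample rate and
            // channel count (e.g. send a WAV header once up front, or agree on
            // a fixed format out of band).
            self.connection.send(content: data, completion: .contentProcessed { _ in })
        }

        engine.prepare()
        try engine.start()
    }

    func stop() {
        engine.inputNode.removeTap(onBus: 0)
        engine.stop()
        connection.cancel()
    }
}
```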

I can't figure out how to register the unregistered account I initially posted with.

Anyway, I'm not really interested in the audio format at present, just the streaming aspect. I want to take the microphone input and stream it from the iPhone to a server. I don't presently care about the transfer rate, as I'll initially just test over a Wi-Fi connection rather than 3G. The reason I can't simply cache the recording is that I'm interested in trying out some open-source speech recognition tools for my undergraduate thesis.

Caching and then sending the recording is possible, but it takes considerably longer to get the voice data to the server. If I can start sending the data as soon as I start recording, the response time improves considerably, because most of the data will already have reached the server by the time I let go of the record button. Furthermore, if I can get this streaming functionality working on the iPhone, then on the server side I can also start the speech recognizer as soon as the first bit of audio comes through. Again, this should considerably speed up the total time the transaction takes from the user's perspective.
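On the receiving side, something like the following minimal sketch would let recognition start on the very first chunk rather than waiting for the whole clip. I'm using Apple's Network framework here purely for illustration (a real server could just as well be Python or Java), and `feedRecognizer` is a placeholder for whatever recognizer I end up wiring up.

```swift
import Dispatch
import Foundation
import Network

// Stand-in for the real ASR engine: hand it audio as soon as it arrives.
func feedRecognizer(_ chunk: Data) {
    print("got \(chunk.count) bytes of audio")
}

let queue = DispatchQueue(label: "audio-server")
let listener = try NWListener(using: .tcp, on: 9000)

listener.newConnectionHandler = { connection in
    connection.start(queue: queue)

    func receiveNext() {
        connection.receive(minimumIncompleteLength: 1, maximumLength: 65_536) { data, _, isComplete, error in
            if let data = data, !data.isEmpty {
                feedRecognizer(data)      // recognition can begin on the very first chunk
            }
            if isComplete || error != nil {
                connection.cancel()       // the client let go of the record button
            } else {
                receiveNext()             // keep pulling data as it streams in
            }
        }
    }
    receiveNext()
}

listener.start(queue: queue)
dispatchMain()                            // park the main thread; the queue does the work
```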

Colin Barrett mentions the phones and phone networks, but these are actually a pretty suboptimal solution for ASR, mainly because they provide no good way to recover from errors; doing so over a VoIP dialogue is a horrible experience. However, the iPhone, and in particular the touch screen, provides a great way to do that, through use of an IME or n-best lists for the other recognition candidates.
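For what it's worth, the n-best picker could be as simple as a table of candidate strings; `hypotheses` and `onPick` below are hypothetical names for whatever the recognizer hands back, and this is only a sketch of the idea.

```swift
import UIKit

// Sketch of the error-recovery idea: when the recogniser returns several
// hypotheses (an n-best list), let the user tap the right one instead of
// re-dictating the whole utterance.
final class NBestViewController: UITableViewController {
    var hypotheses: [String] = []            // e.g. ["recognize speech", "wreck a nice beach"]
    var onPick: ((String) -> Void)?

    override func tableView(_ tableView: UITableView, numberOfRowsInSection section: Int) -> Int {
        hypotheses.count
    }

    override func tableView(_ tableView: UITableView, cellForRowAt indexPath: IndexPath) -> UITableViewCell {
        let cell = tableView.dequeueReusableCell(withIdentifier: "hypothesis")
            ?? UITableViewCell(style: .default, reuseIdentifier: "hypothesis")
        cell.textLabel?.text = hypotheses[indexPath.row]
        return cell
    }

    override func tableView(_ tableView: UITableView, didSelectRowAt indexPath: IndexPath) {
        onPick?(hypotheses[indexPath.row])   // user confirms the correct candidate
    }
}
```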

If I can figure out the basic architecture for streaming the audio, then I can start thinking about FLAC encoding or something else to reduce the required transfer rate. Maybe even feature extraction, although that limits the later ability to retrain the system with the recordings.
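Even before FLAC, just converting the tap's native format down to 16 kHz 16-bit mono would cut the bitrate several-fold, and that's the sample rate most recognizers want anyway. A rough sketch with AVAudioConverter follows; `makeDownsampler` and `downsample` are my own helper names, and the capacity math is approximate.

```swift
import AVFoundation

// Sketch: convert the tap's native format (often 44.1/48 kHz float) down to
// 16 kHz 16-bit mono PCM before sending. FLAC or similar could then be
// layered on top of this if it's still too much data.
func makeDownsampler(from inputFormat: AVAudioFormat) -> (AVAudioConverter, AVAudioFormat)? {
    guard let outFormat = AVAudioFormat(commonFormat: .pcmFormatInt16,
                                        sampleRate: 16_000,
                                        channels: 1,
                                        interleaved: false),
          let converter = AVAudioConverter(from: inputFormat, to: outFormat) else { return nil }
    return (converter, outFormat)
}

func downsample(_ buffer: AVAudioPCMBuffer,
                with converter: AVAudioConverter,
                to outFormat: AVAudioFormat) -> Data? {
    // Rough output capacity: scale by the sample-rate ratio, plus a little slack.
    let ratio = outFormat.sampleRate / buffer.format.sampleRate
    let capacity = AVAudioFrameCount(Double(buffer.frameLength) * ratio) + 32
    guard let outBuffer = AVAudioPCMBuffer(pcmFormat: outFormat, frameCapacity: capacity) else { return nil }

    var fed = false
    _ = converter.convert(to: outBuffer, error: nil) { _, outStatus in
        // Feed the single input buffer once, then report no more data for this call.
        if fed { outStatus.pointee = .noDataNow; return nil }
        fed = true
        outStatus.pointee = .haveData
        return buffer
    }

    guard let samples = outBuffer.int16ChannelData?[0] else { return nil }
    return Data(bytes: samples, count: Int(outBuffer.frameLength) * MemoryLayout<Int16>.size)
}
```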
