
Analyse recorded audio file with Swift for speech to text

I'm able to record audio with Swift on iOS and play the recorded file back. What I'm asking is whether I can check the recorded audio file for background noise and volume/decibel level, so I can decide whether it is good enough for my speech-to-text framework. The framework itself is not the problem; I have already researched all the available ones.

I'm curious whether I can analyse the recorded audio file with AVFoundation, the Accelerate framework, or any other framework to check if the audio is good/clear enough to process with a speech-to-text framework.
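As a starting point, here is a minimal sketch of offline file analysis combining the two frameworks mentioned: AVFoundation to decode the recording into float samples, and Accelerate's vDSP to compute an overall RMS level in dBFS. The function name and error codes are illustrative; the file URL must point at your recorder's actual output.

```swift
import AVFoundation
import Accelerate

// Sketch: load a recorded audio file and compute its overall RMS level in dBFS.
func averageLevel(of url: URL) throws -> Float {
    let file = try AVAudioFile(forReading: url)
    let format = file.processingFormat            // decoded as non-interleaved Float32
    let frameCount = AVAudioFrameCount(file.length)
    guard let buffer = AVAudioPCMBuffer(pcmFormat: format,
                                        frameCapacity: frameCount) else {
        throw NSError(domain: "Analysis", code: -1)
    }
    try file.read(into: buffer)

    guard let samples = buffer.floatChannelData?[0] else {
        throw NSError(domain: "Analysis", code: -2)
    }
    var rms: Float = 0
    // vDSP computes the root-mean-square of the whole buffer in one call.
    vDSP_rmsqv(samples, 1, &rms, vDSP_Length(buffer.frameLength))
    // Convert to decibels relative to full scale; clamp pure silence to -160 dB.
    return rms > 0 ? 20 * log10(rms) : -160
}
```

A single file-wide RMS only tells you the overall loudness; separating speech from background noise needs the per-frame analysis discussed in the answer below.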

I don't have much audio knowledge, but from my research I found that I can get peak and average decibel values while recording. But what about background noise?
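For reference, the peak and average values mentioned come from AVAudioRecorder's built-in metering. A small sketch, assuming `recorder` is an already-configured AVAudioRecorder:

```swift
import AVFoundation

// Sketch: poll live peak/average power from an AVAudioRecorder.
func startMetering(_ recorder: AVAudioRecorder) {
    recorder.isMeteringEnabled = true   // must be enabled before values are valid
    recorder.record()

    Timer.scheduledTimer(withTimeInterval: 0.1, repeats: true) { _ in
        recorder.updateMeters()         // refresh the meter values for this instant
        let average = recorder.averagePower(forChannel: 0) // dBFS; 0 dB = full scale
        let peak = recorder.peakPower(forChannel: 0)
        print("avg: \(average) dB, peak: \(peak) dB")
    }
}
```

These values are instantaneous loudness readings only; they do not by themselves distinguish speech from background noise.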

Any information would be helpful about analysing a recorded audio file with Swift.

SNR estimation is a well-developed domain. You need to implement a voice activity detector (VAD) that separates noise from speech, then compute the noise energy and the signal energy separately and take their ratio. This goes slightly beyond simple math, though; you need some understanding of statistics to implement a reasonable algorithm such as WADA SNR, for which reference implementations are available.
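To make the pipeline concrete, here is a deliberately simplified sketch in Swift (not WADA SNR): a naive energy-based VAD classifies frames as noise or speech by thresholding per-frame energy, and the SNR is then the ratio of the mean energies of the two classes. The function name, frame length, and threshold heuristic are all illustrative assumptions.

```swift
import Accelerate
import Foundation

// Simplified sketch: energy-based VAD followed by an SNR estimate in dB.
// Frames quieter than a threshold between the quietest and loudest frame
// are treated as noise; the rest as speech.
func estimateSNR(samples: [Float], frameLength: Int = 1024) -> Float? {
    // 1. Split the signal into frames and compute each frame's mean energy.
    var energies: [Float] = []
    var start = 0
    while start + frameLength <= samples.count {
        var e: Float = 0
        samples[start..<start + frameLength].withUnsafeBufferPointer { ptr in
            vDSP_svesq(ptr.baseAddress!, 1, &e, vDSP_Length(frameLength)) // sum of squares
        }
        energies.append(e / Float(frameLength))
        start += frameLength
    }
    guard let minE = energies.min(), let maxE = energies.max(), maxE > 0 else {
        return nil
    }
    // 2. Naive VAD: threshold placed geometrically between the extremes.
    let threshold = sqrt(max(minE, 1e-12) * maxE)
    let noise = energies.filter { $0 <= threshold }
    let speech = energies.filter { $0 > threshold }
    guard !noise.isEmpty, !speech.isEmpty else { return nil }

    // 3. SNR in dB from the mean energies of the two classes.
    let noiseE = noise.reduce(0, +) / Float(noise.count)
    let speechE = speech.reduce(0, +) / Float(speech.count)
    return 10 * log10(speechE / noiseE)
}
```

A real algorithm like WADA SNR models the amplitude distribution of speech statistically rather than using a fixed energy threshold, which is why porting an existing implementation is recommended.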

You are unlikely to find a ready-made implementation in Swift; such software is usually written in C or MATLAB, so you will have to port it.

Noise estimation is a minor problem compared to speech recognition itself, which involves much more advanced algorithms. It is probably better to consider an existing speech-recognition package for Swift, such as TLSphinx or OpenEars.
