
Recording audio for speech recognition

I've just started thinking about my new project, which involves TTS and STT (text-to-speech and speech-to-text), and I've run into some tricky problems that need to be solved.

  1. How do I record audio? I don't care about the language at this point; I'm just interested in how to record several chunks of audio that belong together. Suppose I say "Jarvis Turn On Light Two"; this should be saved as "whatever.wav". But what if I say each word with a pause of 2 seconds in between? Then my recording software might assume that the first words, "Jarvis Turn On", belong together and process them with TensorFlow, and afterwards process the next chunk of audio containing "Light Two", which would make no sense at all. Are there any other approaches to recording meaningful audio? Maybe with a threshold, so that it only records when a certain noise level is reached?
  2. Which language should I use? The whole system should run as a background process on Linux. TensorFlow also supports a wide range of languages. The ones I care about most are C++ and Java. The main question here is how I can run the software in continuous mode: when my server is turned on, the recording software should also be launched, keep listening, and keep generating my "whatever.wav" files.
  3. Is threading an option, or is it necessary? The recording software runs on Linux as a background process. It should just listen and group my spoken words into a single "whatever.wav" file. After it has updated this file, TensorFlow would scan the file and output whatever I've trained it to output. I'm not very familiar with finite state machines, so basically that's my question.

I'm very new to this topic, so please be patient with me.

Regards, Michael

How do I record audio? Are there any other approaches to recording meaningful audio? Maybe with a threshold, so that it only records when a certain noise level is reached?

Record audio in small chunks of about 0.1 seconds and process them one by one, accumulating the results. Once a keyword is detected, perform the action. There is no need to store the result in a WAV file; you can keep everything in memory. For an example, see this existing software:

https://github.com/castorini/honk
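
To make the chunking idea concrete, here is a minimal sketch in Python that combines the 0.1-second chunks with the threshold idea from the question. It is not taken from honk; the function names, the RMS threshold of 500, and the 2.5-second pause limit are all assumptions chosen for illustration and would need tuning for a real microphone. The keyword model itself is left out; the sketch only groups chunks into utterances in memory.

```python
import math

SAMPLE_RATE = 16000                 # assumed sample rate in Hz
CHUNK_SAMPLES = SAMPLE_RATE // 10   # 0.1 s of audio per chunk
THRESHOLD = 500.0                   # assumed RMS level that separates speech from silence
MAX_SILENT_CHUNKS = 25              # assumed: 2.5 s of silence ends an utterance


def rms(samples):
    """Root-mean-square amplitude of a sequence of integer samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def segment_utterances(chunks, threshold=THRESHOLD, max_silent=MAX_SILENT_CHUNKS):
    """Group loud chunks into utterances, tolerating short pauses.

    A pause shorter than max_silent chunks is kept inside the utterance;
    a longer pause closes the utterance, which is then yielded as a list
    of chunks. Leading and trailing silence is dropped.
    """
    current = []   # chunks of the utterance being built
    pause = []     # quiet chunks seen since the last loud chunk
    for chunk in chunks:
        if rms(chunk) >= threshold:
            current.extend(pause)   # the pause was inside the utterance
            pause = []
            current.append(chunk)
        elif current:
            pause.append(chunk)
            if len(pause) >= max_silent:
                yield current       # pause was long enough: utterance is complete
                current, pause = [], []
    if current:
        yield current               # flush whatever is left at end of input
```

With these assumed settings, a 2-second gap between "Jarvis Turn On" and "Light Two" spans only 20 quiet chunks, so both parts end up in the same utterance. Each yielded utterance could then be handed to the model directly, without writing a WAV file.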

Which language should I use? The whole system should run as a background process on Linux. TensorFlow also supports a wide range of languages. The ones I care about most are C++ and Java.

Most TensorFlow development is done in Python.

Is threading an option, or is it necessary? The recording software runs on Linux as a background process.

Threading is not necessary. The Linux kernel buffers audio internally while your software processes it.
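
As an illustration of a single-threaded design, the sketch below reads fixed-size chunks from a blocking byte stream in one loop. It assumes raw signed 16-bit little-endian mono audio at 16 kHz, for example piped in from a recorder such as `arecord -f S16_LE -r 16000 -c 1 -t raw`; the function name `read_chunks` and the constants are hypothetical choices for this example, not part of any library.

```python
import struct

SAMPLE_RATE = 16000        # assumed: 16 kHz mono input
BYTES_PER_SAMPLE = 2       # assumed: signed 16-bit little-endian samples
CHUNK_BYTES = SAMPLE_RATE // 10 * BYTES_PER_SAMPLE   # 0.1 s of audio


def read_chunks(stream, chunk_bytes=CHUNK_BYTES):
    """Yield one tuple of int16 samples per full chunk until the stream ends.

    stream.read() blocks until enough bytes are available, so no extra
    thread is needed: audio that arrives while the caller is busy waits
    in the operating system's buffers until the next read.
    """
    while True:
        data = stream.read(chunk_bytes)
        if len(data) < chunk_bytes:
            return   # end of stream; an incomplete final chunk is dropped
        yield struct.unpack("<%dh" % (chunk_bytes // BYTES_PER_SAMPLE), data)
```

A background process could then run `for samples in read_chunks(sys.stdin.buffer): ...` (after `import sys`) and pass each chunk to the model inside the loop body. If processing a chunk regularly takes longer than 0.1 seconds, the buffers will eventually overflow, so the per-chunk work has to stay faster than real time.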
