
Recording audio for speech recognition

Original 2017-12-28 20:46:12 2 1 linux/ tensorflow/ audio/ text-to-speech/ audio-recording

I've just started thinking about a new project involving TTS and STT (text-to-speech and speech-to-text), and I've run into some tricky problems that need to be solved.

How do I record audio? I don't care about the language at this point; I'm just interested in how I can record several chunks of audio that belong together. Suppose I say "Jarvis Turn On Light Two"; this should be saved as "whatever.wav". But what if I say each word with a pause of 2 seconds in between? Then my recording software could assume that the first words, "Jarvis Turn On", should be grouped and processed with TensorFlow, and after that the next chunk of audio, with the words "Light Two", would be processed on its own, which would make no sense at all. Are there any other approaches for recording meaningful audio? Maybe with a threshold, so that it only records when a certain level of noise is present?
Which language should I use? The whole system should run as a background process on Linux. TensorFlow also supports a wide range of languages. The ones I care about most are C++ and Java. The main question here is how I can run the software continuously: when my server is turned on, the recording software should also be launched and keep listening and generating my "whatever.wav" files.
Is threading an option, or is it necessary? The recording software runs on Linux as a background process. It should just listen and group my spoken words into a single "whatever.wav" file. After it has updated this file, TensorFlow would scan the file and output whatever I've trained it to. I'm not very familiar with finite state machines, so that is basically my question.

I'm very new to this topic, so please be patient with me.

Kind regards, Michael

1 answer

How do I record audio? Are there any other approaches for recording meaningful audio? Maybe with a threshold, so that it only records when a certain level of noise is present?

Record audio in small chunks of 0.1 seconds and process them one by one, accumulating the results. Once the keyword is detected, perform the action. There is no need to store the result in a WAV file; you can keep everything in memory. For an example, see this existing software:

https://github.com/castorini/honk
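
The chunk-by-chunk idea, combined with the threshold the question asks about, could be sketched roughly as follows. This is only an illustration and not how honk works: it assumes 16 kHz mono 16-bit PCM, and the names `rms`, `utterances`, `THRESHOLD` and `MAX_SILENT_CHUNKS` as well as their values are hypothetical and would need tuning for a real microphone. Chunks are grouped in memory, and a pause only ends an utterance once it lasts longer than the configured silence limit, so a 2-second gap between words does not split the command.

```python
import array
import math

SAMPLE_RATE = 16000                  # assumed: 16 kHz mono, 16-bit signed PCM
CHUNK_SAMPLES = SAMPLE_RATE // 10    # 0.1 s per chunk, as suggested above
THRESHOLD = 500.0                    # assumed RMS level separating speech from silence
MAX_SILENT_CHUNKS = 30               # assumed: 3 s of silence ends an utterance


def rms(chunk: bytes) -> float:
    """Root-mean-square amplitude of a chunk of 16-bit native-endian PCM."""
    samples = array.array("h", chunk)
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def utterances(chunks, threshold=THRESHOLD, max_silent=MAX_SILENT_CHUNKS):
    """Group consecutive chunks into utterances, tolerating short pauses."""
    current = []   # chunks of the utterance being built
    silent = 0     # consecutive quiet chunks at the end of `current`
    for chunk in chunks:
        if rms(chunk) >= threshold:
            current.append(chunk)
            silent = 0
        elif current:
            # Keep the pause for now; it may be a gap between two words.
            current.append(chunk)
            silent += 1
            if silent >= max_silent:
                # Pause was too long: emit the utterance without trailing silence.
                yield b"".join(current[:len(current) - silent])
                current, silent = [], 0
    if current:
        # Input ended mid-utterance; emit what we have, minus trailing silence.
        yield b"".join(current[:len(current) - silent])
```

Each yielded `bytes` object holds one complete utterance and can be passed straight to the recognizer without ever touching a "whatever.wav" file.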

Which language should I use? The whole system should run as a background process on Linux. TensorFlow also supports a wide range of languages. The ones I care about most are C++ and Java.

Most TensorFlow development is done in Python.

Is threading an option, or is it necessary? The recording software runs on Linux as a background process.

Threading is not necessary. The Linux kernel buffers audio internally while your software processes it.
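
A single-threaded loop along these lines might look like the sketch below. It assumes the ALSA `arecord` command-line tool is installed and reads raw 16-bit mono PCM from its standard output; `read_chunks`, `listen` and `handle_chunk` are hypothetical names chosen for this example. While `handle_chunk` is busy, unread audio waits in the audio and pipe buffers, which is why no second thread is used here, provided that processing keeps up with real time on average.

```python
import subprocess

SAMPLE_RATE = 16000                    # assumed sample rate in Hz
CHUNK_BYTES = SAMPLE_RATE // 10 * 2    # 0.1 s of 16-bit mono audio


def read_chunks(stream, chunk_bytes=CHUNK_BYTES):
    """Yield fixed-size chunks from a binary stream until it ends."""
    while True:
        chunk = stream.read(chunk_bytes)   # blocks until enough data or EOF
        if len(chunk) < chunk_bytes:
            return                         # stream ended; drop a partial chunk
        yield chunk


def listen(handle_chunk):
    """Run arecord and pass every 0.1 s chunk to handle_chunk, in one thread."""
    cmd = ["arecord", "-q", "-t", "raw", "-f", "S16_LE",
           "-r", str(SAMPLE_RATE), "-c", "1"]
    with subprocess.Popen(cmd, stdout=subprocess.PIPE) as proc:
        for chunk in read_chunks(proc.stdout):
            handle_chunk(chunk)
```

Calling `listen(my_callback)` from the main program keeps the process listening until `arecord` exits; starting that program at boot is left to the init system.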




solution1 0 2018-01-01 22:27:35
