Define the pronunciation start time for each word in a script

Question

I have a text script that is used to create podcasts, so the words in the podcast audio exactly match my text. What I want is the following:

Word in text | Pronunciation starts at
Hello        | 0:00:00.000
my           | 0:00:01.125
friends      | 0:00:02.750

Is that possible to do at all? Thanks in advance!

Answer 1

A key term to start with, and one that conveys the complexity of the problem, is "forced alignment": aligning a known transcript to its audio. This site also covers questions on this topic, e.g. here, which leads you via the related threads to questions and answers about HTK (the Hidden Markov Model Toolkit).
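To give a feel for what a forced aligner does once it has acoustic scores, here is a minimal sketch of the core Viterbi step in Python. It assumes you already have a matrix `log_probs[t][w]` of per-frame log-likelihoods for each transcript word; in practice that comes from an acoustic model such as an HMM. The matrix and the function name `force_align` are hypothetical, and real aligners typically model phones with multi-state HMMs rather than whole words, so treat this as a simplified illustration, not how HTK or any other tool is implemented.

```python
import math
from typing import List, Sequence


def force_align(log_probs: Sequence[Sequence[float]]) -> List[int]:
    """Return the first frame index of each transcript word.

    log_probs[t][w] is an assumed, externally computed log-likelihood that
    audio frame t belongs to word w. Words are visited strictly in transcript
    order (left-to-right), and every word must cover at least one frame.
    """
    n_frames = len(log_probs)
    n_words = len(log_probs[0])
    if n_frames < n_words:
        raise ValueError("need at least one frame per word")

    neg_inf = -math.inf
    # score[t][w]: best total log-probability of frames 0..t with frame t in word w
    score = [[neg_inf] * n_words for _ in range(n_frames)]
    # entered[t][w]: True if the best path enters word w exactly at frame t
    entered = [[False] * n_words for _ in range(n_frames)]

    # The path must start in the first word.
    score[0][0] = log_probs[0][0]
    for t in range(1, n_frames):
        for w in range(n_words):
            stay = score[t - 1][w]  # remain in the same word
            advance = score[t - 1][w - 1] if w > 0 else neg_inf  # move to next word
            if advance > stay:
                score[t][w] = advance + log_probs[t][w]
                entered[t][w] = True
            else:
                score[t][w] = stay + log_probs[t][w]

    # Backtrack from the last frame, which must belong to the last word.
    starts = [0] * n_words
    w = n_words - 1
    for t in range(n_frames - 1, 0, -1):
        if entered[t][w]:
            starts[w] = t
            w -= 1
    return starts
```

For example, with six frames whose scores clearly favour words 0, 0, 1, 1, 2, 2, `force_align` returns `[0, 2, 4]`. Multiplying those frame indices by the frame hop gives the start times in seconds.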

You can find a more hands-on description of using forced alignment for automated audio segmentation here.
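Once an aligner has produced a start frame for each word, turning those into the table from the question is plain arithmetic: multiply each frame index by the frame hop and format the result as H:MM:SS.mmm. A minimal sketch, assuming hypothetical helpers `format_timestamp` and `word_table` and a frame hop that you must match to your aligner's settings (10 ms is a common choice, but treat it as an assumption):

```python
from typing import List, Sequence


def format_timestamp(seconds: float) -> str:
    """Format seconds as H:MM:SS.mmm, e.g. 2.75 -> '0:00:02.750'."""
    total_ms = round(seconds * 1000)  # work in whole milliseconds
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours}:{minutes:02d}:{secs:02d}.{ms:03d}"


def word_table(
    words: Sequence[str], start_frames: Sequence[int], hop_seconds: float = 0.01
) -> List[str]:
    """Turn per-word start frames into aligned 'word  timestamp' rows.

    hop_seconds is the (assumed) time between consecutive analysis frames.
    """
    if len(words) != len(start_frames):
        raise ValueError("one start frame per word is required")
    width = max(len(w) for w in words)
    return [
        f"{w.ljust(width)}  {format_timestamp(f * hop_seconds)}"
        for w, f in zip(words, start_frames)
    ]


if __name__ == "__main__":
    # Hypothetical start frames at a 5 ms hop, chosen to reproduce the question's table.
    for row in word_table(["Hello", "my", "friends"], [0, 225, 550], hop_seconds=0.005):
        print(row)
```

Keeping the conversion in whole milliseconds avoids accumulating floating-point error in the minutes and seconds fields.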

So the answer is: yes, it is possible, but it is algorithmically complex, and even the best implementations are not error-free.

PS: I found you a really simple tool

Accepted: 2014-06-28 12:42:15
