Determining the pronunciation start time of each word in a script

Question

I have a text script that is used to create podcasts, so the words in the podcast audio are exactly the same as in my text. What I want is the following:

Word in text | Pronunciation started at
Hello          0:0:0.000
my             0:0:1.125
friends        0:0:2.750

Is it possible to do this at all? Thanks in advance!

Answer 1

A good keyword to start with, given the complexity of the problem, is "forced alignment". This site also covers questions on this topic, e.g. here, which leads you via the related threads to questions and answers concerning HTK (the Hidden Markov Model Toolkit).

You can find a more hands-on description of how to use forced alignment in automated audio segmentation here.

So the answer is: yes, it is possible, but it is algorithmically very complex, and even the best implementations are not error-free.
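To give a feel for what forced alignment does at its core, here is a toy sketch in Python. It is not HTK and not a real aligner: it assumes you already have a per-frame score table `scores[t][w]` (how well audio frame `t` matches word `w`), which in practice would come from an acoustic model. The names `align_word_starts` and `format_timestamp`, and the fixed frame length, are illustrative assumptions. The dynamic program only enforces that the words occur in script order and that each word covers at least one frame.

```python
from typing import List, Sequence

NEG_INF = float("-inf")


def align_word_starts(scores: Sequence[Sequence[float]]) -> List[int]:
    """Return the first frame index assigned to each word.

    scores[t][w] is a hypothetical log-score for frame t matching word w.
    Words must appear in order and each word gets at least one frame.
    """
    n_frames = len(scores)
    n_words = len(scores[0])
    if n_frames < n_words:
        raise ValueError("need at least one frame per word")

    # best[t][w]: best total score with frame t assigned to word w.
    best = [[NEG_INF] * n_words for _ in range(n_frames)]
    # advanced[t][w]: True if word w starts at frame t on the best path.
    advanced = [[False] * n_words for _ in range(n_frames)]
    best[0][0] = scores[0][0]  # the first frame belongs to the first word

    for t in range(1, n_frames):
        for w in range(n_words):
            stay = best[t - 1][w]
            advance = best[t - 1][w - 1] if w > 0 else NEG_INF
            if advance > stay:
                best[t][w] = advance + scores[t][w]
                advanced[t][w] = True
            else:
                best[t][w] = stay + scores[t][w]

    # Walk back from the last frame of the last word to recover the starts.
    starts = [0] * n_words
    w = n_words - 1
    for t in range(n_frames - 1, 0, -1):
        if advanced[t][w]:
            starts[w] = t
            w -= 1
    return starts


def format_timestamp(seconds: float) -> str:
    """Format seconds as h:m:s.mmm, matching the table in the question."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours}:{minutes}:{secs}.{ms:03d}"
```

With a start frame per word and a known frame length (say, a hypothetical 0.5 s per frame), `format_timestamp(start_frame * 0.5)` gives a timestamp per word in the requested layout. Real tools work on much shorter frames and phoneme-level models, which is where most of the complexity and the errors come from.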

PS: I found you a really simple tool

Answer 1: accepted, 2014-06-28 12:42:15
