How do I reconstruct a conversation from Watson Speech-to-Text output?

Question

I have the JSON output from Watson's Speech-to-Text service, which I have converted into a list and then into a Pandas data-frame.

I'm trying to work out how to reconstruct the conversation (with timings), along these lines:

Speaker 0: Said this [00.01 - 00.12]

Speaker 1: Said that [00.12 - 00.22]

Speaker 0: Said something else [00.22 - 00.56]

My data-frame has one row per word, with columns for the word, its start and end times, and the speaker tag (either 0 or 1).

words = [['said', 0.01, 0.06, 0],['this', 0.06, 0.12, 0],['said', 0.12, 
0.15, 1],['that', 0.15, 0.22, 1],['said', 0.22, 0.31, 0],['something', 
0.31, 0.45, 0],['else', 0.45, 0.56, 0]]

Ideally, I want to create the following, where words spoken by the same speaker are grouped together and a new group starts when the next speaker steps in:

grouped_words = [[['said','this'], 0.01, 0.12, 0],[['said','that'], 0.12, 
0.22, 1],[['said','something','else'], 0.22, 0.56, 0]]

UPDATE: As requested, a sample of the JSON file is available at https://github.com/cookie1986/STT_test

Answer 1

It should be pretty straightforward to load the speaker labels into a Pandas DataFrame for an easy-to-read view, and then identify the speaker shifts.

speakers=pd.DataFrame(jsonconvo['speaker_labels']).loc[:,['from','speaker','to']]
convo=pd.DataFrame(jsonconvo['results'][0]['alternatives'][0]['timestamps'])
speakers=speakers.join(convo)

Output:

   from  speaker    to          0     1     2
0  0.01        0  0.06       said  0.01  0.06
1  0.06        0  0.12       this  0.06  0.12
2  0.12        1  0.15       said  0.12  0.15
3  0.15        1  0.22       that  0.15  0.22
4  0.22        0  0.31       said  0.22  0.31
5  0.31        0  0.45  something  0.31  0.45
6  0.45        0  0.56       else  0.45  0.56
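
The join above pairs rows purely by position, so it relies on 'speaker_labels' and 'timestamps' listing the same words in the same order. A minimal sketch of a sanity check under that assumption follows; the helper name check_alignment is hypothetical and not part of pandas or any Watson SDK, and the demo frame simply copies the first rows of the output above.

```python
import pandas as pd

def check_alignment(speakers):
    """Return True if every speaker label's from/to equals the word timestamps.

    Assumes `speakers` is the joined frame shown above, where columns 1 and 2
    hold the word start and end times taken from 'timestamps'.
    """
    same_start = (speakers['from'] == speakers[1]).all()
    same_end = (speakers['to'] == speakers[2]).all()
    return bool(same_start and same_end)

# Small stand-in for the joined frame (values copied from the output above)
demo = pd.DataFrame({
    'from': [0.01, 0.06, 0.12],
    'speaker': [0, 0, 1],
    'to': [0.06, 0.12, 0.15],
    0: ['said', 'this', 'said'],
    1: [0.01, 0.06, 0.12],
    2: [0.06, 0.12, 0.15],
})
print(check_alignment(demo))
```

If the check fails on real data, matching the two lists on start time rather than on row position may be the safer route.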

From there, you can identify just the speaker shifts and collapse the dataframe with a quick loop:

ChangeSpeaker=speakers.loc[speakers['speaker'].shift()!=speakers['speaker']].index

frames=[]
for counter in range(0,len(ChangeSpeaker)):
    currentindex=ChangeSpeaker[counter]
    try:
        # this turn ends on the row just before the next speaker change
        nextIndex=ChangeSpeaker[counter+1]-1
        temp=speakers.loc[currentindex:nextIndex,:]
    except IndexError:
        # last turn: no further change, so take every remaining row
        temp=speakers.loc[currentindex:,:]
    # one row per turn: start of first word, end of last word, speaker, word list
    frames.append(pd.DataFrame([[temp.head(1)['from'].values[0],temp.tail(1)['to'].values[0],temp.head(1)['speaker'].values[0],temp[0].tolist()]],columns=['from','to','speaker','transcript']))
# DataFrame.append was removed in pandas 2.0, so the rows are combined with pd.concat
Transcript=pd.concat(frames)

You take the start point from the first row (hence head) and the end point from the last row (hence tail) of the temporary dataframe. Additionally, to handle the last speaker (where you'd normally get an index out of bounds error), you use a try/except.

Output:

   from    to speaker               transcript
0  0.01  0.12       0             [said, this]
0  0.12  0.22       1             [said, that]
0  0.22  0.56       0  [said, something, else]
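
As an alternative to the loop, the same collapse can probably be done without explicit indexing by numbering consecutive runs of the same speaker and grouping on that number. This is a sketch rather than the method used in this answer; it starts from the `words` list in the question, and the column names 'word' and 'turn' are my own choices.

```python
import pandas as pd

words = [['said', 0.01, 0.06, 0], ['this', 0.06, 0.12, 0],
         ['said', 0.12, 0.15, 1], ['that', 0.15, 0.22, 1],
         ['said', 0.22, 0.31, 0], ['something', 0.31, 0.45, 0],
         ['else', 0.45, 0.56, 0]]
df = pd.DataFrame(words, columns=['word', 'from', 'to', 'speaker'])

# A new turn starts wherever the speaker differs from the previous row;
# the running sum of those flags gives every row a turn number.
turn = (df['speaker'] != df['speaker'].shift()).cumsum().rename('turn')

# Per turn: first start time, last end time, the speaker, and the word list
transcript = (
    df.groupby(turn)
      .agg({'from': 'first', 'to': 'last', 'speaker': 'first', 'word': list})
      .rename(columns={'word': 'transcript'})
      .reset_index(drop=True)
)
print(transcript)
```

Unlike the loop version, the resulting index runs 0, 1, 2 rather than repeating 0.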

Full code:

import json
import pandas as pd

jsonconvo=json.loads("""{
   "results": [
      {
         "alternatives": [
            {
               "timestamps": [
                  [
                     "said", 
                     0.01, 
                     0.06
                  ], 
                  [
                     "this", 
                     0.06, 
                     0.12
                  ], 
                  [
                     "said", 
                     0.12, 
                     0.15
                  ], 
                  [
                     "that", 
                     0.15, 
                     0.22
                  ], 
                  [
                     "said", 
                     0.22, 
                     0.31
                  ], 
                  [
                     "something", 
                     0.31, 
                     0.45
                  ], 
                  [
                     "else", 
                     0.45, 
                     0.56
                  ]
               ], 
               "confidence": 0.85, 
               "transcript": "said this said that said something else "
            }
         ], 
         "final": true
      }
   ], 
   "result_index": 0, 
   "speaker_labels": [
      {
         "from": 0.01, 
         "to": 0.06, 
         "speaker": 0, 
         "confidence": 0.55, 
         "final": false
      }, 
      {
         "from": 0.06, 
         "to": 0.12, 
         "speaker": 0, 
         "confidence": 0.55, 
         "final": false
      }, 
      {
         "from": 0.12, 
         "to": 0.15, 
         "speaker": 1, 
         "confidence": 0.55, 
         "final": false
      }, 
      {
         "from": 0.15, 
         "to": 0.22, 
         "speaker": 1, 
         "confidence": 0.55, 
         "final": false
      }, 
      {
         "from": 0.22, 
         "to": 0.31, 
         "speaker": 0, 
         "confidence": 0.55, 
         "final": false
      }, 
      {
         "from": 0.31, 
         "to": 0.45, 
         "speaker": 0, 
         "confidence": 0.55, 
         "final": false
      }, 
      {
         "from": 0.45, 
         "to": 0.56, 
         "speaker": 0, 
         "confidence": 0.54, 
         "final": false
      }
   ]
}""")



speakers=pd.DataFrame(jsonconvo['speaker_labels']).loc[:,['from','speaker','to']]
convo=pd.DataFrame(jsonconvo['results'][0]['alternatives'][0]['timestamps'])
speakers=speakers.join(convo)

ChangeSpeaker=speakers.loc[speakers['speaker'].shift()!=speakers['speaker']].index


frames=[]
for counter in range(0,len(ChangeSpeaker)):
    currentindex=ChangeSpeaker[counter]
    try:
        # this turn ends on the row just before the next speaker change
        nextIndex=ChangeSpeaker[counter+1]-1
        temp=speakers.loc[currentindex:nextIndex,:]
    except IndexError:
        # last turn: no further change, so take every remaining row
        temp=speakers.loc[currentindex:,:]
    # one row per turn: start of first word, end of last word, speaker, word list
    frames.append(pd.DataFrame([[temp.head(1)['from'].values[0],temp.tail(1)['to'].values[0],temp.head(1)['speaker'].values[0],temp[0].tolist()]],columns=['from','to','speaker','transcript']))

# DataFrame.append was removed in pandas 2.0, so the rows are combined with pd.concat
Transcript=pd.concat(frames)
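
To get from the collapsed table to the "Speaker 0: Said this [00.01 - 00.12]" lines asked for in the question, something like the following may work. It is only a sketch: format_turns is a hypothetical helper, the demo frame stands in for Transcript, and I am assuming the bracketed times are seconds printed with two decimals and zero-padded to five characters.

```python
import pandas as pd

def format_turns(transcript):
    """Render one 'Speaker N: Words [start - end]' line per row."""
    lines = []
    columns = zip(transcript['from'], transcript['to'],
                  transcript['speaker'], transcript['transcript'])
    for start, end, speaker, words in columns:
        # Join the word list and capitalise only the first letter
        text = ' '.join(words).capitalize()
        lines.append(f"Speaker {speaker}: {text} [{start:05.2f} - {end:05.2f}]")
    return lines

# Stand-in for the Transcript frame built above
demo = pd.DataFrame(
    [[0.01, 0.12, 0, ['said', 'this']],
     [0.12, 0.22, 1, ['said', 'that']],
     [0.22, 0.56, 0, ['said', 'something', 'else']]],
    columns=['from', 'to', 'speaker', 'transcript'])

for line in format_turns(demo):
    print(line)
```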

Solution 1
1 accepted 2019-10-09 14:59:43