How to reconstruct a conversation from Watson Speech-to-Text output?
I have the JSON output from Watson's Speech-to-Text service, which I have converted into a list and then into a Pandas DataFrame.
I'm trying to work out how to reconstruct the conversation (with timings), akin to the following:
Speaker 0: Said this [00.01 - 00.12]
Speaker 1: Said that [00.12 - 00.22]
Speaker 0: Said something else [00.22 - 00.56]
My DataFrame has a row for each word, with columns for the word, its start and end times, and the speaker tag (either 0 or 1).
words = [['said', 0.01, 0.06, 0],['this', 0.06, 0.12, 0],['said', 0.12,
0.15, 1],['that', 0.15, 0.22, 1],['said', 0.22, 0.31, 0],['something',
0.31, 0.45, 0],['else', 0.45, 0.56, 0]]
Ideally, what I am looking to create is the following, where words spoken by the same speaker are grouped together, and the group is broken when the next speaker steps in:
grouped_words = [[['said','this'], 0.01, 0.12, 0],[['said','that'], 0.12,
0.22, 1],[['said','something','else'], 0.22, 0.56, 0]]
UPDATE: As requested, a sample of the JSON file obtained is available at https://github.com/cookie1986/STT_test
It should be pretty straightforward to load the speaker labels into a Pandas DataFrame for a nice, readable view of the data and then identify the speaker shifts.
speakers=pd.DataFrame(jsonconvo['speaker_labels']).loc[:,['from','speaker','to']]
convo=pd.DataFrame(jsonconvo['results'][0]['alternatives'][0]['timestamps'])
speakers=speakers.join(convo)
Output:
from speaker to 0 1 2
0 0.01 0 0.06 said 0.01 0.06
1 0.06 0 0.12 this 0.06 0.12
2 0.12 1 0.15 said 0.12 0.15
3 0.15 1 0.22 that 0.15 0.22
4 0.22 0 0.31 said 0.22 0.31
5 0.31 0 0.45 something 0.31 0.45
6 0.45 0 0.56 else 0.45 0.56
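The join above lines up words and speaker labels purely by row position. As a hedged alternative, the sketch below matches them on their start and end times instead, and gives the timestamp columns names rather than the default 0, 1, 2. The trimmed `jsonconvo` dict and the column name `word` are illustrative assumptions, and the approach assumes each speaker label carries exactly the same `from`/`to` values as its word, as in the sample data.

```python
import pandas as pd

# Hypothetical miniature of the response, trimmed to the fields used here
jsonconvo = {
    "results": [{"alternatives": [{"timestamps": [
        ["said", 0.01, 0.06], ["this", 0.06, 0.12], ["said", 0.12, 0.15],
    ]}]}],
    "speaker_labels": [
        {"from": 0.01, "to": 0.06, "speaker": 0},
        {"from": 0.06, "to": 0.12, "speaker": 0},
        {"from": 0.12, "to": 0.15, "speaker": 1},
    ],
}

# Name the timestamp columns instead of keeping the default 0, 1, 2
convo = pd.DataFrame(
    jsonconvo["results"][0]["alternatives"][0]["timestamps"],
    columns=["word", "from", "to"],
)
labels = pd.DataFrame(jsonconvo["speaker_labels"])[["from", "to", "speaker"]]

# Match each word to its speaker label by start/end time, not by row order
speakers = convo.merge(labels, on=["from", "to"], how="left")
print(speakers)
```

With named columns, the word list is then read from `speakers['word']` rather than `speakers[0]`.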
From there, you can identify just the speaker shifts and collapse the DataFrame with a quick loop:
# Index labels of the rows where the speaker differs from the previous row
ChangeSpeaker=speakers.loc[speakers['speaker'].shift()!=speakers['speaker']].index
rows=[]
for counter in range(0,len(ChangeSpeaker)):
    currentindex=ChangeSpeaker[counter]
    try:
        # .loc slicing is inclusive, so stop one row before the next speaker change
        nextIndex=ChangeSpeaker[counter+1]-1
        temp=speakers.loc[currentindex:nextIndex,:]
    except IndexError:
        # Last speaker turn: take everything up to the end
        temp=speakers.loc[currentindex:,:]
    rows.append(pd.DataFrame([[temp.head(1)['from'].values[0],temp.tail(1)['to'].values[0],temp.head(1)['speaker'].values[0],temp[0].tolist()]],columns=['from','to','speaker','transcript']))
# DataFrame.append was removed in pandas 2.0, so combine the per-turn rows with pd.concat
Transcript=pd.concat(rows)
You take the start point from the first row (hence head) and the end point from the last row (hence tail) of the temporary DataFrame. Additionally, to handle the last speaker (where you'd normally get an index out-of-bounds error), you use a try/except.
Output:
from to speaker transcript
0 0.01 0.12 0 [said, this]
0 0.12 0.22 1 [said, that]
0 0.22 0.56 0 [said, something, else]
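The same collapse can also be written without an explicit loop. The sketch below is one possible variant, not the method used above: it numbers each speaker turn with a cumulative sum over the speaker changes and then aggregates per turn. It builds the DataFrame straight from the `words` list in the question; the column names and the `turn_id` helper are illustrative choices.

```python
import pandas as pd

# Word-level rows as in the question: word, start, end, speaker tag
words = [['said', 0.01, 0.06, 0], ['this', 0.06, 0.12, 0],
         ['said', 0.12, 0.15, 1], ['that', 0.15, 0.22, 1],
         ['said', 0.22, 0.31, 0], ['something', 0.31, 0.45, 0],
         ['else', 0.45, 0.56, 0]]
df = pd.DataFrame(words, columns=['word', 'from', 'to', 'speaker'])

# turn_id goes up by one every time the speaker differs from the previous row
turn_id = (df['speaker'] != df['speaker'].shift()).cumsum().rename('turn')

# One row per turn: first start time, last end time, speaker, list of words.
# 'from' is a Python keyword, so the named aggregations are passed as a dict.
transcript = (
    df.groupby(turn_id)
      .agg(**{
          'from': ('from', 'first'),
          'to': ('to', 'last'),
          'speaker': ('speaker', 'first'),
          'transcript': ('word', list),
      })
      .reset_index(drop=True)
)
print(transcript)
```

Because the turn number is derived from changes in the speaker column rather than from the speaker value itself, the two separate turns of speaker 0 stay separate.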
Full code here:
import json
import pandas as pd
jsonconvo=json.loads("""{
"results": [
{
"alternatives": [
{
"timestamps": [
[
"said",
0.01,
0.06
],
[
"this",
0.06,
0.12
],
[
"said",
0.12,
0.15
],
[
"that",
0.15,
0.22
],
[
"said",
0.22,
0.31
],
[
"something",
0.31,
0.45
],
[
"else",
0.45,
0.56
]
],
"confidence": 0.85,
"transcript": "said this said that said something else "
}
],
"final": true
}
],
"result_index": 0,
"speaker_labels": [
{
"from": 0.01,
"to": 0.06,
"speaker": 0,
"confidence": 0.55,
"final": false
},
{
"from": 0.06,
"to": 0.12,
"speaker": 0,
"confidence": 0.55,
"final": false
},
{
"from": 0.12,
"to": 0.15,
"speaker": 1,
"confidence": 0.55,
"final": false
},
{
"from": 0.15,
"to": 0.22,
"speaker": 1,
"confidence": 0.55,
"final": false
},
{
"from": 0.22,
"to": 0.31,
"speaker": 0,
"confidence": 0.55,
"final": false
},
{
"from": 0.31,
"to": 0.45,
"speaker": 0,
"confidence": 0.55,
"final": false
},
{
"from": 0.45,
"to": 0.56,
"speaker": 0,
"confidence": 0.54,
"final": false
}
]
}""")
speakers=pd.DataFrame(jsonconvo['speaker_labels']).loc[:,['from','speaker','to']]
convo=pd.DataFrame(jsonconvo['results'][0]['alternatives'][0]['timestamps'])
speakers=speakers.join(convo)
ChangeSpeaker=speakers.loc[speakers['speaker'].shift()!=speakers['speaker']].index
rows=[]
for counter in range(0,len(ChangeSpeaker)):
    currentindex=ChangeSpeaker[counter]
    try:
        # .loc slicing is inclusive, so stop one row before the next speaker change
        nextIndex=ChangeSpeaker[counter+1]-1
        temp=speakers.loc[currentindex:nextIndex,:]
    except IndexError:
        # Last speaker turn: take everything up to the end
        temp=speakers.loc[currentindex:,:]
    rows.append(pd.DataFrame([[temp.head(1)['from'].values[0],temp.tail(1)['to'].values[0],temp.head(1)['speaker'].values[0],temp[0].tolist()]],columns=['from','to','speaker','transcript']))
# DataFrame.append was removed in pandas 2.0, so combine the per-turn rows with pd.concat
Transcript=pd.concat(rows)
print(Transcript)
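To get from the grouped data to the "Speaker N: ... [start - end]" lines asked for in the question, something like the following could work. It is a minimal pandas-free sketch that starts from the `words` list in the question; the function name `format_turns` is made up for illustration, and the times are assumed to be seconds printed with two decimals.

```python
from itertools import groupby

# Word-level rows as in the question: word, start, end, speaker tag
words = [['said', 0.01, 0.06, 0], ['this', 0.06, 0.12, 0],
         ['said', 0.12, 0.15, 1], ['that', 0.15, 0.22, 1],
         ['said', 0.22, 0.31, 0], ['something', 0.31, 0.45, 0],
         ['else', 0.45, 0.56, 0]]

def format_turns(words):
    """Return one 'Speaker N: text [start - end]' line per speaker turn."""
    lines = []
    # groupby only merges consecutive items with the same key, so a new
    # group starts whenever the next speaker steps in
    for speaker, group in groupby(words, key=lambda w: w[3]):
        group = list(group)
        text = ' '.join(w[0] for w in group)
        start, end = group[0][1], group[-1][2]
        lines.append(f'Speaker {speaker}: {text} [{start:05.2f} - {end:05.2f}]')
    return lines

for line in format_turns(words):
    print(line)
```

For the sample data this prints three lines, the first being `Speaker 0: said this [00.01 - 00.12]`.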