
How to reconstruct a conversation from Watson Speech-to-Text output?

I have the JSON output from Watson's Speech-to-Text service, which I have converted into a list and then into a Pandas data-frame.

I'm trying to work out how to reconstruct the conversation (with timings), akin to the following:

Speaker 0: Said this [00.01 - 00.12]

Speaker 1: Said that [00.12 - 00.22]

Speaker 0: Said something else [00.22 - 00.56]

My data-frame has a row for each word, and columns for the word, its start/end time, and the speaker tag (either 0 or 1).

words = [['said', 0.01, 0.06, 0],['this', 0.06, 0.12, 0],['said', 0.12, 
0.15, 1],['that', 0.15, 0.22, 1],['said', 0.22, 0.31, 0],['something', 
0.31, 0.45, 0],['else', 0.45, 0.56, 0]]

Ideally, what I am looking to create is the following, where words spoken by the same speaker are grouped together, and the group is broken when the next speaker steps in:

grouped_words = [[['said','this'], 0.01, 0.12, 0],[['said','that'], 0.12, 
0.22, 1],[['said','something','else'], 0.22, 0.56, 0]]

UPDATE: As requested, a sample of the JSON file is available at https://github.com/cookie1986/STT_test

It should be pretty straightforward to load the speaker labels into a Pandas DataFrame for a nice, easy-to-read view, and then identify the speaker shifts.

speakers=pd.DataFrame(jsonconvo['speaker_labels']).loc[:,['from','speaker','to']]
convo=pd.DataFrame(jsonconvo['results'][0]['alternatives'][0]['timestamps'])
speakers=speakers.join(convo)

Output:

   from  speaker    to          0     1     2
0  0.01        0  0.06       said  0.01  0.06
1  0.06        0  0.12       this  0.06  0.12
2  0.12        1  0.15       said  0.12  0.15
3  0.15        1  0.22       that  0.15  0.22
4  0.22        0  0.31       said  0.22  0.31
5  0.31        0  0.45  something  0.31  0.45
6  0.45        0  0.56       else  0.45  0.56
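The join above reads only `results[0]`. As a hedged sketch, assuming a longer recording comes back with several entries in `results` and that `speaker_labels` lines up one-to-one with the concatenated timestamps, the word list could be flattened first. The helper name `words_with_speakers` and the column names `word`, `start`, and `end` are illustrative choices, not part of the original answer.

```python
import pandas as pd


def words_with_speakers(response):
    """Build one row per word, pairing each timestamp with its speaker label."""
    # Collect [word, start, end] triples from every result block, not only results[0].
    timestamps = [
        ts
        for result in response["results"]
        for ts in result["alternatives"][0]["timestamps"]
    ]
    words = pd.DataFrame(timestamps, columns=["word", "start", "end"])
    labels = pd.DataFrame(response["speaker_labels"])[["from", "speaker", "to"]]
    # Assumption: labels and timestamps appear in the same order, one label per word.
    if len(words) != len(labels):
        raise ValueError("speaker_labels and timestamps differ in length")
    return labels.join(words)


# Tiny made-up response with two result blocks, for illustration only.
sample = {
    "results": [
        {"alternatives": [{"timestamps": [["said", 0.01, 0.06]]}]},
        {"alternatives": [{"timestamps": [["this", 0.06, 0.12]]}]},
    ],
    "speaker_labels": [
        {"from": 0.01, "to": 0.06, "speaker": 0},
        {"from": 0.06, "to": 0.12, "speaker": 0},
    ],
}
df = words_with_speakers(sample)
```

The length check is there because a positional join silently misaligns words and speakers if the two lists ever differ.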

From there, you can identify just the speaker shifts and collapse the dataframe with a quick loop:

ChangeSpeaker=speakers.loc[speakers['speaker'].shift()!=speakers['speaker']].index

# DataFrame.append was removed in pandas 2.0, so collect one frame per turn and concatenate once
pieces=[]
for counter in range(0,len(ChangeSpeaker)):
    currentindex=ChangeSpeaker[counter]
    try:
        nextIndex=ChangeSpeaker[counter+1]-1
        temp=speakers.loc[currentindex:nextIndex,:]
    except IndexError:
        # last speaker turn: there is no next shift, so take every remaining row
        temp=speakers.loc[currentindex:,:]
    # one row per turn: start of first word, end of last word, speaker tag, list of words
    pieces.append(pd.DataFrame([[temp.head(1)['from'].values[0],temp.tail(1)['to'].values[0],temp.head(1)['speaker'].values[0],temp[0].tolist()]],columns=['from','to','speaker','transcript']))
Transcript=pd.concat(pieces)

You want to take the start point from the first row (hence head) and the end point from the last row (hence tail) of the temporary dataframe. Additionally, to handle the last-speaker case (where you'd normally get an index-out-of-bounds error), you use a try/except.

Output:

   from    to speaker               transcript
0  0.01  0.12       0             [said, this]
0  0.12  0.22       1             [said, that]
0  0.22  0.56       0  [said, something, else]
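The same collapse can also be done without an explicit loop. The following is a sketch rather than the method used above: it numbers each run of consecutive rows by the same speaker and aggregates per run. The word column is called `word` here for readability (in the joined frame above it is the integer column `0`), and `turn_id` is a name introduced only for this example.

```python
import pandas as pd

# Same shape as the `speakers` frame above: one row per word.
speakers = pd.DataFrame(
    {
        "from": [0.01, 0.06, 0.12, 0.15, 0.22, 0.31, 0.45],
        "speaker": [0, 0, 1, 1, 0, 0, 0],
        "to": [0.06, 0.12, 0.15, 0.22, 0.31, 0.45, 0.56],
        "word": ["said", "this", "said", "that", "said", "something", "else"],
    }
)

# A new turn starts wherever the speaker differs from the previous row;
# the running total of those flags gives every turn its own id (1, 2, 3, ...).
turn_id = (speakers["speaker"] != speakers["speaker"].shift()).cumsum().rename("turn")

transcript = (
    speakers.groupby(turn_id)
    .agg(
        # "from" is a Python keyword, so the named aggregations are passed as a dict.
        **{
            "from": ("from", "first"),
            "to": ("to", "last"),
            "speaker": ("speaker", "first"),
            "transcript": ("word", list),
        }
    )
    .reset_index(drop=True)
)
```

Grouping on the run id rather than on `speaker` itself matters: grouping on `speaker` would merge the first and third turns, since both belong to speaker 0.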

Full code here:

import json
import pandas as pd

jsonconvo=json.loads("""{
   "results": [
      {
         "alternatives": [
            {
               "timestamps": [
                  [
                     "said", 
                     0.01, 
                     0.06
                  ], 
                  [
                     "this", 
                     0.06, 
                     0.12
                  ], 
                  [
                     "said", 
                     0.12, 
                     0.15
                  ], 
                  [
                     "that", 
                     0.15, 
                     0.22
                  ], 
                  [
                     "said", 
                     0.22, 
                     0.31
                  ], 
                  [
                     "something", 
                     0.31, 
                     0.45
                  ], 
                  [
                     "else", 
                     0.45, 
                     0.56
                  ]
               ], 
               "confidence": 0.85, 
               "transcript": "said this said that said something else "
            }
         ], 
         "final": true
      }
   ], 
   "result_index": 0, 
   "speaker_labels": [
      {
         "from": 0.01, 
         "to": 0.06, 
         "speaker": 0, 
         "confidence": 0.55, 
         "final": false
      }, 
      {
         "from": 0.06, 
         "to": 0.12, 
         "speaker": 0, 
         "confidence": 0.55, 
         "final": false
      }, 
      {
         "from": 0.12, 
         "to": 0.15, 
         "speaker": 1, 
         "confidence": 0.55, 
         "final": false
      }, 
      {
         "from": 0.15, 
         "to": 0.22, 
         "speaker": 1, 
         "confidence": 0.55, 
         "final": false
      }, 
      {
         "from": 0.22, 
         "to": 0.31, 
         "speaker": 0, 
         "confidence": 0.55, 
         "final": false
      }, 
      {
         "from": 0.31, 
         "to": 0.45, 
         "speaker": 0, 
         "confidence": 0.55, 
         "final": false
      }, 
      {
         "from": 0.45, 
         "to": 0.56, 
         "speaker": 0, 
         "confidence": 0.54, 
         "final": false
      }
   ]
}""")



speakers=pd.DataFrame(jsonconvo['speaker_labels']).loc[:,['from','speaker','to']]
convo=pd.DataFrame(jsonconvo['results'][0]['alternatives'][0]['timestamps'])
speakers=speakers.join(convo)

ChangeSpeaker=speakers.loc[speakers['speaker'].shift()!=speakers['speaker']].index


# DataFrame.append was removed in pandas 2.0, so collect one frame per turn and concatenate once
pieces=[]
for counter in range(0,len(ChangeSpeaker)):
    currentindex=ChangeSpeaker[counter]
    try:
        nextIndex=ChangeSpeaker[counter+1]-1
        temp=speakers.loc[currentindex:nextIndex,:]
    except IndexError:
        # last speaker turn: there is no next shift, so take every remaining row
        temp=speakers.loc[currentindex:,:]
    # one row per turn: start of first word, end of last word, speaker tag, list of words
    pieces.append(pd.DataFrame([[temp.head(1)['from'].values[0],temp.tail(1)['to'].values[0],temp.head(1)['speaker'].values[0],temp[0].tolist()]],columns=['from','to','speaker','transcript']))
Transcript=pd.concat(pieces)
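If the words are still in the plain list from the question rather than in a dataframe, a standard-library sketch can produce both the `grouped_words` structure and the one-line-per-turn layout the question asks for. The helper names `group_turns` and `format_turn` are made up for this example, and the timing format assumes the values are seconds printed with two decimals.

```python
from itertools import groupby

words = [['said', 0.01, 0.06, 0], ['this', 0.06, 0.12, 0], ['said', 0.12, 0.15, 1],
         ['that', 0.15, 0.22, 1], ['said', 0.22, 0.31, 0], ['something', 0.31, 0.45, 0],
         ['else', 0.45, 0.56, 0]]


def group_turns(words):
    """Merge consecutive words with the same speaker tag into one turn."""
    turns = []
    # groupby only merges adjacent items, so a returning speaker starts a new turn.
    for speaker, run in groupby(words, key=lambda w: w[3]):
        run = list(run)
        turns.append([[w[0] for w in run], run[0][1], run[-1][2], speaker])
    return turns


def format_turn(turn):
    """Render one turn as 'Speaker N: words [start - end]'."""
    tokens, start, end, speaker = turn
    return f"Speaker {speaker}: {' '.join(tokens)} [{start:05.2f} - {end:05.2f}]"


grouped_words = group_turns(words)
lines = [format_turn(turn) for turn in grouped_words]
print("\n".join(lines))
```

Each turn takes its start time from the first word in the run and its end time from the last, the same head/tail idea as in the dataframe version.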
