How to sum up a list of timestamps by the speaker name
I'm working on a project in which I've extracted data from a list, and I now have 3 lists:
List 1 - list of speaker names
['<M1>', '<M1>', '<M1>', '<M1>', '<M1>', '<M2>', '<M2>', '<M2>', '<M1>', '<M1>', '<M2>', '<M1>', '<M2>', '<M2>', '<M2>', '<M2>', '<M2>']
List 2 - list of the start timestamps of each speech segment
['[00:00:00.000]', '[00:00:08.010]', '[00:00:16.890]', '[00:00:26.210]', '[00:00:39.980]', '[00:00:48.100]', '[00:00:56.770]', '[00:01:08.010]', '[00:01:21.120]', '[00:01:46.130]', '[00:01:47.180]', '[00:01:49.390]', '[00:01:50.670]', '[00:02:02.320]', '[00:02:16.010]', '[00:02:21.110]', '[00:02:27.610]']
List 3 - list of the end timestamps of each speech segment
['[00:00:08.010]', '[00:00:16.290]', '[00:00:26.210]', '[00:00:39.980]', '[00:00:48.100]', '[00:00:56.770]', '[00:01:08.010]', '[00:01:20.250]', '[00:01:33.850]', '[00:01:47.150]', '[00:01:49.370]', '[00:01:50.140]', '[00:02:01.350]', '[00:02:16.010]', '[00:02:20.150]', '[00:02:27.610]', '[00:02:39.040]']
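What I need to do is this: whenever a speaker speaks several times in a row (for example, the first 5 elements of the list), I need to change the first end timestamp, [00:00:08.010], to [00:00:48.100] (the end of the fifth segment) and get rid of all the entries in between, so that the 5 entries for that one speaker become 1 entry. Then I need to do the same for every other run of the same speaker in the list. If a speaker spoke only once in a row, that entry must stay unchanged. Can someone help me find a way to do this in Python? Thanks!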
speakerOrder = ['<M1>', '<M1>', '<M1>', '<M1>', '<M1>', '<M2>', '<M2>', '<M2>', '<M1>', '<M1>', '<M2>', '<M1>', '<M2>', '<M2>', '<M2>', '<M2>', '<M2>']
speakerBegin = ['[00:00:00.000]', '[00:00:08.010]', '[00:00:16.890]', '[00:00:26.210]', '[00:00:39.980]', '[00:00:48.100]', '[00:00:56.770]', '[00:01:08.010]', '[00:01:21.120]', '[00:01:46.130]', '[00:01:47.180]', '[00:01:49.390]', '[00:01:50.670]', '[00:02:02.320]', '[00:02:16.010]', '[00:02:21.110]', '[00:02:27.610]']
speakerEnd = ['[00:00:08.010]', '[00:00:16.290]', '[00:00:26.210]', '[00:00:39.980]', '[00:00:48.100]', '[00:00:56.770]', '[00:01:08.010]', '[00:01:20.250]', '[00:01:33.850]', '[00:01:47.150]', '[00:01:49.370]', '[00:01:50.140]', '[00:02:01.350]', '[00:02:16.010]', '[00:02:20.150]', '[00:02:27.610]', '[00:02:39.040]']
newSpeakerOrder = []
newSpeakerBegin = []
newSpeakerEnd = []
currentSpeaker = None
for speakerIndex in range(len(speakerOrder)):
    speaker = speakerOrder[speakerIndex]
    if currentSpeaker != speaker:
        # If someone was already speaking, record the time they stopped
        if currentSpeaker is not None:
            newSpeakerEnd.append(speakerEnd[speakerIndex - 1])
        # Add the new speaker
        newSpeakerOrder.append(speaker)
        currentSpeaker = speaker
        # Add the time they began speaking
        newSpeakerBegin.append(speakerBegin[speakerIndex])
# Add the time the last person stopped speaking
newSpeakerEnd.append(speakerEnd[-1])
print(newSpeakerOrder)
print(newSpeakerBegin)
print(newSpeakerEnd)
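This is the solution I came up with. It isn't perfect, but it should solve your problem. Just make sure beforehand that the original lists all have the same length.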
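The equal-length precondition mentioned above can be enforced in code instead of checked by hand. The following is a minimal sketch, not part of the original answer; the function name `merge_consecutive` is hypothetical. It builds the same kind of merged lists as the loop above, but extends the end time of the current entry while the speaker stays the same, and it raises `ValueError` if the three lists differ in length.

```python
def merge_consecutive(order, begin, end):
    """Collapse each run of the same speaker into one entry.

    Hypothetical helper: returns three new lists (speakers, starts, ends),
    keeping the first start and the last end of every run.
    """
    # Enforce the equal-length precondition instead of relying on it
    if not (len(order) == len(begin) == len(end)):
        raise ValueError("speaker, begin and end lists must have the same length")

    new_order, new_begin, new_end = [], [], []
    for speaker, start, stop in zip(order, begin, end):
        if new_order and new_order[-1] == speaker:
            # Same speaker as the previous segment: extend that entry's end time
            new_end[-1] = stop
        else:
            # First segment or a change of speaker: open a new entry
            new_order.append(speaker)
            new_begin.append(start)
            new_end.append(stop)
    return new_order, new_begin, new_end


order, begin, end = merge_consecutive(
    ['<M1>', '<M1>', '<M2>'],
    ['[00:00:00.000]', '[00:00:08.010]', '[00:00:16.890]'],
    ['[00:00:08.010]', '[00:00:16.290]', '[00:00:26.210]'],
)
print(order)  # ['<M1>', '<M2>']
print(begin)  # ['[00:00:00.000]', '[00:00:16.890]']
print(end)    # ['[00:00:16.290]', '[00:00:26.210]']
```

Unlike the index-based loop, this version returns three empty lists for empty input instead of failing on `speakerEnd[-1]`.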
You can use the groupby function from itertools. Try this:
from itertools import groupby
l1 = ['<M1>', '<M1>', '<M1>', '<M1>', '<M1>', '<M2>', '<M2>', '<M2>', '<M1>', '<M1>', '<M2>', '<M1>', '<M2>', '<M2>', '<M2>', '<M2>', '<M2>']
l2= ['[00:00:00.000]', '[00:00:08.010]', '[00:00:16.890]', '[00:00:26.210]', '[00:00:39.980]', '[00:00:48.100]', '[00:00:56.770]', '[00:01:08.010]', '[00:01:21.120]', '[00:01:46.130]', '[00:01:47.180]', '[00:01:49.390]', '[00:01:50.670]', '[00:02:02.320]', '[00:02:16.010]', '[00:02:21.110]', '[00:02:27.610]']
l3 = ['[00:00:08.010]', '[00:00:16.290]', '[00:00:26.210]', '[00:00:39.980]', '[00:00:48.100]', '[00:00:56.770]', '[00:01:08.010]', '[00:01:20.250]', '[00:01:33.850]', '[00:01:47.150]', '[00:01:49.370]', '[00:01:50.140]', '[00:02:01.350]', '[00:02:16.010]', '[00:02:20.150]', '[00:02:27.610]', '[00:02:39.040]']
start_index = 0
for m, g in groupby(l1):
    # Index of the last segment in this run of the same speaker
    end_index = start_index + len(list(g)) - 1
    start_time = l2[start_index]
    end_time = l3[end_index]
    # The next run starts right after this one
    start_index = end_index + 1
    print(start_time)
    print(end_time)
    print("============")
Output:
[00:00:00.000]
[00:00:48.100]
============
[00:00:48.100]
[00:01:20.250]
============
[00:01:21.120]
[00:01:47.150]
============
[00:01:47.180]
[00:01:49.370]
============
[00:01:49.390]
[00:01:50.140]
============
[00:01:50.670]
[00:02:39.040]
============
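The groupby answer prints the merged boundaries. If the merged entries are needed as data for further processing, the three lists can be zipped and grouped in a single pass. The following is a minimal sketch assuming the lists have equal length; the function name `merge_with_groupby` is hypothetical. Each group is materialised as a list so that its first start and last end can be read off directly.

```python
from itertools import groupby
from operator import itemgetter


def merge_with_groupby(order, begin, end):
    """Return a list of (speaker, start, end) tuples, one per run of the same speaker.

    Hypothetical helper; assumes order, begin and end have the same length.
    """
    merged = []
    # groupby only merges *adjacent* items with equal keys, which is exactly
    # the "consecutive segments by the same speaker" rule from the question
    for speaker, group in groupby(zip(order, begin, end), key=itemgetter(0)):
        segments = list(group)
        # First segment's start time, last segment's end time
        merged.append((speaker, segments[0][1], segments[-1][2]))
    return merged


l1 = ['<M1>', '<M1>', '<M2>', '<M1>']
l2 = ['[00:00:00.000]', '[00:00:08.010]', '[00:00:16.890]', '[00:00:26.210]']
l3 = ['[00:00:08.010]', '[00:00:16.290]', '[00:00:26.210]', '[00:00:39.980]']
for speaker, start, stop in merge_with_groupby(l1, l2, l3):
    print(speaker, start, stop)
# <M1> [00:00:00.000] [00:00:16.290]
# <M2> [00:00:16.890] [00:00:26.210]
# <M1> [00:00:26.210] [00:00:39.980]
```

Grouping the zipped triples avoids the manual `start_index` bookkeeping, at the cost of building a small list per run.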
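Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.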