How to merge a list of timestamps by speaker name

Question

I am working on a project in which I have extracted data into 3 lists:
List 1 - speaker names

['<M1>', '<M1>', '<M1>', '<M1>', '<M1>', '<M2>', '<M2>', '<M2>', '<M1>', '<M1>', '<M2>', '<M1>', '<M2>', '<M2>', '<M2>', '<M2>', '<M2>']

List 2 - start timestamp of each segment

['[00:00:00.000]', '[00:00:08.010]', '[00:00:16.890]', '[00:00:26.210]', '[00:00:39.980]', '[00:00:48.100]', '[00:00:56.770]', '[00:01:08.010]', '[00:01:21.120]', '[00:01:46.130]', '[00:01:47.180]', '[00:01:49.390]', '[00:01:50.670]', '[00:02:02.320]', '[00:02:16.010]', '[00:02:21.110]', '[00:02:27.610]']

List 3 - end timestamp of each segment

['[00:00:08.010]', '[00:00:16.290]', '[00:00:26.210]', '[00:00:39.980]', '[00:00:48.100]', '[00:00:56.770]', '[00:01:08.010]', '[00:01:20.250]', '[00:01:33.850]', '[00:01:47.150]', '[00:01:49.370]', '[00:01:50.140]', '[00:02:01.350]', '[00:02:16.010]', '[00:02:20.150]', '[00:02:27.610]', '[00:02:39.040]']

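Whenever a speaker talks several times in a row (for example, the first 5 elements of the lists), I need to change the end of the first segment from [00:00:08.010] to [00:00:48.100] and drop all the entries in between, so that the 5 entries for that speaker become 1 entry. The same should happen for every other run of consecutive entries by one speaker. If a speaker talks only once, the entry should stay unchanged. Can someone help me find a way to do this in Python? Thanks!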

Answer 1

speakerOrder    = ['<M1>', '<M1>', '<M1>', '<M1>', '<M1>', '<M2>', '<M2>', '<M2>', '<M1>', '<M1>', '<M2>', '<M1>', '<M2>', '<M2>', '<M2>', '<M2>', '<M2>']
speakerBegin    = ['[00:00:00.000]', '[00:00:08.010]', '[00:00:16.890]', '[00:00:26.210]', '[00:00:39.980]', '[00:00:48.100]', '[00:00:56.770]', '[00:01:08.010]', '[00:01:21.120]', '[00:01:46.130]', '[00:01:47.180]', '[00:01:49.390]', '[00:01:50.670]', '[00:02:02.320]', '[00:02:16.010]', '[00:02:21.110]', '[00:02:27.610]']
speakerEnd      = ['[00:00:08.010]', '[00:00:16.290]', '[00:00:26.210]', '[00:00:39.980]', '[00:00:48.100]', '[00:00:56.770]', '[00:01:08.010]', '[00:01:20.250]', '[00:01:33.850]', '[00:01:47.150]', '[00:01:49.370]', '[00:01:50.140]', '[00:02:01.350]', '[00:02:16.010]', '[00:02:20.150]', '[00:02:27.610]', '[00:02:39.040]']


newSpeakerOrder = []
newSpeakerBegin = []
newSpeakerEnd   = []

currentSpeaker = None
for speakerIndex in range(len(speakerOrder)):
    speaker = speakerOrder[speakerIndex]
    if currentSpeaker != speaker:
        # If someone was already speaking, record when their turn ended
        if currentSpeaker is not None:
            newSpeakerEnd.append(speakerEnd[speakerIndex - 1])
        # Add the new speaker
        newSpeakerOrder.append(speaker)
        currentSpeaker = speaker
        # Record when the new speaker's turn began
        newSpeakerBegin.append(speakerBegin[speakerIndex])

# Record when the last speaker stopped speaking
newSpeakerEnd.append(speakerEnd[-1])

print(newSpeakerOrder)
print(newSpeakerBegin)
print(newSpeakerEnd)

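This is the solution I came up with. It is not perfect, but it should solve your problem. Just make sure beforehand that the original lists all have the same length.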
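The same idea can be wrapped in a function that checks the lengths itself, as suggested above. This is only a minimal sketch: the name `merge_turns` and the choice to return a list of `(speaker, begin, end)` tuples are assumptions made for illustration, not part of the original answer.

```python
def merge_turns(speakers, begins, ends):
    """Collapse consecutive entries by the same speaker into one entry.

    Hypothetical helper: returns a list of (speaker, begin, end) tuples.
    """
    # The three lists are parallel, so they must have the same length
    if not (len(speakers) == len(begins) == len(ends)):
        raise ValueError("all three lists must have the same length")

    merged = []  # each item is a mutable [speaker, begin, end]
    for speaker, begin, end in zip(speakers, begins, ends):
        if merged and merged[-1][0] == speaker:
            # Same speaker keeps talking: only extend the end time
            merged[-1][2] = end
        else:
            # A different speaker starts a new entry
            merged.append([speaker, begin, end])
    return [tuple(entry) for entry in merged]
```

Calling `merge_turns(speakerOrder, speakerBegin, speakerEnd)` with the lists defined above should give one tuple per uninterrupted turn. A speaker who talks only once keeps their original begin and end times.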

Answer 2

You can use the groupby function from itertools. Try this:

from itertools import groupby

l1 = ['<M1>', '<M1>', '<M1>', '<M1>', '<M1>', '<M2>', '<M2>', '<M2>', '<M1>', '<M1>', '<M2>', '<M1>', '<M2>', '<M2>', '<M2>', '<M2>', '<M2>']
l2= ['[00:00:00.000]', '[00:00:08.010]', '[00:00:16.890]', '[00:00:26.210]', '[00:00:39.980]', '[00:00:48.100]', '[00:00:56.770]', '[00:01:08.010]', '[00:01:21.120]', '[00:01:46.130]', '[00:01:47.180]', '[00:01:49.390]', '[00:01:50.670]', '[00:02:02.320]', '[00:02:16.010]', '[00:02:21.110]', '[00:02:27.610]']
l3 = ['[00:00:08.010]', '[00:00:16.290]', '[00:00:26.210]', '[00:00:39.980]', '[00:00:48.100]', '[00:00:56.770]', '[00:01:08.010]', '[00:01:20.250]', '[00:01:33.850]', '[00:01:47.150]', '[00:01:49.370]', '[00:01:50.140]', '[00:02:01.350]', '[00:02:16.010]', '[00:02:20.150]', '[00:02:27.610]', '[00:02:39.040]'] 
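start_index = 0
# groupby yields one (speaker, group) pair per run of consecutive equal names
for (m, g) in groupby(l1):
    # Index of the last entry in this run
    end_index = start_index + len(list(g)) - 1
    start_time = l2[start_index]  # start of the first entry in the run
    end_time = l3[end_index]      # end of the last entry in the run
    # The next run starts right after this one
    start_index = end_index + 1
    print(start_time)
    print(end_time)
    print("============")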

Output

[00:00:00.000]
[00:00:48.100]
============
[00:00:48.100]
[00:01:20.250]
============
[00:01:21.120]
[00:01:47.150]
============
[00:01:47.180]
[00:01:49.370]
============
[00:01:49.390]
[00:01:50.140]
============
[00:01:50.670]
[00:02:39.040]
============
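The loop above only prints the merged times and drops the speaker name. If three merged lists are needed, as in the question, one possible variation is to group the indices rather than the names. This is a sketch only: the function name `merge_with_groupby` and its return shape are assumptions made for illustration.

```python
from itertools import groupby


def merge_with_groupby(speakers, begins, ends):
    """Hypothetical helper: returns (new_speakers, new_begins, new_ends)."""
    new_speakers, new_begins, new_ends = [], [], []
    # Group the positions 0..n-1 by the speaker found at each position,
    # so each group is one run of consecutive entries by the same speaker
    for speaker, group in groupby(range(len(speakers)), key=lambda i: speakers[i]):
        indices = list(group)
        new_speakers.append(speaker)
        new_begins.append(begins[indices[0]])   # start of the first entry
        new_ends.append(ends[indices[-1]])      # end of the last entry
    return new_speakers, new_begins, new_ends
```

Calling `merge_with_groupby(l1, l2, l3)` should return lists whose begin and end times match the pairs printed above, with the speaker name kept alongside each pair.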

Answer 1
2 accepted 2020-06-03 06:06:05

Answer 2
1 2020-06-03 06:05:46
