How to remove duplicates from a list of strings based on timestamp

Question

I have the following list:

ls = ["2022-07-17 16:00:02 txt xyz", "2022-07-17 15:00:02 txt xyz", "2022-07-17 16:00:02 txt abc"]

I want to keep only one entry per text (xyz and abc): the one with the newest timestamp. This is my expected outcome:

ls = ["2022-07-17 16:00:02 txt xyz", "2022-07-17 16:00:02 txt abc"]

My approach was to use a dictionary sorted by value, but I still don't know how to remove the entries with the older timestamp.

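import datetime
import re

keep_message = {}
for entry in ls:
    # Timestamp is everything before " txt "; the text is everything after it
    timestamp_str = re.search(r"^(.*?) txt", entry).group(1)
    timestamp = datetime.datetime.strptime(timestamp_str, "%Y-%m-%d %H:%M:%S")
    text = re.search(r"txt (.*?)$", entry).group(1)
    # Keyed by text + timestamp, so both xyz entries are still kept
    keep_message[text + "_" + timestamp_str] = timestamp

# Sort the entries by timestamp (the dict value)
keep_message_sorted = dict(sorted(keep_message.items(), key=lambda item: item[1]))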

Is there a better solution?

Answer 1

Use a dictionary to keep track of the most recent date per text:

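d = {}  # text -> newest full entry seen so far
for x in ls:
    # get txt (NB. you can also use a regex)
    ts, txt = x.split(' txt ', 1)
    # comparing the full strings orders by timestamp, since it comes first
    if txt not in d or x > d[txt]:
        d[txt] = x

out = list(d.values())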

NB. I used a simple split to extract the text, and compared the full strings directly: because the timestamp comes first and uses a fixed-width, zero-padded format, string order is the same as chronological order. However, you can use another extraction method (e.g. a regex) and compare only the datetime part.
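A minimal sketch of that regex variant, assuming every entry has the form `<YYYY-MM-DD HH:MM:SS> txt <text>`. The pattern and the names `PATTERN` and `newest` are illustrative choices, not part of the original answer. Each timestamp is parsed with `datetime.strptime`, so only the datetime part is compared; lines that do not match the assumed layout are skipped.

```python
import datetime
import re

ls = ["2022-07-17 16:00:02 txt xyz", "2022-07-17 15:00:02 txt xyz", "2022-07-17 16:00:02 txt abc"]

# Assumed layout: "<YYYY-MM-DD HH:MM:SS> txt <text>"
PATTERN = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) txt (.*)$")

newest = {}  # text -> (datetime, original entry)
for entry in ls:
    m = PATTERN.match(entry)
    if m is None:
        continue  # skip lines that don't match the assumed layout
    ts = datetime.datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
    text = m.group(2)
    # Compare only the datetime part, not the whole string
    if text not in newest or ts > newest[text][0]:
        newest[text] = (ts, entry)

out = [entry for _, entry in newest.values()]
print(out)
```

Parsing to `datetime` costs a little more than a plain string comparison, but it keeps working if the timestamp format ever stops being sortable as text.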

Output:

['2022-07-17 16:00:02 txt xyz', '2022-07-17 16:00:02 txt abc']
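The sort-then-dictionary idea from the question can also be made to work: sort the entries oldest first, then key the dictionary by the text alone, so a newer entry overwrites an older one with the same text. A minimal sketch, assuming the same `<YYYY-MM-DD HH:MM:SS> txt <text>` layout; the helper `parse` and the name `latest` are hypothetical.

```python
import datetime
import re

ls = ["2022-07-17 16:00:02 txt xyz", "2022-07-17 15:00:02 txt xyz", "2022-07-17 16:00:02 txt abc"]

# Assumed layout: timestamp, the literal " txt ", then the text
PATTERN = re.compile(r"^(.*?) txt (.*)$")


def parse(entry):
    """Return (timestamp, text) for one entry (hypothetical helper)."""
    m = PATTERN.match(entry)
    return datetime.datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S"), m.group(2)


# Oldest first, so later (newer) entries overwrite older ones in the dict
latest = {}
for entry in sorted(ls, key=lambda e: parse(e)[0]):
    _, text = parse(entry)
    latest[text] = entry

out = list(latest.values())
print(out)
```

This costs an O(n log n) sort, whereas the single-pass dictionary in the answer is O(n); the sorted version is mainly useful if you also want the entries processed in chronological order.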

Answer 1: accepted, 0 votes, 2022-07-17 21:16:08