Extract the string with the maximum value from a list of tuples with duplicated elements
I have the following simplified data structure:
input = [("FileName1", "ID1", "Sequence1", 1000),
("FileName1", "ID1", "Sequence2", 500),
("FileName1", "ID2", "Sequence3", 1500),
("FileName1", "ID2", "Sequence5", 200),
("FileName2", "ID1", "Sequence1", 500),
("FileName2", "ID1", "Sequence2", 1000),
("FileName2", "ID2", "Sequence3", 250),
("FileName2", "ID2", "Sequence5", 2000)]
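Here, a given ID can be linked to several Sequences (the number of Sequences per ID is not always the same), and several IDs can be linked to a given FileName (the number of IDs per FileName is not always the same either).
What I want is to extract, for each ID in each file, the FileName/ID/Sequence triplet with the maximum intensity (the last element of each tuple):
Output: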
output = [("FileName1", "ID1", "Sequence1"),
("FileName1", "ID2", "Sequence3"),
("FileName2", "ID1", "Sequence2"),
("FileName2", "ID2", "Sequence5")]
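In the end, I need one unique Sequence per ID (the one with the maximum value) together with its FileName, because I need all of this information to map it onto a dataframe.
A FileName will then no longer contain any duplicated ID, and exactly one Sequence will be associated with each ID.
Thank you for your help.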
Using itertools
Example:
import itertools
input = [("FileName1", "ID1", "Sequence1", 1000),
("FileName1", "ID1", "Sequence2", 500),
("FileName1", "ID2", "Sequence3", 1500),
("FileName1", "ID2", "Sequence5", 200),
("FileName2", "ID1", "Sequence1", 500),
("FileName2", "ID1", "Sequence2", 1000),
("FileName2", "ID2", "Sequence3", 250),
("FileName2", "ID2", "Sequence5", 2000)]
result = []
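# groupby only groups consecutive items, so this relies on input
# already being ordered by (FileName, ID), as it is here.
for k, v in itertools.groupby(input, lambda x: (x[0], x[1])):
    result.append(max(list(v), key=lambda x: x[-1]))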
# OR
# result = [max(list(v), key=lambda x: x[-1]) for k, v in itertools.groupby(input, lambda x: (x[0], x[1]))]
print(result)
Output:
[('FileName1', 'ID1', 'Sequence1', 1000),
('FileName1', 'ID2', 'Sequence3', 1500),
('FileName2', 'ID1', 'Sequence2', 1000),
('FileName2', 'ID2', 'Sequence5', 2000)]
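Note that the result above still carries the intensity, and that groupby only merges neighbouring rows. If the real data is not already ordered by FileName and ID, a variant along these lines might be safer. This is only a sketch: the names records and best_triplets are made up for illustration, and the rows are the question's data in shuffled order.

```python
from itertools import groupby
from operator import itemgetter

# The question's rows, deliberately shuffled (illustrative data only).
records = [
    ("FileName2", "ID2", "Sequence5", 2000),
    ("FileName1", "ID1", "Sequence2", 500),
    ("FileName1", "ID2", "Sequence3", 1500),
    ("FileName2", "ID1", "Sequence1", 500),
    ("FileName1", "ID1", "Sequence1", 1000),
    ("FileName2", "ID2", "Sequence3", 250),
    ("FileName1", "ID2", "Sequence5", 200),
    ("FileName2", "ID1", "Sequence2", 1000),
]


def best_triplets(rows):
    """Return one (FileName, ID, Sequence) triplet per (FileName, ID) pair,
    keeping the row with the highest intensity (4th element)."""
    by_file_and_id = itemgetter(0, 1)
    result = []
    # Sort first so that rows sharing a (FileName, ID) key are adjacent,
    # which is what groupby needs to put them in a single group.
    for _, group in groupby(sorted(rows, key=by_file_and_id), key=by_file_and_id):
        best = max(group, key=itemgetter(3))
        result.append(best[:3])  # drop the intensity, keep the triplet
    return result


triplets = best_triplets(records)
print(triplets)
```

The triplets come back ordered by FileName and then ID because of the sort. If two rows in a group share the same maximum intensity, max keeps the first one it encounters.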