
Extract string showing maximum value in a list of tuples with duplicated elements

I have the following simplified data structure:

input = [("FileName1", "ID1", "Sequence1", 1000),
         ("FileName1", "ID1", "Sequence2", 500),
         ("FileName1", "ID2", "Sequence3", 1500),
         ("FileName1", "ID2", "Sequence5", 200),
         ("FileName2", "ID1", "Sequence1", 500),
         ("FileName2", "ID1", "Sequence2", 1000),
         ("FileName2", "ID2", "Sequence3", 250),
         ("FileName2", "ID2", "Sequence5", 2000)]

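Here, a particular ID can be linked to multiple Sequences (the number of Sequences per ID is not always the same), and multiple IDs can be linked to a particular FileName (the number of IDs per FileName is not always the same).

What I want is to extract, for each ID, the FileName/ID/Sequence triplet with the maximum intensity (the fourth element of each tuple):

Output: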

output = [("FileName1", "ID1", "Sequence1"),
          ("FileName1", "ID2", "Sequence3"),
          ("FileName2", "ID1", "Sequence2"),
          ("FileName2", "ID2", "Sequence5")]

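In the end, I need a unique Sequence for each ID (the one with the maximum value) together with its FileName, because I need all of this information to map it to a dataframe.

A FileName will then no longer have any duplicated IDs, and a single Sequence will be associated with each ID.

Thanks for your help

Use itertools

Ex: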

import itertools

input = [("FileName1", "ID1", "Sequence1", 1000),
         ("FileName1", "ID1", "Sequence2", 500),
         ("FileName1", "ID2", "Sequence3", 1500),
         ("FileName1", "ID2", "Sequence5", 200),
         ("FileName2", "ID1", "Sequence1", 500),
         ("FileName2", "ID1", "Sequence2", 1000),
         ("FileName2", "ID2", "Sequence3", 250),
         ("FileName2", "ID2", "Sequence5", 2000)]


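# groupby only merges consecutive items, so this relies on the input
# already being ordered by (FileName, ID), as it is above.
result = []
for k, v in itertools.groupby(input, lambda x: (x[0], x[1])):
    # keep the tuple with the largest value (last element) in each group
    result.append(max(v, key=lambda x: x[-1]))

# OR
# result = [max(v, key=lambda x: x[-1]) for k, v in itertools.groupby(input, lambda x: (x[0], x[1]))]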
    
print(result)

Output

[('FileName1', 'ID1', 'Sequence1', 1000),
 ('FileName1', 'ID2', 'Sequence3', 1500),
 ('FileName2', 'ID1', 'Sequence2', 1000),
 ('FileName2', 'ID2', 'Sequence5', 2000)]
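
The result above still carries the value as a fourth element, and itertools.groupby only groups rows that sit next to each other. If the rows might arrive in arbitrary order, and only the FileName/ID/Sequence triplet from the question is wanted, something along these lines may work. It is a minimal sketch, not part of the original answer: the names records and strongest_per_id are made up for illustration, and the rows are the question's data shuffled on purpose.

```python
import itertools
from operator import itemgetter

# Same rows as in the question, deliberately shuffled (illustrative data).
records = [
    ("FileName2", "ID2", "Sequence5", 2000),
    ("FileName1", "ID1", "Sequence1", 1000),
    ("FileName2", "ID1", "Sequence1", 500),
    ("FileName1", "ID2", "Sequence3", 1500),
    ("FileName1", "ID1", "Sequence2", 500),
    ("FileName2", "ID2", "Sequence3", 250),
    ("FileName1", "ID2", "Sequence5", 200),
    ("FileName2", "ID1", "Sequence2", 1000),
]


def strongest_per_id(rows):
    """Hypothetical helper: best (FileName, ID, Sequence) per (FileName, ID)."""
    key = itemgetter(0, 1)            # group on FileName and ID
    ordered = sorted(rows, key=key)   # groupby needs equal keys to be adjacent
    return [
        # pick the row with the largest value, then drop the value itself
        max(group, key=itemgetter(3))[:3]
        for _, group in itertools.groupby(ordered, key=key)
    ]


triplets = strongest_per_id(records)
print(triplets)
```

Sorting first costs O(n log n) but removes the dependency on input order. When two rows of the same group tie on the value, max keeps the first one it sees in the sorted sequence.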

