
How to remove duplicates from a list of lists, keeping the element with the highest value in the second position? (Python)

I have a list of lists in which the first entry of the inner lists repeats. I would like to remove the duplicates and keep only the item with the highest score (the second entry of each inner list).

list_dup = [["Apple", 24],
            ["Apple", 23],
            ["Sun", 15],
            ["Apple", 2],
            ["Sun", 1],
            ["Blue", 15]]

Desired output:

list_dup = [["Apple", 24],
            ["Sun", 15],
            ["Blue", 15]]

Using pandas:

import pandas as pd
pd.DataFrame(list_dup).groupby(0).max().reset_index().values.tolist()
# [['Apple', 24], ['Blue', 15], ['Sun', 15]]

Step by step:

  • convert the list into a pd.DataFrame;
  • group the rows by the value of the first column (column 0, the one containing the strings);
  • take the max of the other columns within each group (each column's maximum is taken independently; in your case there is just one other column, the one containing the integers);
  • move the group keys out of the index and back into a regular column (reset_index);
  • convert the resulting pd.DataFrame into a NumPy array (with .values) and then into a list.
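The one-liner above, unrolled into one statement per step so each intermediate result can be inspected. This is a sketch assuming a reasonably recent pandas; it uses `to_numpy()`, the currently recommended equivalent of `.values`.

```python
import pandas as pd

list_dup = [["Apple", 24], ["Apple", 23], ["Sun", 15],
            ["Apple", 2], ["Sun", 1], ["Blue", 15]]

# list of lists -> DataFrame with integer column labels 0 and 1
df = pd.DataFrame(list_dup)

# group by the string column (0) and take the max of column 1;
# the group keys become the index and are sorted by default
best = df.groupby(0).max()

# move the group keys back from the index into a regular column
best = best.reset_index()

# DataFrame -> NumPy array -> plain Python list of lists
result = best.to_numpy().tolist()
print(result)
```

Because `groupby` sorts its keys by default, the rows come out in alphabetical order of the names rather than in the order they first appear.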

If the order of the output list is not important, you can use sorted to sort the list (which places sub-lists with the same first element next to each other), then use itertools.groupby to gather the groups that share a first element, and finally use max to pick the element with the highest second value in each group. The result comes out in alphabetical order of the first elements.

from itertools import groupby

[max(g, key=lambda x: x[1]) for _, g in groupby(sorted(list_dup), key=lambda x: x[0])]
# returns:
[['Apple', 24], ['Blue', 15], ['Sun', 15]]
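The call to sorted is not optional: itertools.groupby only merges *consecutive* items with equal keys. A small illustration on a shortened, made-up input:

```python
from itertools import groupby

data = [["Apple", 24], ["Sun", 15], ["Apple", 2]]

# Without sorting, the two "Apple" rows are not adjacent,
# so groupby reports "Apple" as two separate groups.
unsorted_keys = [k for k, _ in groupby(data, key=lambda x: x[0])]
print(unsorted_keys)  # ['Apple', 'Sun', 'Apple']

# After sorting, equal keys are adjacent and each key forms exactly one group.
sorted_keys = [k for k, _ in groupby(sorted(data), key=lambda x: x[0])]
print(sorted_keys)  # ['Apple', 'Sun']
```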

There are many possibilities. One of the clearest may be a plain dictionary that maps each name to its highest score:

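m_d = {}  # name -> highest score seen so far
for name, score in list_dup:
    # store the score if the name is new or the score beats the stored one
    if name not in m_d or m_d[name] < score:
        m_d[name] = score

# dicts keep insertion order (Python 3.7+), so names stay in first-seen order
list_no_dup = [[k, v] for k, v in m_d.items()]
# [['Apple', 24], ['Sun', 15], ['Blue', 15]]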
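The same dictionary idea can store whole rows instead of just the score, so it still works if the inner lists carry extra columns. A sketch assuming Python 3.7+ (insertion-ordered dicts); `dedupe_keep_max` and its `key_index`/`score_index` parameters are hypothetical names chosen for illustration, not part of any library.

```python
def dedupe_keep_max(rows, key_index=0, score_index=1):
    """Hypothetical helper: for each key, keep the row with the highest score.

    Whole rows are returned, in the order each key first appears.
    On ties, the first row seen is kept (strict comparison).
    """
    best = {}  # key -> best row seen so far
    for row in rows:
        k = row[key_index]
        if k not in best or row[score_index] > best[k][score_index]:
            best[k] = row
    return list(best.values())


list_dup = [["Apple", 24], ["Apple", 23], ["Sun", 15],
            ["Apple", 2], ["Sun", 1], ["Blue", 15]]
print(dedupe_keep_max(list_dup))
# [['Apple', 24], ['Sun', 15], ['Blue', 15]]
```

Unlike the sorted/groupby approach, this runs in a single pass and keeps the original order of first appearance, which matches the desired output in the question.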
