I have a list of lists in which the first entry of each sub-list can repeat. I would like to remove the duplicates and keep only the item with the highest score (the second entry of each sub-list) for each name:
list_dup = [["Apple", 24],
["Apple", 23],
["Sun", 15],
["Apple", 2],
["Sun", 1],
["Blue", 15]
]
Output:
list_dup = [["Apple", 24],
["Sun", 15],
["Blue", 15]
]
import pandas as pd
pd.DataFrame(list_dup).groupby(0).max().reset_index().values.tolist()
Step by step: convert the list into a pd.DataFrame; group by the first column (0, the one containing the strings); take the maximum of each group; reset the index; convert the pd.DataFrame into a NumPy array (with .values) and then convert it into a list (with .tolist()).

If the order of the output list is not important, you can use sorted to sort the list by the first elements of the sub-lists, then use itertools.groupby to pull together groups based on the first elements, and finally use max to get the sub-list with the highest second element in each group.
from itertools import groupby
[max(g, key=lambda x: x[1]) for _, g in groupby(sorted(list_dup), key=lambda x: x[0])]
# returns:
[['Apple', 24], ['Blue', 15], ['Sun', 15]]
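The sorting step matters because itertools.groupby only merges consecutive items that share a key. A minimal sketch of the difference, with list_dup repeated from the question and a hypothetical helper named first used as the key function:

```python
from itertools import groupby

list_dup = [["Apple", 24], ["Apple", 23], ["Sun", 15],
            ["Apple", 2], ["Sun", 1], ["Blue", 15]]

def first(item):
    # Hypothetical helper: the grouping key is the name in position 0.
    return item[0]

# Without sorting, "Apple" and "Sun" each appear as two separate runs.
unsorted_keys = [key for key, _ in groupby(list_dup, key=first)]

# After sorting by name, every name forms exactly one run.
sorted_keys = [key for key, _ in groupby(sorted(list_dup, key=first), key=first)]

print(unsorted_keys)  # ['Apple', 'Sun', 'Apple', 'Sun', 'Blue']
print(sorted_keys)    # ['Apple', 'Blue', 'Sun']
```

Taking max over the unsorted runs would therefore return more than one entry for "Apple" and "Sun".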
Many possibilities. One of the clearest may be:
m_d = {}
for k in list_dup:
    if k[0] in m_d:
        # Name already seen: keep the larger score.
        if m_d[k[0]] < k[1]:
            m_d[k[0]] = k[1]
    else:
        # First time this name appears.
        m_d[k[0]] = k[1]
list_no_dup = [[k, v] for k, v in m_d.items()]
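As a quick check, the loop above can be wrapped in a function and run against the example data. This is only a sketch: the name keep_highest is hypothetical, and it assumes Python 3.7 or later, where dicts preserve insertion order, so the result follows the order in which each name first appears.

```python
def keep_highest(pairs):
    """Return one [name, score] pair per name, keeping the highest score."""
    best = {}
    for name, score in pairs:
        # Store the score if the name is new or the score beats the stored one.
        if name not in best or score > best[name]:
            best[name] = score
    return [[name, score] for name, score in best.items()]

list_dup = [["Apple", 24], ["Apple", 23], ["Sun", 15],
            ["Apple", 2], ["Sun", 1], ["Blue", 15]]

list_no_dup = keep_highest(list_dup)
print(list_no_dup)  # [['Apple', 24], ['Sun', 15], ['Blue', 15]]
```

Unlike the sorted and groupby version, this makes a single pass over the input and does not reorder the names.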