
Filter dictionary of tuples with duplicated first element of the values based on some condition

Sample input data in which Value1 is duplicated across several keys

{'Key1': ('Value1', '28.302', '30', '131', '10', '321'), 
 'Key2': ('Value2', '42.373', '44', '98', '1252', '1413'), 
 'Key3': ('Value1', '34.048', '4', '375', '22', '1275'), 
 'Key4': ('Value3', '47.561', '82', '159', '901', '1146'), 
 'Key5': ('Value1', '35.821', '214', '279', '82', '282')}

Desired result, in which Key1 and Key3 have been filtered out because, among the entries whose first element is Value1, Key5 has the highest second element.

{'Key2': ('Value2', '42.373', '44', '98', '1252', '1413'), 
 'Key4': ('Value3', '47.561', '82', '159', '901', '1146'), 
 'Key5': ('Value1', '35.821', '214', '279', '82', '282')}

My attempts so far have failed, and posting them here would probably not be useful.

You'd probably want to do this in a multi-step process.

import itertools

d = {'Key1': ('Value1', '28.302', '30', '131', '10', '321'), 'Key2': ('Value2', '42.373', '44', '98', '1252', '1413'), 'Key3': ('Value1', '34.048', '4', '375', '22', '1275'), 'Key4': ('Value3', '47.561', '82', '159', '901', '1146'), 'Key5': ('Value1', '35.821', '214', '279', '82', '282')}

filtered = dict(
    # compare the second element numerically; as strings, '9.5' would beat '10.2'
    max(group, key=lambda tup: float(tup[1][1]))
    for _, group in itertools.groupby(
        sorted(d.items(), key=lambda tup: tup[1][0]),
        key=lambda tup: tup[1][0]
    )
)
# {'Key5': ('Value1', '35.821', '214', '279', '82', '282'), 
#  'Key2': ('Value2', '42.373', '44', '98', '1252', '1413'), 
#  'Key4': ('Value3', '47.561', '82', '159', '901', '1146')}

The process is:

  1. use sorted() to rearrange d.items() so that all items with the same first tuple element are next to each other (otherwise groupby() won't work)
  2. use itertools.groupby() to collect all items with the same first element of that tuple
  3. use max() to take the item with the largest second element from each group
  4. convert the resulting (key, value) pairs back into a dict

You can insert another sorted() between steps 3 and 4 if you want the keys inserted in a specific order - but it's a dict, so order ought not matter.
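If sorting the items is not wanted, the same result can be sketched in a single pass that remembers the best item seen so far for each first element. This is an alternative technique, not the groupby() approach above. The names keep_highest, group_index and score_index are made up for illustration, and the sketch assumes that the compared element always parses as a number.

```python
def keep_highest(d, group_index=0, score_index=1):
    """For each distinct value[group_index], keep the item whose
    value[score_index] is numerically largest (hypothetical helper)."""
    best = {}  # group label -> (key, value) of the best item seen so far
    for key, value in d.items():
        label = value[group_index]
        score = float(value[score_index])  # assumes a numeric string
        if label not in best or score > float(best[label][1][score_index]):
            best[label] = (key, value)
    # rebuild a dict from the surviving (key, value) pairs
    return dict(best.values())


d = {'Key1': ('Value1', '28.302', '30', '131', '10', '321'),
     'Key2': ('Value2', '42.373', '44', '98', '1252', '1413'),
     'Key3': ('Value1', '34.048', '4', '375', '22', '1275'),
     'Key4': ('Value3', '47.561', '82', '159', '901', '1146'),
     'Key5': ('Value1', '35.821', '214', '279', '82', '282')}

filtered = keep_highest(d)
print(filtered)
```

Converting with float() matters because the tuple elements are strings: compared as text, '9.5' sorts above '10.2'.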

As an alternative, you could use a pandas data frame approach

import pandas as pd

d = {'Key1': ('Value1', '28.302', '30', '131', '10', '321'),
     'Key2': ('Value2', '42.373', '44', '98', '1252', '1413'),
     'Key3': ('Value1', '34.048', '4', '375', '22', '1275'),
     'Key4': ('Value3', '47.561', '82', '159', '901', '1146'),
     'Key5': ('Value1', '35.821', '214', '279', '82', '282')}

data = pd.DataFrame.from_dict(d, orient='index').reset_index()
data.rename(columns={"index": "Key", 0: "Value"}, inplace=True)
data = data.set_index(["Key", "Value"], drop=True).sort_index(ascending=True)

At this point you have turned your dict into a MultiIndex DataFrame:

                  1    2    3     4     5
Key  Value                               
Key1 Value1  28.302   30  131    10   321
Key2 Value2  42.373   44   98  1252  1413
Key3 Value1  34.048    4  375    22  1275
Key4 Value3  47.561   82  159   901  1146
Key5 Value1  35.821  214  279    82   282

This allows you to do all kinds of operations. Finding the rows you want would look like this:

max_rows = list()
sort_column = 1
for key_name, df in data.groupby("Value"):
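    # the cells are strings, so sort on their float values (key= needs pandas >= 1.1)
    max_row = df.sort_values(sort_column, ascending=False, key=lambda s: s.astype(float)).head(1)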
    max_rows.append(max_row)
result = pd.concat(max_rows).sort_index()
print(result)

This gives you a DataFrame that looks like this:

                  1    2    3     4     5
Key  Value                               
Key2 Value2  42.373   44   98  1252  1413
Key4 Value3  47.561   82  159   901  1146
Key5 Value1  35.821  214  279    82   282

If you need a dict of tuples back, you can do:

result2 = dict()
for index, row in result.iterrows():
    result2[index[0]] = tuple([index[1]] + row.values.tolist())

giving the desired result:

 {'Key2': ('Value2', '42.373', '44', '98', '1252', '1413'), 
  'Key4': ('Value3', '47.561', '82', '159', '901', '1146'), 
  'Key5': ('Value1', '35.821', '214', '279', '82', '282')}

The solution by Green Cloak Guy is most likely faster, but once the dict has been turned into a (MultiIndex) DataFrame, the data is probably easier to manipulate.
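A more compact variant of the pandas approach could look like the sketch below: convert the second element to float, group it by the first element, and let idxmax() return the key of the largest value in each group. It assumes that every second element parses as a number. The variable names are only illustrative.

```python
import pandas as pd

d = {'Key1': ('Value1', '28.302', '30', '131', '10', '321'),
     'Key2': ('Value2', '42.373', '44', '98', '1252', '1413'),
     'Key3': ('Value1', '34.048', '4', '375', '22', '1275'),
     'Key4': ('Value3', '47.561', '82', '159', '901', '1146'),
     'Key5': ('Value1', '35.821', '214', '279', '82', '282')}

# one row per key; column 0 holds 'Value1', 'Value2', ...
df = pd.DataFrame.from_dict(d, orient='index')

# column 1 as numbers, so the comparison is numeric rather than textual
scores = df[1].astype(float)

# for each distinct first element, the index label (key) of the largest score
best_keys = scores.groupby(df[0]).idxmax()

# look the surviving keys up in the original dict, so the tuples stay untouched
result = {key: d[key] for key in sorted(best_keys)}
print(result)
```

Because the tuples are read back from the original dict, they keep their string elements.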

  • First take a snapshot of the dictionary's keys with the copy method, so that all_keys is not affected when the dictionary changes
  • Declare current_key to track the position of the current key
  • The first (outer) loop visits every key
  • On each pass, get the new number of keys
  • The second (inner) loop checks whether the first tuple element of the current key equals that of a later key
  • If the condition is true, remove the current key from the dictionary and leave the inner loop
  • Each time the inner loop ends, increment current_key by one

Edit

In the second loop we don't need to fetch the dict keys every time; we can get them once per pass of the first loop.

Edit 2

When a key is removed from the dict, current_key should not be incremented, because the next key moves into the current position, so I added a flag to control this.

my_dic = {'Key1': ('Value1', '28.302', '30', '131', '10', '321'),
          'Key2': ('Value2', '42.373', '44', '98', '1252', '1413'),
          'Key3': ('Value1', '34.048', '4', '375', '22', '1275'),
          'Key4': ('Value3', '47.561', '82', '159', '901', '1146'),
          'Key5': ('Value1', '35.821', '214', '279', '82', '282')}

# snapshot of the keys, unaffected by later removals
all_keys = my_dic.copy().keys()
# position of the current key among the remaining keys
current_key = 0

for key in all_keys:
    # number of keys still in the dictionary
    number_of_keys = len(my_dic)
    # keys still in the dictionary
    dict_keys = list(my_dic.keys())
    # stays True unless the current key gets removed
    flag = True
    for index in range(current_key + 1, number_of_keys):
        # if a later key has the same first element, remove the
        # current key from the dictionary and leave the second loop
        if my_dic[key][0] == my_dic[dict_keys[index]][0]:
            my_dic.pop(key)
            flag = False
            break
    # if flag is True, move on to the next position; otherwise stay,
    # because the next key has moved into the current position
    if flag:
        current_key += 1

You can wrap this code in a function to reuse it any time:

# filterDictionary
def filterDictionary(user_dict):
    """
    Return a new, filtered dictionary.
    params:
        - user_dict: the dictionary to filter
    """
    # work on a copy so the caller's dictionary is not changed
    user_dict = user_dict.copy()
    # snapshot of the keys, unaffected by later removals
    all_keys = user_dict.copy().keys()
    # position of the current key among the remaining keys
    current_key = 0

    for key in all_keys:
        # number of keys still in the dictionary
        number_of_keys = len(user_dict)
        # keys still in the dictionary
        dict_keys = list(user_dict.keys())
        # stays True unless the current key gets removed
        flag = True
        for index in range(current_key + 1, number_of_keys):
            # if a later key has the same first element, remove the
            # current key from the dictionary and leave the second loop
            if user_dict[key][0] == user_dict[dict_keys[index]][0]:
                user_dict.pop(key)
                flag = False
                break
        # if flag is True, move on to the next position; otherwise stay,
        # because the next key has moved into the current position
        if flag:
            current_key += 1

    return user_dict


my_dic = {'Key1': ('Value2', '28.302', '30', '131', '10', '321'),
          'Key2': ('Value2', '42.373', '44', '98', '1252', '1413'),
          'Key3': ('Value3', '34.048', '4', '375', '22', '1275'),
          'Key4': ('Value2', '47.561', '82', '159', '901', '1146'),
          'Key5': ('Value1', '35.821', '214', '279', '82', '282')}

print(filterDictionary(my_dic))
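Note that the loop above removes a key whenever a later key shares its first tuple element, so what survives is the last key of each group, not necessarily the one with the highest second element. In both sample dictionaries the two happen to coincide. A compact sketch of that same keep-the-last behaviour, assuming insertion-ordered dicts (Python 3.7+) and using the made-up name keep_last_per_label, might be:

```python
def keep_last_per_label(d):
    """Keep, for each distinct first tuple element, only the last key
    that carries it (hypothetical helper mirroring the loop above)."""
    # later items overwrite earlier ones, so this maps label -> last key
    last_key = {value[0]: key for key, value in d.items()}
    # keep an item only if it is the last one seen for its label
    return {key: value for key, value in d.items()
            if last_key[value[0]] == key}


my_dic = {'Key1': ('Value2', '28.302', '30', '131', '10', '321'),
          'Key2': ('Value2', '42.373', '44', '98', '1252', '1413'),
          'Key3': ('Value3', '34.048', '4', '375', '22', '1275'),
          'Key4': ('Value2', '47.561', '82', '159', '901', '1146'),
          'Key5': ('Value1', '35.821', '214', '279', '82', '282')}

kept = keep_last_per_label(my_dic)
print(kept)
```

If the highest second element is not in the last position of its group, this sketch and the loop above both return a different entry than the one the question asks for.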
