
Is there a way to filter a list of dictionaries based on a value in one dictionary being less than the same key in another?

I apologize for the convoluted title. I need to filter a list of dictionaries by a fairly specific criterion.

Normally, I would do a list comprehension, but I'm not positive on the logic.

Here's an example list:

list_dict = [{'item_id': '000354', 'ts_created': '11/12/2013', 'item_desc': 'a product'},
             {'item_id': '000354', 'ts_created': '11/13/2013', 'item_desc': 'a product'},
             {'item_id': '000355', 'ts_created': '11/12/2013', 'item_desc': 'a different product'}]

You'll notice that the first two dictionary items are identical besides 'ts_created'.

I want to create a new list that keeps, for each item, only the dictionary with the earliest timestamp and discards the rest.

Edit: Removed 'elegantly' from title as it seemed to offend some.

Edit 2: Tried to improve title.

Edit 3 (focus?): I'm really not sure how to focus this question any more than it already is, but I'll try. In the example list above (the actual list is much larger), there are duplicate dictionaries. The only difference between them is the 'ts_created' value. I want to keep one dictionary per unique 'item_id', specifically the one with the earliest 'ts_created'. The resulting list would look like this:

list_dict = [{'item_id': '000354', 'ts_created': '11/12/2013', 'item_desc': 'a product'},
             {'item_id': '000355', 'ts_created': '11/12/2013', 'item_desc': 'a different product'}]

You can filter the dictionaries by building a dictionary of dictionaries keyed on item_id. As you populate that index, keep only the item with the greatest (latest) timestamp for each key; to keep the earliest instead, as the question asks, reverse the comparison. Since your timestamps are strings that are not in ISO 8601 format, you need to convert them to actual dates before comparing them. A second dictionary (also keyed on item_id) keeps track of the converted timestamps.

list_dict = [{'item_id': '000354', 'ts_created': '11/12/2013', 'item_desc': 'a product'},
             {'item_id': '000354', 'ts_created': '11/13/2013', 'item_desc': 'a product'},
             {'item_id': '000355', 'ts_created': '11/12/2013', 'item_desc': 'a different product'}]

from datetime import datetime
maxDates = dict()  # association between item and timestamp
result   = dict()  # indexed single instance result (dictionary of dictionaries)
for d in list_dict:
    key       = d['item_id']
    timestamp = datetime.strptime(d['ts_created'], '%m/%d/%Y') # usable timestamp
    if key not in result or timestamp > maxDates[key]:         # keep only latest
        result[key]   = d
        maxDates[key] = timestamp
result = list(result.values())    # convert back to a list of dictionaries

print(result)
        
[{'item_id': '000354', 'ts_created': '11/13/2013', 'item_desc': 'a product'},
 {'item_id': '000355', 'ts_created': '11/12/2013', 'item_desc': 'a different product'}]
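
Since the question asks for the earliest timestamp rather than the latest, here is a minimal sketch of the same indexing idea with the comparison reversed. The function name keep_earliest and its parameters are made up for illustration; it stores the parsed timestamp next to each record so a single dictionary is enough, and it relies on dictionaries preserving insertion order (Python 3.7+).

```python
from datetime import datetime

def keep_earliest(records, id_key='item_id', ts_key='ts_created', fmt='%m/%d/%Y'):
    """Return one record per id_key value: the one with the earliest timestamp."""
    earliest = {}  # id -> (parsed timestamp, original record)
    for rec in records:
        ts = datetime.strptime(rec[ts_key], fmt)
        # Store the record if the id is new or this timestamp is strictly earlier.
        if rec[id_key] not in earliest or ts < earliest[rec[id_key]][0]:
            earliest[rec[id_key]] = (ts, rec)
    return [rec for _, rec in earliest.values()]

list_dict = [{'item_id': '000354', 'ts_created': '11/12/2013', 'item_desc': 'a product'},
             {'item_id': '000354', 'ts_created': '11/13/2013', 'item_desc': 'a product'},
             {'item_id': '000355', 'ts_created': '11/12/2013', 'item_desc': 'a different product'}]

earliest_only = keep_earliest(list_dict)
print(earliest_only)
```

With the sample list this should keep the 11/12/2013 record for item 000354, which matches the result requested in the question.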

If uniqueness is determined by multiple fields (as opposed to just the item_id), you will need to combine all the values into a single key.

For example (for all fields except the time stamp):

key = tuple(d[k] for k in sorted(d) if k != 'ts_created')
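
As a rough sketch of how that composite key could slot into the loop, assuming every dictionary has the same fields and all non-timestamp values are hashable (the function name and the sample rows below are hypothetical, not from the question):

```python
from datetime import datetime

def keep_earliest_by_fields(records, ts_key='ts_created', fmt='%m/%d/%Y'):
    """Deduplicate on every field except ts_key, keeping the earliest timestamp."""
    earliest = {}  # composite key -> (parsed timestamp, original record)
    for rec in records:
        # Sorting the field names makes the key independent of insertion order.
        key = tuple(rec[k] for k in sorted(rec) if k != ts_key)
        ts = datetime.strptime(rec[ts_key], fmt)
        if key not in earliest or ts < earliest[key][0]:
            earliest[key] = (ts, rec)
    return [rec for _, rec in earliest.values()]

# Hypothetical data: same item_id, but one record has a different description.
rows = [{'item_id': '000354', 'ts_created': '11/13/2013', 'item_desc': 'a product'},
        {'item_id': '000354', 'ts_created': '11/12/2013', 'item_desc': 'a product'},
        {'item_id': '000354', 'ts_created': '11/14/2013', 'item_desc': 'a renamed product'}]

unique_rows = keep_earliest_by_fields(rows)
print(unique_rows)
```

Here the two 'a product' rows collapse into one, while the renamed row survives because its description differs.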

You can use pandas.DataFrame: sort by item_id and date, then drop the duplicates so that only the earliest row per item_id remains.

import pandas

df = pandas.DataFrame(list_dict)
# Parse the date strings (explicit format avoids ambiguous month/day inference)
df['ts_created'] = pandas.to_datetime(df['ts_created'], format='%m/%d/%Y')
# Sort by item_id, then by date
df.sort_values(by=['item_id', 'ts_created'], inplace=True)
# Drop duplicates, leaving only the first item_id
df.drop_duplicates(subset=['item_id'], keep='first', inplace=True)
# Convert the dates back to the original format
df['ts_created'] = df.ts_created.dt.strftime('%m/%d/%Y')
# Create the list again
df.to_dict(orient='records')
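
If pulling in pandas is not an option, the same sort-then-drop-duplicates idea can be sketched with the standard library alone. This is only an illustration of the approach, not part of the pandas answer; the helper parse_ts is a name introduced here.

```python
from datetime import datetime
from itertools import groupby
from operator import itemgetter

list_dict = [{'item_id': '000354', 'ts_created': '11/12/2013', 'item_desc': 'a product'},
             {'item_id': '000354', 'ts_created': '11/13/2013', 'item_desc': 'a product'},
             {'item_id': '000355', 'ts_created': '11/12/2013', 'item_desc': 'a different product'}]

def parse_ts(rec):
    """Convert the record's 'ts_created' string into a comparable datetime."""
    return datetime.strptime(rec['ts_created'], '%m/%d/%Y')

# Sort by item_id, then by parsed date, so the earliest record leads each group.
ordered = sorted(list_dict, key=lambda rec: (rec['item_id'], parse_ts(rec)))

# groupby needs sorted input; taking the first element of each group drops the rest.
deduped = [next(group) for _, group in groupby(ordered, key=itemgetter('item_id'))]
print(deduped)
```

Unlike the dictionary-index approach, this returns the records ordered by item_id rather than in their original order.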
