What pythonic way is there to clean this list of dictionaries?

Hello and thanks for your help. I have a list of dictionaries that looks like this:

list_balls = [{'id': '803371', 'is_used': False, 'source': 'store', 'air': 0.9},
{'id': '803371', 'is_used': False, 'source': 'donation', 'air': 0.20},
{'id': '30042', 'is_used': False, 'source': 'donation', 'air': 0.75},
{'id': '803371', 'is_used': False, 'source': 'store', 'air': 1}]

I need to clean this list, leaving a list of unique dictionaries. If there are two or more entries with the same id, I need to pick the one with the highest value of air. If they have equal air values and ids, I need to keep the one where source == 'store'. Therefore, the result in this case would be:

list_balls = [{'id': '30042', 'is_used': False, 'source': 'donation', 'air': 0.75},
{'id': '803371', 'is_used': False, 'source': 'store', 'air': 1}]

I tried the following code to flag the ones that need to be taken out with keep = False, but it only works when there are two duplicates:

for i in range(0, len(list_balls )):
    if len(list_balls ) > 1:
        #print(list_balls [i])
        for j in range(1, len(list_balls )):
            if (list_balls [i]['id'] == list_balls [j]['id']):
                if (list_balls [i]['air'] > list_balls [j]['air']):
                    list_balls [i]['keep'] = True
                    list_balls [j]['keep'] = False
print(list_balls)

I assume this double for loop is not the most efficient way to do this either, so any other ideas are welcome. Thanks for your help.

Using itertools.groupby

Ex:

from itertools import groupby
list_balls = [{'source': 'store', 'air': 0.9, 'id': '803371', 'is_used': False}, {'source': 'donation', 'air': 0.2, 'id': '803371', 'is_used': False}, {'source': 'donation', 'air': 0.75, 'id': '30042', 'is_used': False}, {'source': 'store', 'air': 1, 'id': '803371', 'is_used': False}]


#result = [max(list(v), key=lambda x: x["air"]) for k, v in groupby(sorted(list_balls, key=lambda x: x["id"]), lambda x: x["id"])]
result = [max(list(v), key=lambda x: (x["air"], x["source"] == "store")) for k, v in groupby(sorted(list_balls, key=lambda x: x["id"]), lambda x: x["id"])]
print(result)

Output:

[{'air': 0.75, 'id': '30042', 'is_used': False, 'source': 'donation'},
 {'air': 1, 'id': '803371', 'is_used': False, 'source': 'store'}]
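
A note on the sorted() call in the one-liner: groupby only merges consecutive items that share a key, so the list has to be sorted by id before grouping. A minimal sketch with made-up data (the rows list and the group_sizes helper are hypothetical, not from the question):

```python
from itertools import groupby

# Hypothetical rows; the two 'a' entries are not adjacent in the input.
rows = [{"id": "a", "air": 1}, {"id": "b", "air": 2}, {"id": "a", "air": 3}]

def group_sizes(items):
    # Number of items in each consecutive run of equal ids.
    return [(key, len(list(group))) for key, group in groupby(items, key=lambda r: r["id"])]

unsorted_groups = group_sizes(rows)
sorted_groups = group_sizes(sorted(rows, key=lambda r: r["id"]))

print(unsorted_groups)  # [('a', 1), ('b', 1), ('a', 1)]
print(sorted_groups)    # [('a', 2), ('b', 1)]
```

Without the sort, the same id can show up as several separate groups, and each of them would contribute its own "winner" to the result.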

Simply with something like this:

list_balls = [{'id': '803371', 'is_used': False, 'source': 'store', 'air': 0.9},
{'id': '803371', 'is_used': False, 'source': 'donation', 'air': 0.20},
{'id': '30042', 'is_used': False, 'source': 'donation', 'air': 0.75},
{'id': '803371', 'is_used': False, 'source': 'store', 'air': 1}]

result = {}

for e in list_balls:
    if e['id'] not in result or (
          (e['air'], e['source'] == 'store') > 
          (result[e['id']]['air'], result[e['id']]['source'] =='store')
        ):
        result[e['id']] = e

result_list = list(result.values())

print(result_list)

Displays

[{'id': '803371', 'is_used': False, 'source': 'store', 'air': 1}, {'id': '30042', 'is_used': False, 'source': 'donation', 'air': 0.75}]

You can compare tuples directly to compare on multiple criteria. Notice that True is always > False (1 > 0).
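
To make the tuple comparison concrete, here is a small sketch. The two dictionaries and the helper name rank are made up for illustration:

```python
# Tuples compare element by element: the second element is only
# consulted when the first elements are equal.
store = {"source": "store", "air": 1}
donation = {"source": "donation", "air": 1}

def rank(ball):
    # Hypothetical helper: higher air wins, 'store' breaks ties.
    return (ball["air"], ball["source"] == "store")

print(rank(store))                   # (1, True)
print(rank(donation))                # (1, False)
print(rank(store) > rank(donation))  # True, because True > False
print((0.9, True) > (1, False))      # False, air is compared first
```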


Execution speed compared to the groupby and defaultdict solutions:

import random
from collections import defaultdict
from itertools import groupby

list_balls = []
for _ in range(10000000):
    list_balls.append(
        {
            'source': random.choice(['store', 'donation']),
            'id': random.randint(0,10000),
            'air': random.randint(0,4)
        }
    )

def vanilla_filter_list(list_balls):
    result = {}

    for e in list_balls:
        if e['id'] not in result or (
              (e['air'], e['source'] == 'store') > 
              (result[e['id']]['air'], result[e['id']]['source'] =='store')
            ):
            result[e['id']] = e

    return list(result.values())

def groupby_filter_list(list_balls):
    return [max(list(v), 
                key=lambda x: (x["air"], x["source"] == "store")) for k, v in groupby(
        sorted(list_balls, key=lambda x: x["id"]),
        lambda x: x["id"])]

def collections_filter_list(list_balls):
    d = defaultdict(list)
    for ball in list_balls:
        d[ball["id"]].append(ball)

    return [
        max(group, key=lambda x: (x["air"], x["source"] == "store")) for group in d.values()
    ]

%%time
vanilla_filter_list(list_balls) # 5.52s

%%time
groupby_filter_list(list_balls) #14.3s

%%time
collections_filter_list(list_balls) #8.41s
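
The %%time lines above are Jupyter cell magics, so they only run in a notebook. As a rough sketch of how the same comparison could be reproduced in a plain script, the version below uses the standard timeit module. The function names, the rank and make_balls helpers, and the much smaller input size are assumptions for illustration, and the timings will differ from the figures quoted above:

```python
import random
import timeit
from collections import defaultdict
from itertools import groupby

def rank(ball):
    # Same ordering as in the answers: highest air, then 'store' on ties.
    return (ball["air"], ball["source"] == "store")

def vanilla_filter(balls):
    best = {}
    for ball in balls:
        current = best.get(ball["id"])
        if current is None or rank(ball) > rank(current):
            best[ball["id"]] = ball
    return list(best.values())

def groupby_filter(balls):
    by_id = lambda b: b["id"]
    return [max(group, key=rank) for _, group in groupby(sorted(balls, key=by_id), by_id)]

def defaultdict_filter(balls):
    groups = defaultdict(list)
    for ball in balls:
        groups[ball["id"]].append(ball)
    return [max(group, key=rank) for group in groups.values()]

def make_balls(n, seed=0):
    # Hypothetical generator; n is far smaller than the 10,000,000 used above.
    rng = random.Random(seed)
    return [
        {"source": rng.choice(["store", "donation"]),
         "id": rng.randint(0, 1000),
         "air": rng.randint(0, 4)}
        for _ in range(n)
    ]

balls = make_balls(50_000)
for func in (vanilla_filter, groupby_filter, defaultdict_filter):
    seconds = timeit.timeit(lambda: func(balls), number=3)
    print(f"{func.__name__}: {seconds:.3f}s for 3 runs")
```

The three functions return the winners in different orders (the groupby version sorts by id, the other two follow first appearance), so sort the results by id before comparing them.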

Try this:

all_id = set(i['id'] for i in list_balls)
new_list_ballls = []
for id_ in all_id:
    max_air = max(i['air'] for i in list_balls if i['id']==id_)
    max_air_count = sum(1 for i in list_balls if i['air']==max_air and i['id']==id_)
    if max_air_count==1:
        for i in list_balls:
            if i['id']==id_ and i['air']==max_air:
                new_list_ballls.append(i)
    else:
        for i in list_balls:
            # On a tie for the highest air, keep the entry from the store
            if i['id']==id_ and i['air']==max_air and i['source'] == 'store':
                new_list_ballls.append(i)
                break

Output:

[{'id': '30042', 'is_used': False, 'source': 'donation', 'air': 0.75}, 
{'id': '803371', 'is_used': False, 'source': 'store', 'air': 1}]

Here

from collections import defaultdict

list_balls = [{'id': '803371', 'is_used': False, 'source': 'store', 'air': 0.9},
              {'id': '803371', 'is_used': False, 'source': 'donation', 'air': 0.20},
              {'id': '30042', 'is_used': False, 'source': 'donation', 'air': 0.75},
              {'id': '803371', 'is_used': False, 'source': 'store', 'air': 1}]

grouped_data = defaultdict(list)

for entry in list_balls:
    grouped_data[entry['id']].append(entry)

final_list = []

for k, v in grouped_data.items():
    if len(v) == 1:
        final_list.append(v[0])
    else:
        # sort by air
        x = sorted(v, key=lambda k1: k1['air'], reverse=True)
        if x[0]['air'] != x[1]['air']:
            final_list.append(x[0])
        else:
            # decide by source
            if x[0]['source'] == 'store':
                final_list.append(x[0])
            elif x[1]['source'] == 'store':
                final_list.append(x[1])

for entry in final_list:
    print(entry)

Output

{'id': '803371', 'is_used': False, 'source': 'store', 'air': 1}
{'id': '30042', 'is_used': False, 'source': 'donation', 'air': 0.75}

I would first group by id with a defaultdict, then get the maximum dictionary by air. If a tie occurs on air and id, then use source as a secondary key for max().

Demo:

from collections import defaultdict

list_balls = [
    {"id": "803371", "is_used": False, "source": "store", "air": 0.9},
    {"id": "803371", "is_used": False, "source": "donation", "air": 0.20},
    {"id": "30042", "is_used": False, "source": "donation", "air": 0.75},
    {"id": "803371", "is_used": False, "source": "store", "air": 1},
    {"id": "803371", "is_used": False, "source": "donation", "air": 1},
]

d = defaultdict(list)
for ball in list_balls:
    d[ball["id"]].append(ball)

result = [
    max(group, key=lambda x: (x["air"], x["source"] == "store")) for group in d.values()
]

print(result)

Output:

[{'id': '803371', 'is_used': False, 'source': 'store', 'air': 1}, {'id': '30042', 'is_used': False, 'source': 'donation', 'air': 0.75}]

Nothing unnecessary, just (almost) pure Python.
Sort the list of dictionaries by id, then by negative values of air so that the largest ones go first, and then by source so that the entries with store go first. After that, the first entry is selected from each set of dictionaries, which are grouped by id.

import pprint

list_balls = [
  {'id': '803371', 'is_used': False, 'source': 'store', 'air': 0.9},
  {'id': '803371', 'is_used': False, 'source': 'donation', 'air': 0.20},
  {'id': '30042', 'is_used': False, 'source': 'donation', 'air': 0.75},
  {'id': '803371', 'is_used': False, 'source': 'store', 'air': 1}
]
list_balls.sort(key=lambda k: (k['id'], -k['air'], 0 if k['source'] == 'store' else 1))
pprint.pprint([d for i, d in enumerate(list_balls) if i == 0 or list_balls[i - 1]['id'] != d['id']])

Output:

[{'air': 0.75, 'id': '30042', 'is_used': False, 'source': 'donation'},
 {'air': 1, 'id': '803371', 'is_used': False, 'source': 'store'}]
