Is there a Pythonic way to clean this list of dictionaries?

Question

Hello and thanks for your help. I have a list of dictionaries that looks like this:

list_balls = [{'id': '803371', 'is_used': False, 'source': 'store', 'air': 0.9},
{'id': '803371', 'is_used': False, 'source': 'donation', 'air': 0.20},
{'id': '30042', 'is_used': False, 'source': 'donation', 'air': 0.75},
{'id': '803371', 'is_used': False, 'source': 'store', 'air': 1}]

I need to clean this list so that it contains only one dictionary per id. If two or more entries share the same id, I need to keep the one with the highest air value. If they tie on both id and air, I need to keep the one where source == 'store'. The result in this case would therefore be:

list_balls = [{'id': '30042', 'is_used': False, 'source': 'donation', 'air': 0.75},
{'id': '803371', 'is_used': False, 'source': 'store', 'air': 1}]

I tried the following code to flag the entries that need to be removed with keep = False, but it only works when there are exactly two duplicates:

for i in range(0, len(list_balls )):
    if len(list_balls ) > 1:
        #print(list_balls [i])
        for j in range(1, len(list_balls )):
            if (list_balls [i]['id'] == list_balls [j]['id']):
                if (list_balls [i]['air'] > list_balls [j]['air']):
                    list_balls [i]['keep'] = True
                    list_balls [j]['keep'] = False
print(list_balls)

I assume this double for loop is not the most efficient way to do this either, so any other ideas are welcome. Thanks for your help.

Answer 1

Using itertools.groupby

Example:

from itertools import groupby
list_balls = [{'source': 'store', 'air': 0.9, 'id': '803371', 'is_used': False}, {'source': 'donation', 'air': 0.2, 'id': '803371', 'is_used': False}, {'source': 'donation', 'air': 0.75, 'id': '30042', 'is_used': False}, {'source': 'store', 'air': 1, 'id': '803371', 'is_used': False}]


#result = [max(list(v), key=lambda x: x["air"]) for k, v in groupby(sorted(list_balls, key=lambda x: x["id"]), lambda x: x["id"])]
result = [max(list(v), key=lambda x: (x["air"], x["source"] == "store")) for k, v in groupby(sorted(list_balls, key=lambda x: x["id"]), lambda x: x["id"])]
print(result)

Output:

[{'air': 0.75, 'id': '30042', 'is_used': False, 'source': 'donation'},
 {'air': 1, 'id': '803371', 'is_used': False, 'source': 'store'}]
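
The sorted() call in the snippet above matters because itertools.groupby only groups consecutive items with the same key. A small sketch, using just the id values from the question, shows the difference; the variable names are made up for illustration:

```python
from itertools import groupby

ids = ['803371', '803371', '30042', '803371']

# Without sorting, the last '803371' lands in a separate group.
unsorted_groups = [(k, len(list(g))) for k, g in groupby(ids)]
print(unsorted_groups)
# [('803371', 2), ('30042', 1), ('803371', 1)]

# After sorting, each id forms exactly one group.
sorted_groups = [(k, len(list(g))) for k, g in groupby(sorted(ids))]
print(sorted_groups)
# [('30042', 1), ('803371', 3)]
```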

Answer 2

Simply with something like this:

list_balls = [{'id': '803371', 'is_used': False, 'source': 'store', 'air': 0.9},
{'id': '803371', 'is_used': False, 'source': 'donation', 'air': 0.20},
{'id': '30042', 'is_used': False, 'source': 'donation', 'air': 0.75},
{'id': '803371', 'is_used': False, 'source': 'store', 'air': 1}]

result = {}

for e in list_balls:
    if e['id'] not in result or (
          (e['air'], e['source'] == 'store') > 
          (result[e['id']]['air'], result[e['id']]['source'] =='store')
        ):
        result[e['id']] = e

result_list = list(result.values())

print(result_list)

Displays:

[{'id': '803371', 'is_used': False, 'source': 'store', 'air': 1}, {'id': '30042', 'is_used': False, 'source': 'donation', 'air': 0.75}]

You can compare tuples directly to compare on multiple criteria. Note that True is always greater than False (1 > 0).
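
A minimal sketch of the comparison this answer relies on: tuples compare element by element, so the second element only matters when the first elements are equal. The helper name rank and the three sample dictionaries are assumptions made up for illustration:

```python
def rank(ball):
    # Hypothetical helper: higher air wins; on equal air,
    # 'store' (True) beats any other source (False).
    return (ball['air'], ball['source'] == 'store')

a = {'id': '803371', 'source': 'donation', 'air': 1}
b = {'id': '803371', 'source': 'store', 'air': 1}
c = {'id': '803371', 'source': 'store', 'air': 0.9}

print(rank(b) > rank(a))  # True: air ties, so the store flag decides
print(rank(a) > rank(c))  # True: higher air wins regardless of source
print(True > False)       # True, because bool is a subclass of int
```

The same key function can be passed to max(), which is what the groupby and defaultdict answers do.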

Execution speed compared to the groupby and defaultdict solutions:

import random
from collections import defaultdict
from itertools import groupby

list_balls = []
for _ in range(10000000):
    list_balls.append(
        {
            'source': random.choice(['store', 'donation']),
            'id': random.randint(0,10000),
            'air': random.randint(0,4)
        }
    )

def vanilla_filter_list(list_balls):
    result = {}

    for e in list_balls:
        if e['id'] not in result or (
              (e['air'], e['source'] == 'store') > 
              (result[e['id']]['air'], result[e['id']]['source'] =='store')
            ):
            result[e['id']] = e

    return list(result.values())

def groupby_filter_list(list_balls):
    return [max(list(v), 
                key=lambda x: (x["air"], x["source"] == "store")) for k, v in groupby(
        sorted(list_balls, key=lambda x: x["id"]),
        lambda x: x["id"])]

def collections_filter_list(list_balls):
    d = defaultdict(list)
    for ball in list_balls:
        d[ball["id"]].append(ball)

    return [
        max(group, key=lambda x: (x["air"], x["source"] == "store")) for group in d.values()
    ]

%%time
vanilla_filter_list(list_balls) # 5.52s

%%time
groupby_filter_list(list_balls) #14.3s

%%time
collections_filter_list(list_balls) #8.41s
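
Note that %%time is an IPython/Jupyter cell magic and is a syntax error in a plain Python script. Outside a notebook, a sketch along these lines with the standard-library timeit module could be used instead. The sample size, seed, and repeat count are arbitrary assumptions, and the result will not match the timings quoted above:

```python
import random
import timeit

def vanilla_filter_list(list_balls):
    # Keep, per id, the entry with the highest (air, is-store) pair.
    result = {}
    for e in list_balls:
        best = result.get(e['id'])
        if best is None or (
            (e['air'], e['source'] == 'store')
            > (best['air'], best['source'] == 'store')
        ):
            result[e['id']] = e
    return list(result.values())

# Much smaller sample than the 10,000,000 entries used above,
# so that the sketch runs quickly (assumed sizes).
random.seed(0)
sample = [
    {
        'source': random.choice(['store', 'donation']),
        'id': random.randint(0, 100),
        'air': random.randint(0, 4),
    }
    for _ in range(10000)
]

runs = 10
elapsed = timeit.timeit(lambda: vanilla_filter_list(sample), number=runs)
print(f"vanilla_filter_list: {elapsed / runs:.6f} s per call")
```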

Answer 3

Try this:

all_id = set(i['id'] for i in list_balls)
new_list_balls = []
for id_ in all_id:
    # highest air value among the entries with this id
    max_air = max(i['air'] for i in list_balls if i['id'] == id_)
    # every entry with this id that reaches that maximum
    candidates = [i for i in list_balls if i['id'] == id_ and i['air'] == max_air]
    # on a tie, prefer an entry from the store; otherwise keep the first candidate
    best = next((i for i in candidates if i['source'] == 'store'), candidates[0])
    new_list_balls.append(best)

print(new_list_balls)

Output:

[{'id': '30042', 'is_used': False, 'source': 'donation', 'air': 0.75}, 
{'id': '803371', 'is_used': False, 'source': 'store', 'air': 1}]

Answer 4

Here:

from collections import defaultdict

list_balls = [{'id': '803371', 'is_used': False, 'source': 'store', 'air': 0.9},
              {'id': '803371', 'is_used': False, 'source': 'donation', 'air': 0.20},
              {'id': '30042', 'is_used': False, 'source': 'donation', 'air': 0.75},
              {'id': '803371', 'is_used': False, 'source': 'store', 'air': 1}]

grouped_data = defaultdict(list)

for entry in list_balls:
    grouped_data[entry['id']].append(entry)

final_list = []

for k, v in grouped_data.items():
    if len(v) == 1:
        final_list.append(v[0])
    else:
        # sort by air
        x = sorted(v, key=lambda k1: k1['air'], reverse=True)
        if x[0]['air'] != x[1]['air']:
            final_list.append(x[0])
        else:
            # decide by source
            if x[0]['source'] == 'store':
                final_list.append(x[0])
            elif x[1]['source'] == 'store':
                final_list.append(x[1])
            else:
                # neither tied entry is from the store: keep the first one
                final_list.append(x[0])

for entry in final_list:
    print(entry)

Output:

{'id': '803371', 'is_used': False, 'source': 'store', 'air': 1}
{'id': '30042', 'is_used': False, 'source': 'donation', 'air': 0.75}

Answer 5

I would first group by id with a defaultdict, then pick the dictionary with the maximum air from each group. If there is a tie on air within an id, source is used as a secondary key for max().

Demo:

from collections import defaultdict

list_balls = [
    {"id": "803371", "is_used": False, "source": "store", "air": 0.9},
    {"id": "803371", "is_used": False, "source": "donation", "air": 0.20},
    {"id": "30042", "is_used": False, "source": "donation", "air": 0.75},
    {"id": "803371", "is_used": False, "source": "store", "air": 1},
    {"id": "803371", "is_used": False, "source": "donation", "air": 1},
]

d = defaultdict(list)
for ball in list_balls:
    d[ball["id"]].append(ball)

result = [
    max(group, key=lambda x: (x["air"], x["source"] == "store")) for group in d.values()
]

print(result)

Output:

[{'id': '803371', 'is_used': False, 'source': 'store', 'air': 1}, {'id': '30042', 'is_used': False, 'source': 'donation', 'air': 0.75}]

Answer 6

Nothing unnecessary, just (almost) pure Python.
Sort the list of dictionaries by id, then by the negative of air so that the largest values come first, and then by source so that entries with store come first. After that, the first entry of each run of dictionaries sharing an id is selected.

import pprint

list_balls = [
  {'id': '803371', 'is_used': False, 'source': 'store', 'air': 0.9},
  {'id': '803371', 'is_used': False, 'source': 'donation', 'air': 0.20},
  {'id': '30042', 'is_used': False, 'source': 'donation', 'air': 0.75},
  {'id': '803371', 'is_used': False, 'source': 'store', 'air': 1}
]
list_balls.sort(key=lambda k: (k['id'], -k['air'], 0 if k['source'] == 'store' else 1))
pprint.pprint([d for i, d in enumerate(list_balls) if i == 0 or list_balls[i - 1]['id'] != d['id']])

Output:

[{'air': 0.75, 'id': '30042', 'is_used': False, 'source': 'donation'},
 {'air': 1, 'id': '803371', 'is_used': False, 'source': 'store'}]

6 answers

Answer 1
1 2019-06-20 12:27:52

Answer 2
1 2019-06-20 12:36:07

Answer 3
0 2019-06-20 12:36:22

Answer 4
0 2019-06-20 12:39:49

Answer 5
0 2019-06-20 13:01:27

Answer 6
0 2019-06-20 14:43:22
