What is a pythonic way to clean this list of dictionaries?
Hello and thanks for your help. I have a list of dictionaries that looks like this:
list_balls = [{'id': '803371', 'is_used': False, 'source': 'store', 'air': 0.9},
{'id': '803371', 'is_used': False, 'source': 'donation', 'air': 0.20},
{'id': '30042', 'is_used': False, 'source': 'donation', 'air': 0.75},
{'id': '803371', 'is_used': False, 'source': 'store', 'air': 1}]
I need to clean this list so that only one dictionary per id remains. If there are two or more entries with the same id, I need to keep the one with the highest air value. If they have equal air values and ids, I need to keep the one where source == 'store'. Therefore, the result in this case would be:
list_balls = [{'id': '30042', 'is_used': False, 'source': 'donation', 'air': 0.75},
{'id': '803371', 'is_used': False, 'source': 'store', 'air': 1}]
I tried the following code to flag the entries that need to be taken out with keep = False, but it only works when there are two duplicates:
for i in range(0, len(list_balls)):
    if len(list_balls) > 1:
        #print(list_balls[i])
        for j in range(1, len(list_balls)):
            if (list_balls[i]['id'] == list_balls[j]['id']):
                if (list_balls[i]['air'] > list_balls[j]['air']):
                    list_balls[i]['keep'] = True
                    list_balls[j]['keep'] = False
print(list_balls)
I assume this double for loop is not the most efficient way to do this either, so any other ideas are welcome. Thanks for your help.
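For comparison, the keep-flag idea from this attempt can be done in a single pass instead of a double loop: remember, per id, the index of the best entry seen so far, then set keep from that. This is a minimal sketch, not code from any of the answers below; the helper name flag_keep is hypothetical, and the ranking assumes the rule stated above (higher air wins, and 'store' wins a tie on air).

```python
def flag_keep(balls):
    """Hypothetical helper: set 'keep' to True for exactly one entry per id."""
    best = {}  # id -> index of the best entry seen so far
    for i, ball in enumerate(balls):
        rank = (ball['air'], ball['source'] == 'store')
        j = best.get(ball['id'])
        if j is None or rank > (balls[j]['air'], balls[j]['source'] == 'store'):
            best[ball['id']] = i
    keep_indices = set(best.values())
    # Second pass: flag every entry, True only for the winners.
    for i, ball in enumerate(balls):
        ball['keep'] = i in keep_indices
    return balls


list_balls = [{'id': '803371', 'is_used': False, 'source': 'store', 'air': 0.9},
              {'id': '803371', 'is_used': False, 'source': 'donation', 'air': 0.20},
              {'id': '30042', 'is_used': False, 'source': 'donation', 'air': 0.75},
              {'id': '803371', 'is_used': False, 'source': 'store', 'air': 1}]

flag_keep(list_balls)
cleaned = [b for b in list_balls if b['keep']]
print(cleaned)
```

Because the comparison uses a strict >, a later entry that ties on both air and source does not replace an earlier one, so the first such entry is the one flagged.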
Using itertools.groupby
Ex:
from itertools import groupby
list_balls = [{'source': 'store', 'air': 0.9, 'id': '803371', 'is_used': False}, {'source': 'donation', 'air': 0.2, 'id': '803371', 'is_used': False}, {'source': 'donation', 'air': 0.75, 'id': '30042', 'is_used': False}, {'source': 'store', 'air': 1, 'id': '803371', 'is_used': False}]
#result = [max(list(v), key=lambda x: x["air"]) for k, v in groupby(sorted(list_balls, key=lambda x: x["id"]), lambda x: x["id"])]
result = [max(list(v), key=lambda x: (x["air"], x["source"] == "store")) for k, v in groupby(sorted(list_balls, key=lambda x: x["id"]), lambda x: x["id"])]
print(result)
Output:
[{'air': 0.75, 'id': '30042', 'is_used': False, 'source': 'donation'},
{'air': 1, 'id': '803371', 'is_used': False, 'source': 'store'}]
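The sorted() call in this answer is required: itertools.groupby only groups consecutive items with equal keys, so on an unsorted list the same id can show up in more than one group. A small demonstration using the ids from the question, in their original order:

```python
from itertools import groupby

ids = ['803371', '803371', '30042', '803371']  # ids from list_balls, in order

# Without sorting, groupby starts a new group every time the key changes,
# so '803371' shows up as two separate groups.
unsorted_keys = [k for k, _ in groupby(ids)]
print(unsorted_keys)  # ['803371', '30042', '803371']

# After sorting, equal ids are adjacent and each id forms exactly one group.
sorted_keys = [k for k, _ in groupby(sorted(ids))]
print(sorted_keys)  # ['30042', '803371']
```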
Simply with something like this:
list_balls = [{'id': '803371', 'is_used': False, 'source': 'store', 'air': 0.9},
{'id': '803371', 'is_used': False, 'source': 'donation', 'air': 0.20},
{'id': '30042', 'is_used': False, 'source': 'donation', 'air': 0.75},
{'id': '803371', 'is_used': False, 'source': 'store', 'air': 1}]
result = {}
for e in list_balls:
    if e['id'] not in result or (
        (e['air'], e['source'] == 'store') >
        (result[e['id']]['air'], result[e['id']]['source'] == 'store')
    ):
        result[e['id']] = e
result_list = list(result.values())
print(result_list)
Displays:
[{'id': '803371', 'is_used': False, 'source': 'store', 'air': 1}, {'id': '30042', 'is_used': False, 'source': 'donation', 'air': 0.75}]
You can compare tuples directly to compare on multiple criteria. Notice that True is always > False (1 > 0).
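To illustrate the comparison rule, here are the keys (air, source == 'store') for three hypothetical entries. Tuples are compared element by element, and the first position that differs decides the result:

```python
a = (1, True)    # air == 1, source == 'store'
b = (1, False)   # air == 1, source != 'store'
c = (0.9, True)  # lower air, even though source == 'store'

print(a > b)          # True: air ties, so True > False decides
print(b > c)          # True: 1 > 0.9 decides before source is looked at
print(True > False)   # True: bool is a subclass of int (True == 1, False == 0)
print(max([c, b, a])) # (1, True)
```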
Execution speed compared with the groupby and defaultdict solutions:
import random
from collections import defaultdict
from itertools import groupby

list_balls = []
for _ in range(10000000):
    list_balls.append(
        {
            'source': random.choice(['store', 'donation']),
            'id': random.randint(0, 10000),
            'air': random.randint(0, 4)
        }
    )

def vanilla_filter_list(list_balls):
    result = {}
    for e in list_balls:
        if e['id'] not in result or (
            (e['air'], e['source'] == 'store') >
            (result[e['id']]['air'], result[e['id']]['source'] == 'store')
        ):
            result[e['id']] = e
    return list(result.values())

def groupby_filter_list(list_balls):
    return [max(list(v),
                key=lambda x: (x["air"], x["source"] == "store")) for k, v in groupby(
                    sorted(list_balls, key=lambda x: x["id"]),
                    lambda x: x["id"])]

def collections_filter_list(list_balls):
    d = defaultdict(list)
    for ball in list_balls:
        d[ball["id"]].append(ball)
    return [
        max(group, key=lambda x: (x["air"], x["source"] == "store")) for group in d.values()
    ]
%%time
vanilla_filter_list(list_balls) # 5.52s
%%time
groupby_filter_list(list_balls) #14.3s
%%time
collections_filter_list(list_balls) #8.41s
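The %%time lines are IPython/Jupyter cell magics, so they will not run in a plain Python script. Below is a sketch of the same comparison using the standard timeit module. It assumes a smaller, seeded dataset (100,000 entries, an arbitrary choice) built by the hypothetical helper make_balls, so its timings will not match the numbers above and will vary by machine; the three filter functions are the ones from this answer.

```python
import random
import timeit
from collections import defaultdict
from functools import partial
from itertools import groupby


def make_balls(n, seed=0):
    # hypothetical helper: n random entries shaped like the benchmark data above
    rng = random.Random(seed)
    return [
        {'source': rng.choice(['store', 'donation']),
         'id': rng.randint(0, 10000),
         'air': rng.randint(0, 4)}
        for _ in range(n)
    ]


def vanilla_filter_list(balls):
    result = {}
    for e in balls:
        if e['id'] not in result or (
            (e['air'], e['source'] == 'store') >
            (result[e['id']]['air'], result[e['id']]['source'] == 'store')
        ):
            result[e['id']] = e
    return list(result.values())


def groupby_filter_list(balls):
    return [max(list(v), key=lambda x: (x['air'], x['source'] == 'store'))
            for _, v in groupby(sorted(balls, key=lambda x: x['id']),
                                lambda x: x['id'])]


def collections_filter_list(balls):
    d = defaultdict(list)
    for ball in balls:
        d[ball['id']].append(ball)
    return [max(group, key=lambda x: (x['air'], x['source'] == 'store'))
            for group in d.values()]


balls = make_balls(100_000)
timings = {}
for fn in (vanilla_filter_list, groupby_filter_list, collections_filter_list):
    # best of 3 single runs; absolute numbers depend on the machine
    timings[fn.__name__] = min(timeit.repeat(partial(fn, balls), number=1, repeat=3))
    print(f'{fn.__name__}: {timings[fn.__name__]:.3f}s')
```

Taking the minimum of several repeats is the usual way to reduce noise from other processes when timing short runs.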
Try this:
all_id = set(i['id'] for i in list_balls)
new_list_balls = []
for id_ in all_id:
    # all entries that share this id
    same_id = [i for i in list_balls if i['id'] == id_]
    max_air = max(i['air'] for i in same_id)
    # entries tied on the highest air value
    best = [i for i in same_id if i['air'] == max_air]
    if len(best) == 1:
        new_list_balls.append(best[0])
    else:
        # tie on air: keep the entry whose source is 'store', if there is one
        stores = [i for i in best if i['source'] == 'store']
        new_list_balls.append(stores[0] if stores else best[0])
print(new_list_balls)
Output:
[{'id': '30042', 'is_used': False, 'source': 'donation', 'air': 0.75},
{'id': '803371', 'is_used': False, 'source': 'store', 'air': 1}]
Here is one way, grouping the entries by id first:
from collections import defaultdict

list_balls = [{'id': '803371', 'is_used': False, 'source': 'store', 'air': 0.9},
              {'id': '803371', 'is_used': False, 'source': 'donation', 'air': 0.20},
              {'id': '30042', 'is_used': False, 'source': 'donation', 'air': 0.75},
              {'id': '803371', 'is_used': False, 'source': 'store', 'air': 1}]

# group entries by id
grouped_data = defaultdict(list)
for entry in list_balls:
    grouped_data[entry['id']].append(entry)

final_list = []
for k, v in grouped_data.items():
    if len(v) == 1:
        final_list.append(v[0])
    else:
        # sort by air, highest first
        x = sorted(v, key=lambda k1: k1['air'], reverse=True)
        if x[0]['air'] != x[1]['air']:
            final_list.append(x[0])
        else:
            # tie on air: decide by source, preferring 'store'
            tied = [e for e in x if e['air'] == x[0]['air']]
            stores = [e for e in tied if e['source'] == 'store']
            final_list.append(stores[0] if stores else tied[0])

for entry in final_list:
    print(entry)
Output:
{'id': '803371', 'is_used': False, 'source': 'store', 'air': 1}
{'id': '30042', 'is_used': False, 'source': 'donation', 'air': 0.75}
I would first group by id with a defaultdict, then get the maximum dictionary by air afterwards. If a tie occurs on air and id, then use source as a secondary key for max().
Demo:
from collections import defaultdict
list_balls = [
{"id": "803371", "is_used": False, "source": "store", "air": 0.9},
{"id": "803371", "is_used": False, "source": "donation", "air": 0.20},
{"id": "30042", "is_used": False, "source": "donation", "air": 0.75},
{"id": "803371", "is_used": False, "source": "store", "air": 1},
{"id": "803371", "is_used": False, "source": "donation", "air": 1},
]
d = defaultdict(list)
for ball in list_balls:
    d[ball["id"]].append(ball)
result = [
max(group, key=lambda x: (x["air"], x["source"] == "store")) for group in d.values()
]
print(result)
Output:
[{'id': '803371', 'is_used': False, 'source': 'store', 'air': 1}, {'id': '30042', 'is_used': False, 'source': 'donation', 'air': 0.75}]
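One detail worth knowing: if two entries tie on both air and source, max() returns the first maximal item it encounters, so the earlier entry in list_balls is kept. A small check, using a hypothetical extra field tag only to tell the two tied entries apart:

```python
entries = [
    # 'tag' is a hypothetical marker field, not part of the real data
    {'id': '803371', 'source': 'store', 'air': 1, 'tag': 'first'},
    {'id': '803371', 'source': 'store', 'air': 1, 'tag': 'second'},
]
best = max(entries, key=lambda x: (x['air'], x['source'] == 'store'))
print(best['tag'])  # first
```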
Nothing fancy, just (almost) pure Python.
Sort the list of dictionaries by id, then by the negated air value so that the largest values come first, and then by source so that entries with source 'store' come first. After that, select the first entry of each group of dictionaries that share the same id.
import pprint
list_balls = [
{'id': '803371', 'is_used': False, 'source': 'store', 'air': 0.9},
{'id': '803371', 'is_used': False, 'source': 'donation', 'air': 0.20},
{'id': '30042', 'is_used': False, 'source': 'donation', 'air': 0.75},
{'id': '803371', 'is_used': False, 'source': 'store', 'air': 1}
]
list_balls.sort(key=lambda k: (k['id'], -k['air'], 0 if k['source'] == 'store' else 1))
pprint.pprint([d for i, d in enumerate(list_balls) if i == 0 or list_balls[i - 1]['id'] != d['id']])
Output:
[{'air': 0.75, 'id': '30042', 'is_used': False, 'source': 'donation'},
{'air': 1, 'id': '803371', 'is_used': False, 'source': 'store'}]
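The enumerate comprehension keeps an entry when its id differs from the previous entry's id. Since the list is already sorted by id, the same selection can be sketched with itertools.groupby by taking the first item of each group. This is an alternative formulation, not the answer's own code; it also writes the source key as source != 'store', which orders the same way as 0 if store else 1.

```python
from itertools import groupby
from operator import itemgetter

list_balls = [
    {'id': '803371', 'is_used': False, 'source': 'store', 'air': 0.9},
    {'id': '803371', 'is_used': False, 'source': 'donation', 'air': 0.20},
    {'id': '30042', 'is_used': False, 'source': 'donation', 'air': 0.75},
    {'id': '803371', 'is_used': False, 'source': 'store', 'air': 1},
]

# Same ordering as above: id ascending, air descending, 'store' first
# (False sorts before True, so source != 'store' puts 'store' entries first).
list_balls.sort(key=lambda k: (k['id'], -k['air'], k['source'] != 'store'))

# Each id group now starts with the entry to keep; next() takes that first entry.
result = [next(group) for _, group in groupby(list_balls, key=itemgetter('id'))]
print(result)
```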
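Statement: the technical posts on this site follow the CC BY-SA 4.0 license. If you need to repost, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.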