Write a loop in a Pythonic way
I have a list of dicts that stores audits for tickets. Each audit has a user_id, the date on which the changes happened, and a list of events. Each event has a few attributes such as type and field name, among others.
Based on this information, I need to extract the event information by date and convert it to another dict. Note: I need to keep only the last event for each field_name.
I've written a "super" loop that does what I need, but the code looks pretty weird and is not optimized:
dict sample:
data = {
    "audits": [
        {
            "id": 1234,
            "ticket_id": 1111,
            "created_at": "2019-04-07T01:09:40Z",
            "author_id": 9876543,
            "events": [
                {
                    "id": 1234,
                    "type": "Random"
                },
                {
                    "id": 765456,
                    "type": "Create",
                    "value": "Lovely form",
                    "field_name": "subject"
                },
                {
                    "id": 356765,
                    "type": "Create",
                    "value": None,
                    "field_name": "priority"
                },
                {
                    "id": 2345432,
                    "type": "Change",
                    "value": "normal",
                    "field_name": "priority",
                    "previous_value": None
                }
            ]
        }
    ]
}
code:
field_history = []
for audit in data['audits']:
    user_id = audit['author_id']
    updated = audit['created_at']
    base_info = {
        'user_id': user_id,
        'updated': updated
    }
    # Iterate to get distinct value (last found on dict)
    fields = [d for d in audit['events'] if (d['type'] == 'Create' or d['type'] == 'Change') and d['field_name'] != 'tags']
    updated_fields = []  # this list is being used to keep history by updated
    for field in fields:
        distincts = [d for d in audit['events'] if d.get('field_name', '') == field['field_name']]
        distinct = distincts[-1]
        # remove older values and keep only the last one found on list
        updated_fields[:] = [d for d in updated_fields if d['updated'] == updated and d.get('field_name') != distinct['field_name']]
        updated_fields.append({**base_info, **distinct})  # add always the last element on list
    field_history = field_history + updated_fields
What is the proper way to write this loop so that it is optimized to handle large datasets?
I like to start by making some simple functions to handle the transformations and filtering, so that the top level remains clean:
def event_valid(event):
    return (
        event['type'] in ('Create', 'Change')
        and event['field_name'] not in ('tags',)
    )


# The def line is missing from the source; the name `summarize_audit` is assumed.
def summarize_audit(audit):
    events = [event for event in audit['events'] if event_valid(event)]
    # Assuming the list is ordered... If not then sort it before next statement
    # This trick filters to only the latest event for each distinct field_name
    events = {
        event['field_name']: event for event in events
    }.values()
    return {
        'user_id': audit['author_id'],
        'updated': audit['created_at'],
        'events': events,
    }
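To tie this back to the flat field_history list from the question, here is a minimal end-to-end sketch of the same idea. The names latest_events, build_field_history, and sample are hypothetical and used only for illustration. It assumes that events are already in chronological order and that only Create/Change events should count when picking the last one per field_name. It also uses .get() so that events without a field_name are skipped instead of raising a KeyError.

```python
def event_valid(event):
    """Keep Create/Change events for any named field except 'tags'."""
    return (
        event.get('type') in ('Create', 'Change')
        and event.get('field_name') not in (None, 'tags')
    )


def latest_events(audit):
    """Return one flat row per field_name, holding its last valid event."""
    base_info = {'user_id': audit['author_id'], 'updated': audit['created_at']}
    # A later event overwrites an earlier one stored under the same key,
    # so only the last valid event for each field_name survives.
    latest = {
        event['field_name']: event
        for event in audit['events']
        if event_valid(event)
    }
    return [{**base_info, **event} for event in latest.values()]


def build_field_history(data):
    """Flatten all audits into a single list of rows."""
    return [row for audit in data['audits'] for row in latest_events(audit)]


# Hypothetical sample that mirrors the structure shown in the question.
sample = {
    'audits': [
        {
            'id': 1234,
            'ticket_id': 1111,
            'created_at': '2019-04-07T01:09:40Z',
            'author_id': 9876543,
            'events': [
                {'id': 1234, 'type': 'Random'},
                {'id': 765456, 'type': 'Create', 'value': 'Lovely form',
                 'field_name': 'subject'},
                {'id': 356765, 'type': 'Create', 'value': None,
                 'field_name': 'priority'},
                {'id': 2345432, 'type': 'Change', 'value': 'normal',
                 'field_name': 'priority', 'previous_value': None},
            ],
        }
    ]
}

field_history = build_field_history(sample)
```

Each audit is scanned once, so the work grows linearly with the number of events, rather than re-filtering the event list for every field as the original loop does. For the sample above, field_history holds two rows: the subject event and the last priority event.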