Nested defaultdict(list) grouping
I have a block of result rows and I am trying to group them into two levels of nesting: [{key: value[{key: value[]}]}]. The values are not unique at the top-level keys.
I've been trying to use defaultdict, but have not had success grouping at both levels given the non-uniqueness. Iterating over the data may be better, but I haven't had success with that either.
Starting data:
data = [{'Name': 'Bob', 'Time': 12, 'Place': 'Home'},
        {'Name': 'Bob', 'Time': 11, 'Place': 'Home'},
        {'Name': 'Jerry', 'Time': 5, 'Place': 'Home'},
        {'Name': 'Jerry', 'Time': 11, 'Place': '-----'},
        {'Name': 'Jerry', 'Time': 11, 'Place': 'Work'}]
Final desired data:
[{"Name": "Bob", "Details": [{"Place": "Home", "Time": [12, 11]}]},
 {"Name": "Jerry", "Details": [{"Place": "Home", "Time": [5]},
                               {"Place": "-----", "Time": [11]},
                               {"Place": "Work", "Time": [11]}]}]
You could group by Name and Place using itertools.groupby:
>>> import itertools
>>> import pprint
>>> from collections import defaultdict
>>> data
[{'Name': 'Bob', 'Time': 12, 'Place': 'Home'}, {'Name': 'Bob', 'Time': 11, 'Place': 'Home'}, {'Name': 'Jerry', 'Time': 5, 'Place': 'Home'}, {'Name': 'Jerry', 'Time': 11, 'Place': '-----'}, {'Name': 'Jerry', 'Time': 11, 'Place': 'Work'}]
>>> # Sort before grouping (as suggested by @wwii): groupby() only merges consecutive items with equal keys.
>>> # See https://docs.python.org/3/library/itertools.html#itertools.groupby
>>> sorted_data = sorted(data, key=lambda x: (x['Name'], x['Place']))
>>> d = defaultdict(list)
>>> y = itertools.groupby(sorted_data, lambda x: (x['Name'], x['Place']))
>>> for group, grouper in y:
...     time_ = [item['Time'] for item in grouper]
...     name, place = group
...     d[name].append({'Place': place, 'Time': time_})
...
>>> d
defaultdict(<class 'list'>, {'Bob': [{'Place': 'Home', 'Time': [12, 11]}], 'Jerry': [{'Place': '-----', 'Time': [11]}, {'Place': 'Home', 'Time': [5]}, {'Place': 'Work', 'Time': [11]}]})
>>> pprint.pprint(dict(d))
{'Bob': [{'Place': 'Home', 'Time': [12, 11]}],
 'Jerry': [{'Place': '-----', 'Time': [11]},
           {'Place': 'Home', 'Time': [5]},
           {'Place': 'Work', 'Time': [11]}]}
If you need the exact structure you showed, then:
>>> f_data = []
>>> for key, value in d.items():
...     f_data.append({'Name': key, 'Details': value})
...
>>> pprint.pprint(f_data)
[{'Details': [{'Place': 'Home', 'Time': [12, 11]}], 'Name': 'Bob'},
 {'Details': [{'Place': '-----', 'Time': [11]},
              {'Place': 'Home', 'Time': [5]},
              {'Place': 'Work', 'Time': [11]}],
  'Name': 'Jerry'}]
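To see why the sort matters here, the following is a minimal sketch, not part of the original answer. The rows, by_name, and group_times are hypothetical names made up for illustration; the rows are deliberately interleaved by name so the difference shows up.

```python
import itertools

# Hypothetical sample rows, deliberately interleaved by name.
rows = [
    {'Name': 'Bob', 'Time': 12},
    {'Name': 'Jerry', 'Time': 5},
    {'Name': 'Bob', 'Time': 11},
]


def by_name(row):
    """Grouping key: the row's name."""
    return row['Name']


def group_times(rows):
    """Collect the times of each run of consecutive rows that share a name."""
    return [(name, [r['Time'] for r in group])
            for name, group in itertools.groupby(rows, key=by_name)]


# Without sorting, 'Bob' appears twice because his rows are not adjacent.
unsorted_groups = group_times(rows)
# sorted() is stable, so Bob's times keep their original relative order.
sorted_groups = group_times(sorted(rows, key=by_name))

print(unsorted_groups)  # [('Bob', [12]), ('Jerry', [5]), ('Bob', [11])]
print(sorted_groups)    # [('Bob', [12, 11]), ('Jerry', [5])]
```

If the input is already ordered by the grouping key, the sort can be skipped.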
Sort the data, group by 'Name', group each of those results by 'Place', then extract the times.
import itertools
import operator

# Reusable key functions for the fields of each row
get_name = operator.itemgetter('Name')
get_place = operator.itemgetter('Place')
get_time = operator.itemgetter('Time')

# groupby() needs its input sorted on the grouping keys
data.sort(key=lambda x: (get_name(x), get_place(x)))
result = []
for name, group in itertools.groupby(data, key=get_name):
    d = {'Name': name, 'Details': []}
    for place, rows in itertools.groupby(group, key=get_place):
        d['Details'].append({'Place': place, 'Time': [get_time(row) for row in rows]})
    result.append(d)
I like to use operator.itemgetter instead of a lambda function if it will be used more than once. Just my personal preference.
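As a side note, operator.itemgetter also accepts several keys and then returns a tuple, so it could stand in for the two-field sort key as well. This is a small sketch rather than part of the original answer; rows and name_and_place are hypothetical names, and the rows only mimic the shape of the question's data.

```python
import operator

# Hypothetical rows in the same shape as the question's data.
rows = [
    {'Name': 'Jerry', 'Time': 11, 'Place': 'Work'},
    {'Name': 'Bob', 'Time': 12, 'Place': 'Home'},
    {'Name': 'Jerry', 'Time': 5, 'Place': 'Home'},
]

# With more than one key, itemgetter returns a tuple of the looked-up values.
name_and_place = operator.itemgetter('Name', 'Place')
first_key = name_and_place(rows[0])

# The same callable works directly as a sort key.
ordered = sorted(rows, key=name_and_place)
ordered_keys = [name_and_place(r) for r in ordered]

print(first_key)     # ('Jerry', 'Work')
print(ordered_keys)  # [('Bob', 'Home'), ('Jerry', 'Home'), ('Jerry', 'Work')]
```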
I've tried to solve it with a bit of help from Pandas. Have a look:
import pandas as pd
data = [{'Name': 'Bob', 'Time': 12, 'Place': 'Home'},
{'Name': 'Bob', 'Time': 11, 'Place': 'Home'},
{'Name': 'Jerry', 'Time': 5, 'Place': 'Home'},
{'Name': 'Jerry', 'Time': 11, 'Place': '-----'},
{'Name': 'Jerry', 'Time': 11, 'Place': 'Work'}]
df = pd.DataFrame.from_dict(data)
# Take the unique names only
names = df["Name"].unique()
# This list will hold the desired values
new_list = []
# Iterate over names
for n in names:
    # Make a subset of the data set where the name is n
    subset = df[df["Name"] == n]
    # Get the unique places in the subset
    places = subset["Place"].unique()
    # This will hold the details
    details = []
    # Iterate over unique places
    for p in places:
        # Get the times from the subset where the place is p and convert to a list
        times = subset[subset["Place"] == p]["Time"].tolist()
        # Append to the details list
        details.append({"Place": p, "Time": times})
    # Add the details to new_list in the format you preferred
    new_list.append({"Name": n, "Details": details})
print(new_list)
You've got the right idea with defaultdict plus iteration. The only slightly tricky bit is making a nested defaultdict.
from collections import defaultdict
def timegroup(data):
    grouped = defaultdict(lambda: defaultdict(list))
    for d in data:
        grouped[d['Name']][d['Place']].append(d['Time'])
    for name, details in grouped.items():
        yield {'Name': name,
               'Details': [{'Place': p, 'Time': t} for p, t in details.items()]}
(I like to use generators for things like this, because sometimes you just want to iterate over the results, in which case you don't need a list, and if you do need a list it's easy to make one.)
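For completeness, here is a sketch of how the generator might be consumed with the data from the question. It repeats timegroup so that it runs on its own, and it assumes Python 3.7 or later, where dicts preserve insertion order, so names and places come out in first-seen order.

```python
from collections import defaultdict


def timegroup(data):
    """Yield one {'Name': ..., 'Details': [...]} dict per distinct name."""
    grouped = defaultdict(lambda: defaultdict(list))
    for d in data:
        grouped[d['Name']][d['Place']].append(d['Time'])
    for name, details in grouped.items():
        yield {'Name': name,
               'Details': [{'Place': p, 'Time': t} for p, t in details.items()]}


# The starting data from the question.
data = [{'Name': 'Bob', 'Time': 12, 'Place': 'Home'},
        {'Name': 'Bob', 'Time': 11, 'Place': 'Home'},
        {'Name': 'Jerry', 'Time': 5, 'Place': 'Home'},
        {'Name': 'Jerry', 'Time': 11, 'Place': '-----'},
        {'Name': 'Jerry', 'Time': 11, 'Place': 'Work'}]

# Materialise the generator when an actual list is needed.
result = list(timegroup(data))

# Or iterate lazily when each entry is only needed once.
for entry in timegroup(data):
    print(entry['Name'], [detail['Place'] for detail in entry['Details']])
```

No sorting is needed with this approach, because the nested dictionaries collect rows by key regardless of their position in the input.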