How to group a list of dicts into sub-lists using pandas?
The input looks like this:
[
{"name": "person 1", "age": 20, "type": "student"},
{"name": "person 2", "age": 19, "type": "worker"},
{"name": "person 3", "age": 30, "type": "student"},
{"name": "person 4", "age": 25, "type": "worker"},
{"name": "person 5", "age": 17, "type": "student"}
]
When grouped by the "type" field, the desired output is:
[
[
{"name": "person 1", "age": 20, "type": "student"},
{"name": "person 3", "age": 30, "type": "student"},
{"name": "person 5", "age": 17, "type": "student"}
],
[
{"name": "person 2", "age": 19, "type": "worker"},
{"name": "person 4", "age": 25, "type": "worker"}
]
]
I have the following code that does this with itertools:
from itertools import groupby
input = [
{"name": "person 1", "age": 20, "type": "student"},
{"name": "person 2", "age": 19, "type": "worker"},
{"name": "person 3", "age": 30, "type": "student"},
{"name": "person 4", "age": 25, "type": "worker"},
{"name": "person 5", "age": 17, "type": "student"}
]
input.sort(key=lambda x: x["type"])
output = [list(v) for k, v in groupby(input, key=lambda x: x["type"])]
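This gives the correct result. However, for larger amounts of data I assumed pandas would be more efficient, but I can't figure out how to do the above with pandas. My current code sort of works, but I don't think it is efficient at all.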
import pandas as pd
input = [
{"name": "person 1", "age": 20, "type": "student"},
{"name": "person 2", "age": 19, "type": "worker"},
{"name": "person 3", "age": 30, "type": "student"},
{"name": "person 4", "age": 25, "type": "worker"},
{"name": "person 5", "age": 17, "type": "student"}
]
indexes = [list(v) for k, v in pd.DataFrame(input).groupby(["type"]).groups.items()]
output = [[input[y] for y in x] for x in indexes]
I'm pretty sure the code above is a very wrong way to use pandas' groupby, so any help on how to do this properly would be appreciated. Thanks.
You can do this with GroupBy.apply and to_dict:
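# 'records' is spelled out in full; the 'r' shorthand is not accepted by recent pandas versions
pd.DataFrame(input).groupby('type').apply(lambda x: x.to_dict('records')).to_list()
Slightly faster:
pd.DataFrame(input).groupby('type').apply(
pd.DataFrame.to_dict, orient='records').tolist()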
# [[{'age': 20, 'name': 'person 1', 'type': 'student'},
# {'age': 30, 'name': 'person 3', 'type': 'student'},
# {'age': 17, 'name': 'person 5', 'type': 'student'}],
# [{'age': 19, 'name': 'person 2', 'type': 'worker'},
# {'age': 25, 'name': 'person 4', 'type': 'worker'}]]
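If you would rather not go through apply, iterating over the GroupBy object directly should give the same nested lists. This is only a minimal sketch of that idea, not a benchmarked recommendation; the names records, df, grouped and by_type are my own and not from the question. The dict variant is an optional extra for when you also want to keep the group key.

```python
import pandas as pd

# Same sample data as in the question (renamed so the builtin `input` is not shadowed).
records = [
    {"name": "person 1", "age": 20, "type": "student"},
    {"name": "person 2", "age": 19, "type": "worker"},
    {"name": "person 3", "age": 30, "type": "student"},
    {"name": "person 4", "age": 25, "type": "worker"},
    {"name": "person 5", "age": 17, "type": "student"},
]

df = pd.DataFrame(records)

# Iterating a GroupBy yields (key, sub-DataFrame) pairs, with keys sorted by default.
grouped = [group.to_dict("records") for _, group in df.groupby("type")]

# Optional variant: keep the group key so each sub-list can be looked up by type.
by_type = {key: group.to_dict("records") for key, group in df.groupby("type")}
```

Here grouped has the list-of-lists shape asked for in the question, and by_type["worker"] returns just the worker records.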
What I would do:
# Convert each row of the group, not just the first one (y.iloc[0] repeated the first row)
l1=[[row.to_dict() for _ , row in y.iterrows()] for _ , y in pd.DataFrame(input).groupby('type')]
Out[254]:
[[{'age': 20, 'name': 'person 1', 'type': 'student'},
{'age': 30, 'name': 'person 3', 'type': 'student'},
{'age': 17, 'name': 'person 5', 'type': 'student'}],
[{'age': 19, 'name': 'person 2', 'type': 'worker'},
{'age': 25, 'name': 'person 4', 'type': 'worker'}]]
And if you only need to look up values by key, you can use itertuples:
l=[list(y.itertuples()) for _ , y in pd.DataFrame(input).groupby('type')]
Out[256]:
[[Pandas(Index=0, age=20, name='person 1', type='student'),
Pandas(Index=2, age=30, name='person 3', type='student'),
Pandas(Index=4, age=17, name='person 5', type='student')],
[Pandas(Index=1, age=19, name='person 2', type='worker'),
Pandas(Index=3, age=25, name='person 4', type='worker')]]
Compare:
l[0][0].age
Out[263]: 20
l1[0][0]['age']
Out[264]: 20
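If you start from itertuples but later need real dicts after all, the named tuples can be converted with their _asdict() method. A small sketch, assuming the column names are valid Python identifiers (they are here); records, df, as_tuples and as_dicts are names I made up for illustration, and index=False is my own choice to leave the Index field out.

```python
import pandas as pd

# Same sample data as in the question.
records = [
    {"name": "person 1", "age": 20, "type": "student"},
    {"name": "person 2", "age": 19, "type": "worker"},
    {"name": "person 3", "age": 30, "type": "student"},
    {"name": "person 4", "age": 25, "type": "worker"},
    {"name": "person 5", "age": 17, "type": "student"},
]

df = pd.DataFrame(records)

# index=False drops the Index field, so each tuple holds only name, age and type.
as_tuples = [list(group.itertuples(index=False)) for _, group in df.groupby("type")]

# Named tuples give attribute access; _asdict() turns one back into a mapping.
as_dicts = [[row._asdict() for row in rows] for rows in as_tuples]
```

as_tuples[0][0].age and as_dicts[0][0]["age"] then refer to the same value, so you can pick whichever access style suits the calling code.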