Pandas groupby and get dict in list
I'm trying to extract grouped row data so that I can plot the values with label colors in another file.
My dataframe looks like this:
df = pd.DataFrame({'x': [1, 4, 5], 'y': [3, 2, 5], 'label': [1.0, 1.0, 2.0]})
x y label
0 1 3 1.0
1 4 2 1.0
2 5 5 2.0
I want to get the rows grouped by label, as a dict of lists like this:
{'1.0': [{'index': 0, 'x': 1, 'y': 3}, {'index': 1, 'x': 4, 'y': 2}],
'2.0': [{'index': 2, 'x': 5, 'y': 5}]}
How can I do this?
df = pd.DataFrame({'x': [1, 4, 5], 'y': [3, 2, 5], 'label': [1.0, 1.0, 2.0]})
df['index'] = df.index
df
label x y index
0 1.0 1 3 0
1 1.0 4 2 1
2 2.0 5 5 2
df['dict']=df[['x','y','index']].to_dict("records")
df
label x y index dict
0 1.0 1 3 0 {u'y': 3, u'x': 1, u'index': 0}
1 1.0 4 2 1 {u'y': 2, u'x': 4, u'index': 1}
2 2.0 5 5 2 {u'y': 5, u'x': 5, u'index': 2}
df = df[['label','dict']].copy()  # copy so the assignment below does not act on a view
df['label'] = df['label'].apply(str)  # Convert the float column 'label' to strings ('1.0', '2.0')
df = df.groupby('label')['dict'].apply(list)
desired_dict = df.to_dict()
desired_dict
{'1.0': [{'index': 0, 'x': 1, 'y': 3}, {'index': 1, 'x': 4, 'y': 2}],
'2.0': [{'index': 2, 'x': 5, 'y': 5}]}
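A more compact variant of the same idea, offered as a sketch rather than the answer's own method: iterate over the groups and build each list of records directly. The name `result` is only illustrative, and the snippet assumes a pandas version where iterating over `groupby('label')` yields a scalar label per group.

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 4, 5], 'y': [3, 2, 5], 'label': [1.0, 1.0, 2.0]})

# For each label, drop the grouping column, expose the index as a column,
# and turn the remaining rows into a list of dicts.
result = {
    str(label): group.drop(columns='label').reset_index().to_dict('records')
    for label, group in df.groupby('label')
}
print(result)
```

This avoids the temporary 'dict' column, at the cost of one `to_dict` call per group.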
You can use collections.defaultdict with to_dict:
from collections import defaultdict
# add 'index' series
df = df.reset_index()
# initialise defaultdict
dd = defaultdict(list)
# iterate and append
for d in df.to_dict('records'):
    dd[d['label']].append(d)
Result:
print(dd)
defaultdict(list,
{1.0: [{'index': 0.0, 'x': 1.0, 'y': 3.0, 'label': 1.0},
{'index': 1.0, 'x': 4.0, 'y': 2.0, 'label': 1.0}],
2.0: [{'index': 2.0, 'x': 5.0, 'y': 5.0, 'label': 2.0}]})
In general, there's no need to convert back to a regular dict, since defaultdict is a subclass of dict.
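If the exact shape from the question is needed (string keys, and no 'label' entry inside each record), the loop can pop the label out of each record before appending. This is a minimal sketch; the names `record` and `grouped` are illustrative and not part of the answer above.

```python
from collections import defaultdict

import pandas as pd

df = pd.DataFrame({'x': [1, 4, 5], 'y': [3, 2, 5], 'label': [1.0, 1.0, 2.0]})

dd = defaultdict(list)
for record in df.reset_index().to_dict('records'):
    label = record.pop('label')      # remove the grouping key from the record
    dd[str(label)].append(record)    # use the string form of the label as the dict key

grouped = dict(dd)  # optional: plain dict, e.g. for cleaner printing
print(grouped)
```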
You can use itertuples and defaultdict:
itertuples returns named tuples when iterating over a dataframe:
for row in df.itertuples():
    print(row)
Pandas(Index=0, x=1, y=3, label=1.0)
Pandas(Index=1, x=4, y=2, label=1.0)
Pandas(Index=2, x=5, y=5, label=2.0)
So, taking advantage of this:
from collections import defaultdict
dictionary = defaultdict(list)
for row in df.itertuples():
    dummy = {}  # fresh dict per row, so each list entry is a separate object
    dummy['x'] = row.x
    dummy['y'] = row.y
    dummy['index'] = row.Index
    dictionary[row.label].append(dummy)
dict(dictionary)
> {1.0: [{'x': 1, 'y': 3, 'index': 0}, {'x': 4, 'y': 2, 'index': 1}],
2.0: [{'x': 5, 'y': 5, 'index': 2}]}
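The same loop can lean on the named tuple's `_asdict()` method instead of copying fields one by one, which helps when the dataframe has many columns. The following is a sketch under that assumption; `record` and `grouped` are illustrative names.

```python
from collections import defaultdict

import pandas as pd

df = pd.DataFrame({'x': [1, 4, 5], 'y': [3, 2, 5], 'label': [1.0, 1.0, 2.0]})

grouped = defaultdict(list)
for row in df.itertuples():
    record = dict(row._asdict())            # fields: Index, x, y, label
    label = record.pop('label')             # grouping key, removed from the record
    record['index'] = record.pop('Index')   # rename 'Index' to 'index'
    grouped[label].append(record)

grouped = dict(grouped)
print(grouped)
```

Note that `itertuples` may rename columns whose names are not valid Python identifiers, so this shortcut is best suited to simple column names like the ones here.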
The quickest solution for what you want is close to what @cph_sto offers:
>>> df.reset_index().to_dict('records')
[{'index': 0.0, 'label': 1.0, 'x': 1.0, 'y': 3.0}, {'index': 1.0, 'label': 1.0, 'x': 4.0, 'y': 2.0}, {'index': 2.0, 'label': 2.0, 'x': 5.0, 'y': 5.0}]
That is, convert the index to a regular column, then apply the records version of to_dict.
Another option of interest:
>>> df.to_dict('index')
{0: {'label': 1.0, 'x': 1.0, 'y': 3.0}, 1: {'label': 1.0, 'x': 4.0, 'y': 2.0}, 2: {'label': 2.0, 'x': 5.0, 'y': 5.0}}
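Neither call groups the rows by label on its own, so a grouping step is still needed to reach the structure in the question. One possible sketch, starting from the `to_dict('index')` output and using `dict.setdefault`; the name `grouped` is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 4, 5], 'y': [3, 2, 5], 'label': [1.0, 1.0, 2.0]})

grouped = {}
for idx, record in df.to_dict('index').items():
    label = record.pop('label')  # grouping key, removed from the record
    # create the list for this label on first sight, then append the row
    grouped.setdefault(label, []).append({'index': idx, **record})

print(grouped)
```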