Pandas groupby and get dict in list

I'm trying to extract grouped row data so that I can use the values to plot them, with label colors, in another file.

My DataFrame looks like this:

import pandas as pd

df = pd.DataFrame({'x': [1, 4, 5], 'y': [3, 2, 5], 'label': [1.0, 1.0, 2.0]})

    x   y   label
0   1   3   1.0
1   4   2   1.0
2   5   5   2.0

I want to get a dict that maps each label to a list of row dicts, like this:

{'1.0': [{'index': 0, 'x': 1, 'y': 3}, {'index': 1, 'x': 4, 'y': 2}],
 '2.0': [{'index': 2, 'x': 5, 'y': 5}]}

How can I do this?

df = pd.DataFrame({'x': [1, 4, 5], 'y': [3, 2, 5], 'label': [1.0, 1.0, 2.0]})
df['index'] = df.index
df
   label  x  y  index
0    1.0  1  3      0
1    1.0  4  2      1
2    2.0  5  5      2

df['dict']=df[['x','y','index']].to_dict("records")
df
   label  x  y  index                             dict
0    1.0  1  3      0  {u'y': 3, u'x': 1, u'index': 0}
1    1.0  4  2      1  {u'y': 2, u'x': 4, u'index': 1}
2    2.0  5  5      2  {u'y': 5, u'x': 5, u'index': 2}

df = df[['label','dict']].copy()  # copy to avoid SettingWithCopyWarning on the next line
df['label'] = df['label'].apply(str)  # convert the float column 'label' to string keys such as '1.0'
df = df.groupby('label')['dict'].apply(list) 
desired_dict = df.to_dict()
desired_dict 
    {'1.0': [{'index': 0, 'x': 1, 'y': 3}, {'index': 1, 'x': 4, 'y': 2}],
     '2.0': [{'index': 2, 'x': 5, 'y': 5}]}
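
The same grouping can also be sketched without the helper 'dict' column, by iterating over the groups directly. This is an illustrative alternative rather than part of the answer above; the name result is made up for the example, and it assumes the default unnamed index so that reset_index() produces a column called 'index'.

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 4, 5], 'y': [3, 2, 5], 'label': [1.0, 1.0, 2.0]})

# One entry per label: drop the grouping column, turn the index into an
# 'index' column, and convert the remaining rows to a list of dicts.
result = {
    str(label): group.drop(columns='label').reset_index().to_dict('records')
    for label, group in df.groupby('label')
}
```

Using str(label) keeps the keys as '1.0' and '2.0', matching the output requested in the question; drop the str() call if float keys are acceptable.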

You can use collections.defaultdict with to_dict:

from collections import defaultdict

# add 'index' series
df = df.reset_index()

# initialise defaultdict
dd = defaultdict(list)

# iterate and append
for d in df.to_dict('records'):
    dd[d['label']].append(d)

Result:

print(dd)

defaultdict(list,
            {1.0: [{'index': 0.0, 'x': 1.0, 'y': 3.0, 'label': 1.0},
                   {'index': 1.0, 'x': 4.0, 'y': 2.0, 'label': 1.0}],
             2.0: [{'index': 2.0, 'x': 5.0, 'y': 5.0, 'label': 2.0}]})

In general, there's no need to convert back to a regular dict, since defaultdict is a subclass of dict.
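
A small sketch of that point, using a hand-built defaultdict instead of the DataFrame; the names is_dict and plain are only for illustration.

```python
from collections import defaultdict

dd = defaultdict(list)
dd[1.0].append({'x': 1, 'y': 3})

# A defaultdict is accepted anywhere a dict is expected.
is_dict = isinstance(dd, dict)

# Converting is a one-liner if a plain dict is preferred, for example to
# avoid creating empty entries when a missing key is looked up with dd[key].
plain = dict(dd)
```

The conversion is shallow: plain shares the same inner lists as dd, which is usually fine once the grouping is finished.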

You can use itertuples and defaultdict:

itertuples returns named tuples as it iterates over the DataFrame:

for row in df.itertuples():
    print(row)
Pandas(Index=0, x=1, y=3, label=1.0)
Pandas(Index=1, x=4, y=2, label=1.0)
Pandas(Index=2, x=5, y=5, label=2.0)

So, taking advantage of this:

from collections import defaultdict
dictionary = defaultdict(list)
for row in df.itertuples():
    # build a fresh dict for every row; reusing one dict would append
    # the same object to every list
    dummy = {'x': row.x, 'y': row.y, 'index': row.Index}
    dictionary[row.label].append(dummy)

dict(dictionary)
> {1.0: [{'x': 1, 'y': 3, 'index': 0}, {'x': 4, 'y': 2, 'index': 1}],
 2.0: [{'x': 5, 'y': 5, 'index': 2}]}

The quickest solution for what you want is close to what @cph_sto offers:

>>> df.reset_index().to_dict('records')
[{'index': 0.0, 'label': 1.0, 'x': 1.0, 'y': 3.0}, {'index': 1.0, 'label': 1.0, 'x': 4.0, 'y': 2.0}, {'index': 2.0, 'label': 2.0, 'x': 5.0, 'y': 5.0}]

That is, convert the index to a regular column, then apply the 'records' orientation of to_dict. Another option of interest:

>>> df.to_dict('index')
{0: {'label': 1.0, 'x': 1.0, 'y': 3.0}, 1: {'label': 1.0, 'x': 4.0, 'y': 2.0}, 2: {'label': 2.0, 'x': 5.0, 'y': 5.0}}

Check the help on to_dict for more.
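
Neither to_dict call groups the rows by label on its own. One possible way to finish the job from the 'index' orientation is sketched below; the variable grouped and the use of setdefault are choices made for this example, not something the answer above prescribes.

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 4, 5], 'y': [3, 2, 5], 'label': [1.0, 1.0, 2.0]})

grouped = {}
for idx, row in df.to_dict('index').items():
    # Remove 'label' from the row dict and use its string form as the key.
    label = str(row.pop('label'))
    # Put the original index back into the row under the key 'index'.
    grouped.setdefault(label, []).append({'index': idx, **row})
```

Depending on the pandas version, the numeric values may come back as floats rather than ints, as in the outputs shown above.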
