Pandas: group by and aggregate multiple column values into a dictionary

Question

I have a DataFrame in which I want to combine some rows via a group by.

Sample DF:

   col_a  col_b  col_c  col_e  col_f
0      1      0      1   -1.0      2
1      1      1      3    0.0      3
2      1      2      4    NaN      3
3      2      0      3    4.0      6
4      3      0      3    4.0      2

And this is what I want the output to look like:

df.groupby('col_a')

col_a, col_c               ...col_f
1       {0: 1, 1: 3, 2:4}     {0:2,1:3,2:3}
2       ....                 ....
3        ....               ....

Basically, group by col_a, then collect the values of col_c through col_f for each group into dictionaries in which col_b is the dictionary key.

I'm not sure whether there's a way to do this with groupby and some kind of agg function, or whether I'm resigned to writing a Python function that takes the DataFrame and iterates over every row using .apply. Ideas?

Edit:

Original:
       col_a col_b  col_c  col_e  col_f
    0      1     A      1   -1.0      2
    1      1     B      3    0.0      3
    2      1     C      4    NaN      3
    3      2     A      3    4.0      6
    4      3     A      3    4.0      2

Desired:
    col_a, col_c               ...col_f
    1       {A: 1, B: 3, C:4}     {A:2,B:3,C:3}
    2       ....                 ....
    3        {A:3}               {A:2}

Answer 1

I don't think you want to do this; there is rarely a need for a DataFrame of dicts. You can do all the same operations (and more) using a DataFrame with these columns as the levels of a MultiIndex:

In [11]: res = df.set_index(["col_a", "col_b"])

In [12]: res
Out[12]:
             col_c  col_e  col_f
col_a col_b
1     0          1   -1.0      2
      1          3    0.0      3
      2          4    NaN      3
2     0          3    4.0      6
3     0          3    4.0      2
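
The session above assumes df already exists. A minimal, self-contained sketch to reproduce it, assuming the first sample frame from the question (the construction of df is my reconstruction of that sample, not code from the original post):

```python
import numpy as np
import pandas as pd

# Reconstruction of the question's first sample frame (numeric col_b).
df = pd.DataFrame({
    "col_a": [1, 1, 1, 2, 3],
    "col_b": [0, 1, 2, 0, 0],
    "col_c": [1, 3, 4, 3, 3],
    "col_e": [-1.0, 0.0, np.nan, 4.0, 4.0],
    "col_f": [2, 3, 3, 6, 2],
})

# Move the grouping key and the would-be dict key into a two-level index.
res = df.set_index(["col_a", "col_b"])
print(res)
```

The remaining columns (col_c, col_e, col_f) keep their own dtypes, which is part of why this layout tends to be easier to work with than cells holding dicts.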

Now you can look up values in the DataFrame by col_a, col_b, and any other column (as if it were a dict).

In [13]: res.loc[(1, 2), "col_c"]
Out[13]: 4.0

In [14]: res.loc[1, "col_c"]
Out[14]:
col_b
0    1
1    3
2    4
Name: col_c, dtype: int64
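
If a plain dict is needed at some point, it can be built on demand from the indexed frame instead of being stored in cells. A minimal sketch, again assuming my reconstruction of the question's first sample frame; Series.to_dict is the only call beyond what the session above uses:

```python
import numpy as np
import pandas as pd

# Reconstruction of the question's first sample frame (an assumption).
df = pd.DataFrame({
    "col_a": [1, 1, 1, 2, 3],
    "col_b": [0, 1, 2, 0, 0],
    "col_c": [1, 3, 4, 3, 3],
    "col_e": [-1.0, 0.0, np.nan, 4.0, 4.0],
    "col_f": [2, 3, 3, 6, 2],
})
res = df.set_index(["col_a", "col_b"])

# Selecting col_a == 1 leaves a Series indexed by col_b;
# to_dict() turns that into a {col_b: value} mapping.
col_c_for_1 = res.loc[1, "col_c"].to_dict()
print(col_c_for_1)
```

For col_a == 1 this yields the mapping the question asked for in col_c, with col_b values 0, 1, 2 as keys.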

etc. This is going to be more efficient than using a dict inside a DataFrame...
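
For completeness, if the dict-per-cell layout from the question really is required, one way to build it is a plain loop over the groups. This is only a sketch under assumptions: the frame is my reconstruction of the question's edited sample (letters in col_b), and value_cols, records, and out are illustrative names, not anything from the original post.

```python
import numpy as np
import pandas as pd

# Reconstruction of the question's edited sample frame (an assumption).
df = pd.DataFrame({
    "col_a": [1, 1, 1, 2, 3],
    "col_b": ["A", "B", "C", "A", "A"],
    "col_c": [1, 3, 4, 3, 3],
    "col_e": [-1.0, 0.0, np.nan, 4.0, 4.0],
    "col_f": [2, 3, 3, 6, 2],
})

value_cols = ["col_c", "col_e", "col_f"]  # hypothetical helper name

records = []
for key, group in df.groupby("col_a"):
    # Within one group, index by col_b so each column becomes {col_b: value}.
    keyed = group.set_index("col_b")
    row = {"col_a": key}
    for col in value_cols:
        row[col] = keyed[col].to_dict()
    records.append(row)

# One row per col_a value; every other cell holds a Python dict.
out = pd.DataFrame(records).set_index("col_a")
print(out)
```

The resulting columns have object dtype, so vectorized operations no longer apply to them, which is the trade-off the answer above warns about.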

Answer 1
2 2019-02-08 00:17:30
