Counting rows with pandas groupby

Question

I had the following line of code in pandas 0.17:

df['numberrows'] = df.groupby(['column1','column2','column3'], as_index=False)[['column1']].transform('count').astype('int')

But I upgraded pandas today and now I get this error:

  File "/usr/local/lib/python3.4/dist-packages/pandas/core/internals.py", line 3810, in insert
    raise ValueError('cannot insert {}, already exists'.format(item))

ValueError: cannot insert column1, already exists

What changed in the update that causes this code to stop working?

I want to group by the columns and add a column that holds the number of rows in each group.

If what I did before was not a good approach, another way of grouping while getting the number of rows in each group is also welcome.

EDIT:

Small dataset:

    column1  column2    column3   
 0  test     car1       1           
 1  test2    car5       2         
 2  test     car1       1         
 3  test4    car2       1      
 4  test2    car1       1

The outcome would be:

    column1  column2    column3   numberrows
 0  test     car1       1           2
 1  test2    car5       2           1     
 3  test4    car2       1           1
 4  test2    car1       1           1

Answer 1

Consider the following approach:

In [18]: df['new'] = df.groupby(['column1','column2','column3'])['column1'] \
                       .transform('count')

In [19]: df
Out[19]:
  column1 column2  column3  new
0    test    car1        1    2
1   test2    car5        2    1
2    test    car1        1    2
3   test4    car2        1    1
4   test2    car1        1    1

UPDATE: to get one row per group, as in your expected outcome:

In [26]: df.groupby(['column1','column2','column3'])['column1'] \
           .count().reset_index(name='numberrows')
Out[26]:
  column1 column2  column3  numberrows
0    test    car1        1           2
1   test2    car1        1           1
2   test2    car5        2           1
3   test4    car2        1           1
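
For anyone who wants to run this outside an IPython session, here is a minimal self-contained sketch built on the sample data from the question. It is a variation rather than the exact call above: it uses size() instead of count(), which counts every row in a group regardless of missing values, and passes sort=False so that the groups keep their order of first appearance, as in the expected outcome in the question. The variable names keys and summary are only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "column1": ["test", "test2", "test", "test4", "test2"],
    "column2": ["car1", "car5", "car1", "car2", "car1"],
    "column3": [1, 2, 1, 1, 1],
})

keys = ["column1", "column2", "column3"]

# size() counts the rows in each group; sort=False keeps the groups
# in the order in which they first appear in df.
summary = (
    df.groupby(keys, sort=False)
      .size()
      .reset_index(name="numberrows")
)
print(summary)
```

Note that the result gets a fresh 0..3 index, so the original row labels (0, 1, 3, 4) from the question are not preserved.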

Answer 2

Your syntax is sloppy: you are using as_index=False together with transform.
as_index=False ends up pushing the grouping columns back into the dataframe proper, where it finds that column1 already exists... uh-oh. It is also completely unnecessary, because transform handles the index for you: its result is aligned to the index of the original dataframe.

df.groupby(
    ['column1','column2','column3']
)['column1'].transform('count').astype('int')

0    2
1    1
2    2
3    1
4    1
Name: column1, dtype: int64

Or make a new column:

df.assign(
    new=df.groupby(
        ['column1','column2','column3']
    )['column1'].transform('count').astype('int')
)

  column1 column2  column3  new
0    test    car1        1    2
1   test2    car5        2    1
2    test    car1        1    2
3   test4    car2        1    1
4   test2    car1        1    1
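
To make the point about the index concrete, here is a small sketch, again assuming the sample data from the question (the names keys and counts are only illustrative). It checks that the Series returned by transform carries the same index as the original dataframe, which is why it can be assigned directly as a new column without as_index=False.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "column1": ["test", "test2", "test", "test4", "test2"],
    "column2": ["car1", "car5", "car1", "car2", "car1"],
    "column3": [1, 2, 1, 1, 1],
})

keys = ["column1", "column2", "column3"]

# transform returns one value per original row, not one per group.
counts = df.groupby(keys)["column1"].transform("count")

# The result is indexed like df, so plain assignment lines up row by row.
print(counts.index.equals(df.index))

df["numberrows"] = counts
print(df)
```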
