Count number of rows with groupby in pandas
I had the following code in pandas 0.17:
df['numberrows'] = df.groupby(['column1','column2','column3'], as_index=False)[['column1']].transform('count').astype('int')
But I upgraded pandas today and now I get the error:
File "/usr/local/lib/python3.4/dist-packages/pandas/core/internals.py", line 3810, in insert
    raise ValueError('cannot insert {}, already exists'.format(item))
ValueError: cannot insert column1, already exists
What changed in the update that causes this code to stop working?
I want to group by the columns and add a column that holds the number of rows in each group.
If what I did before was not a good approach, another way of grouping while getting the number of rows in each group is also welcome.
EDIT:
Small dataset:
column1 column2 column3
0 test car1 1
1 test2 car5 2
2 test car1 1
3 test4 car2 1
4 test2 car1 1
The outcome would be:
column1 column2 column3 numberrows
0 test car1 1 2
1 test2 car5 2 1
3 test4 car2 1 1
4 test2 car1 1 1
Consider the following approach:
In [18]: df['new'] = df.groupby(['column1','column2','column3'])['column1'] \
.transform('count')
In [19]: df
Out[19]:
column1 column2 column3 new
0 test car1 1 2
1 test2 car5 2 1
2 test car1 1 2
3 test4 car2 1 1
4 test2 car1 1 1
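For anyone who wants to run this outside an IPython session, here is a minimal self-contained sketch of the same idea. The DataFrame is rebuilt from the small dataset in the question; the variable name keys and the target column name numberrows are just illustrative choices.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "column1": ["test", "test2", "test", "test4", "test2"],
    "column2": ["car1", "car5", "car1", "car2", "car1"],
    "column3": [1, 2, 1, 1, 1],
})

keys = ["column1", "column2", "column3"]

# transform returns one value per original row, aligned on df's index,
# so the result can be assigned straight back as a new column.
df["numberrows"] = df.groupby(keys)["column1"].transform("count")

print(df)
```

Every row keeps its original position, and rows 0 and 2 both get a count of 2 because they belong to the same group.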
UPDATE:
In [26]: df.groupby(['column1','column2','column3'])['column1'] \
.count().reset_index(name='numberrows')
Out[26]:
column1 column2 column3 numberrows
0 test car1 1 2
1 test2 car1 1 1
2 test2 car5 2 1
3 test4 car2 1 1
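If only the per-group totals are needed, size() should be a slightly more direct alternative to selecting a column and calling count(). This is a sketch on the question's sample data, not part of the original answer; note that size() counts every row in a group, whereas count() skips missing values in the selected column.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "column1": ["test", "test2", "test", "test4", "test2"],
    "column2": ["car1", "car5", "car1", "car2", "car1"],
    "column3": [1, 2, 1, 1, 1],
})

keys = ["column1", "column2", "column3"]

# size() yields one row count per group; reset_index(name=...) turns the
# resulting Series into a DataFrame with the keys as ordinary columns.
counts = df.groupby(keys).size().reset_index(name="numberrows")

print(counts)
```

The result has one row per distinct key combination, sorted by the group keys, so the original row labels are not preserved.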
Your syntax is sloppy: you are using as_index=False with transform. as_index=False ends up pushing the grouping columns back into the dataframe proper, where it finds that column1 already exists... uh-oh. However, that is completely unnecessary, as transform handles the index for you.
df.groupby(
['column1','column2','column3']
)['column1'].transform('count').astype('int')
0 2
1 1
2 2
3 1
4 1
Name: column1, dtype: int64
Or make a new column:
df.assign(
new=df.groupby(
['column1','column2','column3']
)['column1'].transform('count').astype('int')
)
column1 column2 column3 new
0 test car1 1 2
1 test2 car5 2 1
2 test car1 1 2
3 test4 car2 1 1
4 test2 car1 1 1
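The expected outcome in the question lists each group only once while keeping the original row labels (0, 1, 3, 4). One way to get that shape, sketched here as an assumption about what the asker wanted rather than something stated in either answer, is to add the count with transform and then drop the duplicate key rows:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "column1": ["test", "test2", "test", "test4", "test2"],
    "column2": ["car1", "car5", "car1", "car2", "car1"],
    "column3": [1, 2, 1, 1, 1],
})

keys = ["column1", "column2", "column3"]

result = (
    df.assign(numberrows=df.groupby(keys)["column1"].transform("count"))
      # Keep only the first row of each key combination; index labels survive.
      .drop_duplicates(subset=keys)
)

print(result)
```

Unlike count().reset_index(), this keeps the rows in their original order and with their original index labels.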