
How to perform a multiple groupby and transform count with a condition in pandas

This is an extension of the question here.

I am trying to add an extra column to the groupby:

# Import pandas library 
import pandas as pd
import numpy as np

# data
data = [['tom', 10,2,'c',100,'x'], ['tom',16 ,3,'a',100,'x'], ['tom', 22,2,'a',100,'x'],
        ['matt', 10,1,'c',100,'x'], ['matt', 15,5,'b',100,'x'], ['matt', 14,1,'b',100,'x']]

# Create the pandas DataFrame 
df = pd.DataFrame(data, columns = ['Name', 'Attempts','Score','Category','Rating','Other'])
df['AttemptsbyRating'] = df.groupby(by=['Rating','Other'])['Attempts'].transform('count')
df


Then I try to add another column with the number of rows that have a Score greater than 1 (which should equal 4):

df['scoregreaterthan1'] = df['Score'].gt(1).groupby(by=df[['Rating','Other']]).transform('sum')

But I am getting a

ValueError: Grouper for '<class 'pandas.core.frame.DataFrame'>' not 1-dimensional

Any ideas? Thanks very much!

df['Score'].gt(1) returns a boolean Series rather than a DataFrame, and by=df[['Rating','Other']] passes a two-dimensional DataFrame as the grouper, which is what the error is complaining about. Filter the DataFrame first, then group by the relevant columns.

Use:

df = df[df['Score'].gt(1)]
df['scoregreaterthan1'] = df.groupby(['Rating','Other'])['Score'].transform('count')
df

Output:

    Name    Attempts    Score   Category    Rating  Other   AttemptsbyRating    scoregreaterthan1
0   tom     10          2       c           100     x       6                4
1   tom     16          3       a           100     x       6                4
2   tom     22          2       a           100     x       6                4
4   matt    15          5       b           100     x       6                4
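The error message in the question is about the grouper's dimensionality, and that can be checked directly. The following is a minimal sketch, assuming the same sample data as the question; the name `filtered` is introduced here for illustration only and is not part of the original answer.

```python
import pandas as pd

# Same sample data as in the question
data = [['tom', 10, 2, 'c', 100, 'x'], ['tom', 16, 3, 'a', 100, 'x'], ['tom', 22, 2, 'a', 100, 'x'],
        ['matt', 10, 1, 'c', 100, 'x'], ['matt', 15, 5, 'b', 100, 'x'], ['matt', 14, 1, 'b', 100, 'x']]
df = pd.DataFrame(data, columns=['Name', 'Attempts', 'Score', 'Category', 'Rating', 'Other'])

# A two-column selection is a 2-D DataFrame, so it cannot act as a single grouper
two_col_ndim = df[['Rating', 'Other']].ndim
# Each individual column is a 1-D Series
one_col_ndim = df['Rating'].ndim
print(two_col_ndim, one_col_ndim)

# Filter first, then group by column names on the filtered DataFrame.
# `filtered` is a hypothetical name; .copy() makes it an independent frame
# so the column assignment below does not write into a slice of df.
filtered = df[df['Score'].gt(1)].copy()
filtered['scoregreaterthan1'] = filtered.groupby(['Rating', 'Other'])['Score'].transform('count')
print(filtered)
```

Working on an explicit copy is a design choice of this sketch: it leaves the original df untouched, so the rows with a Score of 1 are still available afterwards.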

If you want to keep the rows for people whose score is not greater than 1, then instead of this:

df = df[df['Score'].gt(1)]
df['scoregreaterthan1'] = df.groupby(['Rating','Other'])['Score'].transform('count')

do this:

df['scoregreaterthan1'] = df[df['Score'].gt(1)].groupby(['Rating','Other'])['Score'].transform('count')
df['scoregreaterthan1'] = df['scoregreaterthan1'].ffill().astype(int)

Output 2:

    Name    Attempts    Score   Category    Rating  Other   AttemptsbyRating    scoregreaterthan1
0   tom     10          2       c           100     x       6                   4
1   tom     16          3       a           100     x       6                   4
2   tom     22          2       a           100     x       6                   4
3   matt    10          1       c           100     x       6                   4
4   matt    15          5       b           100     x       6                   4
5   matt    14          1       b           100     x       6                   4
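One caveat: ffill fills each missing row from whichever row precedes it, regardless of group, so the approach above depends on all rows here belonging to a single (Rating, Other) group. A possible alternative, sketched below, is to group the condition itself by a list of 1-D Series and sum it per group. This is not the original answer's method. The seventh row ('ann', Rating 200, Other 'y') is a hypothetical addition, included only to show a second group with no Score above 1.

```python
import pandas as pd

# Sample data from the question, plus one hypothetical row forming a second group
data = [['tom', 10, 2, 'c', 100, 'x'], ['tom', 16, 3, 'a', 100, 'x'], ['tom', 22, 2, 'a', 100, 'x'],
        ['matt', 10, 1, 'c', 100, 'x'], ['matt', 15, 5, 'b', 100, 'x'], ['matt', 14, 1, 'b', 100, 'x'],
        ['ann', 12, 1, 'a', 200, 'y']]  # hypothetical row, not in the original question
df = pd.DataFrame(data, columns=['Name', 'Attempts', 'Score', 'Category', 'Rating', 'Other'])

# 1 where Score > 1, else 0; casting to int makes the per-group sum an explicit count
mask = df['Score'].gt(1).astype(int)

# Pass a list of 1-D Series as the grouper instead of a 2-D DataFrame,
# then broadcast each group's total back to every row of that group
df['scoregreaterthan1'] = mask.groupby([df['Rating'], df['Other']]).transform('sum')
print(df[['Name', 'Score', 'Rating', 'Other', 'scoregreaterthan1']])
```

With this data, the six rows in the (100, 'x') group get 4 and the hypothetical (200, 'y') row gets 0, with no rows dropped and no fill step.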
