
Pandas groupby.sum for all columns

I have a dataset with a set of columns I want to sum for each row. The columns in question all follow a specific naming pattern, and in the past I have been able to sum them via the .sum() function:

pd.DataFrame.sum(data.filter(regex=r'_name$'),axis=1)

Now I need to perform the same operation, but grouped by the values of a column:

data.groupby('group').sum(data.filter(regex=r'_name$'),axis=1)

However, this does not appear to work, as the .sum() function of a groupby object does not accept the filtered columns as an argument. Is there another way to approach this while keeping my data.filter() code?

Example toy dataset. The real dataset contains over 500 columns, and the columns are not cleanly ordered:

import pandas as pd

toy_data = {'id': [1, 2, 3, 4, 5, 6],
            'group': ["a", "a", "b", "b", "c", "c"],
            'a_name': [1, 6, 7, 3, 7, 3],
            'b_name': [4, 9, 2, 4, 0, 2],
            'c_not': [5, 7, 8, 4, 2, 5],
            'q_name': [4, 6, 8, 2, 1, 4]}
df = pd.DataFrame(toy_data, columns=['id', 'group', 'a_name', 'b_name', 'c_not', 'q_name'])
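For reference, here is a minimal sketch of the ungrouped row-wise sum described at the top of the question, applied to the toy data. The variable name `row_sums` is only illustrative and is not part of the original post.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5, 6],
    'group': ["a", "a", "b", "b", "c", "c"],
    'a_name': [1, 6, 7, 3, 7, 3],
    'b_name': [4, 9, 2, 4, 0, 2],
    'c_not': [5, 7, 8, 4, 2, 5],
    'q_name': [4, 6, 8, 2, 1, 4],
})

# Keep only the columns whose names end in "_name",
# then add them up across each row (axis=1).
row_sums = df.filter(regex=r'_name$').sum(axis=1)  # illustrative name
print(row_sums.tolist())  # [9, 21, 17, 9, 8, 9]
```

The column c_not is excluded by the regex, so it does not contribute to any of the row totals.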

Edit: Missed this in the original post. My objective is to get a variable "sum" holding the summation of all the selected columns, as shown below:

[Image from the original post; no description available]

You can filter first and then pass df['group'] instead of 'group' to groupby, and finally add the sum column with DataFrame.assign:

df1 = (df.filter(regex=r'_name$')
         .groupby(df['group']).sum()
         .assign(sum = lambda x: x.sum(axis=1)))
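The reason for passing df['group'] rather than the string 'group' can be sketched as follows: after filtering, the group column is no longer in the frame, so a string key cannot be resolved, whereas a Series key is matched to the rows by index. The names `filtered`, `string_key_works` and `by_series` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5, 6],
    'group': ["a", "a", "b", "b", "c", "c"],
    'a_name': [1, 6, 7, 3, 7, 3],
    'b_name': [4, 9, 2, 4, 0, 2],
    'c_not': [5, 7, 8, 4, 2, 5],
    'q_name': [4, 6, 8, 2, 1, 4],
})

filtered = df.filter(regex=r'_name$')

# 'group' was dropped by the filter, so looking it up by name fails.
try:
    filtered.groupby('group').sum()
    string_key_works = True
except KeyError:
    string_key_works = False

# A Series key is aligned with the rows by index, so it still works
# even though 'group' is not a column of `filtered`.
by_series = filtered.groupby(df['group']).sum()

print(string_key_works)
print(by_series)
```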

An alternative is to filter the column names first and select them after groupby:

cols = df.filter(regex=r'_name$').columns

df1 = df.groupby('group')[cols].sum()

Or:

cols = df.columns[df.columns.str.contains(r'_name$')]

df1 = df.groupby('group')[cols].sum().assign(sum = lambda x: x.sum(axis=1))

print (df1)
       a_name  b_name  q_name  sum
group                             
a           7      13      10   30
b          10       6      10   26
c          10       2       5   17
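As a sanity check, the approaches above can be compared on the toy data. This is only a sketch: the names `out1`, `out2` and `out3` are illustrative, and the assign step is added to the second variant so that all three frames have the same columns.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5, 6],
    'group': ["a", "a", "b", "b", "c", "c"],
    'a_name': [1, 6, 7, 3, 7, 3],
    'b_name': [4, 9, 2, 4, 0, 2],
    'c_not': [5, 7, 8, 4, 2, 5],
    'q_name': [4, 6, 8, 2, 1, 4],
})

# Variant 1: filter first, then group by the original 'group' Series.
out1 = (df.filter(regex=r'_name$')
          .groupby(df['group']).sum()
          .assign(sum=lambda x: x.sum(axis=1)))

# Variant 2: collect the matching column names, select them after groupby.
cols = df.filter(regex=r'_name$').columns
out2 = df.groupby('group')[cols].sum().assign(sum=lambda x: x.sum(axis=1))

# Variant 3: build the column selection with a string match on the names.
cols3 = df.columns[df.columns.str.contains(r'_name$')]
out3 = df.groupby('group')[cols3].sum().assign(sum=lambda x: x.sum(axis=1))

# Each call raises AssertionError if the two frames differ.
pd.testing.assert_frame_equal(out1, out2)
pd.testing.assert_frame_equal(out2, out3)

print(out1['sum'].to_dict())  # {'a': 30, 'b': 26, 'c': 17}
```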
