pandas groupby with sum on more than 8 columns

Question

I have a pandas dataframe that contains 13 text columns and 16 numeric columns (29 columns in total, around 13k rows). I would like to aggregate the data by the first 13 columns and return the sum of the results for the 16 numeric columns. I have tried the following:

df.groupby(1,2,3,4,5,6,7,8,9,10,11,12,13)[14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29].sum()

but this returns the error "groupby() takes from 1 to 8 positional arguments but 14 were given".

I am essentially trying to do the following, as it would be written in SQL syntax:

select 1,2,3,4,5,6,7,8,9,10,11,12,13,sum(14),sum(15),sum(16),sum(17),sum(18),sum(19),sum(20),sum(21),sum(22),sum(23),sum(24),sum(25),sum(26),sum(27),sum(28),sum(29)
from df group by 1,2,3,4,5,6,7,8,9,10,11,12,13

I'd also like the process done in place so I end up with a dataframe the same shape as the old one (with fewer rows, obviously!)

Any help appreciated, thanks!

Answer 1

A slightly more general approach that uses .select_dtypes (docs) to isolate numeric columns:

import pandas as pd
import numpy as np

numerical_columns = df.select_dtypes(include=[np.number]).columns.tolist()
other_columns = df.select_dtypes(exclude=[np.number]).columns.tolist()

df.groupby(other_columns)[numerical_columns].sum()

As for why your code is not working: you need to pass the column names to groupby as a single list, rather than each name as a separate positional argument. The same goes for the column selection after groupby, which should also be a list.
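
To make the list-passing point concrete, here is a minimal sketch on a toy dataframe. The column names (region, product, units, revenue) are made up for illustration and are not from the question. It also uses as_index=False, which is one way to keep the grouping keys as ordinary columns so the result has the same column layout as the input, just with fewer rows.

```python
import pandas as pd

# Hypothetical toy frame: two text columns and two numeric columns.
df = pd.DataFrame({
    "region": ["east", "east", "west", "west"],
    "product": ["a", "a", "a", "b"],
    "units": [1, 2, 3, 4],
    "revenue": [10.0, 20.0, 30.0, 40.0],
})

key_columns = ["region", "product"]    # the columns to group by
value_columns = ["units", "revenue"]   # the columns to sum

# The keys go in as ONE list argument, and so does the column selection.
# as_index=False keeps the keys as regular columns instead of a MultiIndex.
result = df.groupby(key_columns, as_index=False)[value_columns].sum()
print(result)
```

groupby does not modify the dataframe in place; if you want to replace the old one, assign the result back, e.g. df = result. Calling .reset_index() on the grouped sum should give the same layout as as_index=False here.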

Answer 1 (accepted, 2017-06-08 17:11:17)