Groupby on a large number of columns in pandas

Question

I am trying to loop through multiple Excel files in pandas. The files have very similar structures: the first 10 columns form a key, and the remaining columns hold values. I want to group by the first 10 columns and sum the rest.

I have searched and found solutions online for similar cases, but my problem is that:

- there is a large number of value columns (to be aggregated as sums);
- the number and names of the value columns differ from file (DataFrame) to file;
- the key columns, however, are the same across all files.
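Because the key columns are shared while the value columns vary, one option is to stack all files first and run a single groupby. The sketch below makes a few assumptions. The file pattern and the helper names load_frames and combine_and_sum are hypothetical. The small in-memory frames stand in for real files and use 2 key columns instead of 10.

```python
import glob

import pandas as pd

N_KEYS = 10  # the question states that the first 10 columns form the key


def load_frames(pattern):
    """Read every Excel file matching `pattern` (e.g. a hypothetical "reports/*.xlsx").

    Reading .xlsx files requires an Excel engine such as openpyxl to be installed.
    """
    return [pd.read_excel(path) for path in sorted(glob.glob(pattern))]


def combine_and_sum(frames, n_keys=N_KEYS):
    """Stack all frames, then group by the shared key columns and sum everything else."""
    key_cols = frames[0].columns[:n_keys].to_list()
    # concat aligns columns by name; a value column missing from one file is NaN in that file's rows
    combined = pd.concat(frames, ignore_index=True, sort=False)
    # groupby-sum skips NaN (min_count=0 by default), so a missing column contributes 0
    return combined.groupby(key_cols, as_index=False, sort=False).sum()


# Small in-memory stand-ins for two files, using 2 key columns instead of 10 for brevity.
df1 = pd.DataFrame({"region": ["N", "N", "S"], "product": ["a", "a", "b"],
                    "jan": [1, 2, 3], "feb": [4, 5, 6]})
df2 = pd.DataFrame({"region": ["S", "S"], "product": ["b", "b"], "mar": [7, 8]})

result = combine_and_sum([df1, df2], n_keys=2)
print(result)
```

By default, groupby drops rows whose key contains an empty cell (NaN). If such rows should be kept, pass dropna=False to groupby (available since pandas 1.1).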

I can't share the actual data, but here is a sample of the file structure:

And here is the desired output for the above data:

It is like a groupby operation, but the large and varying number of value columns, with different header names in each file, makes it difficult to use groupby or pivot. Can anyone suggest the best way to do this in Python?
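Another way to handle the varying headers is to reshape each file into long format with melt. After reshaping, every file shares the same schema: the keys, a field name, and a value. pivot_table can then sum per key and spread the fields back into columns. This is an alternative technique, not taken from the thread, and it assumes every file has the same key columns in the same leading positions. The helper names to_long and long_sum are hypothetical, and the 2-key sample frames stand in for real files.

```python
import pandas as pd


def to_long(df, n_keys):
    """Melt every non-key column into (field, value) rows."""
    key_cols = df.columns[:n_keys].to_list()
    # After melting, files with different value columns all share one schema.
    return df.melt(id_vars=key_cols, var_name="field", value_name="value")


def long_sum(frames, n_keys):
    """Sum values per key and field across all frames, then pivot back to wide form."""
    key_cols = frames[0].columns[:n_keys].to_list()
    long = pd.concat([to_long(f, n_keys) for f in frames], ignore_index=True)
    # pivot_table sums duplicate (key, field) pairs; fill_value=0 covers fields a key never had
    return long.pivot_table(index=key_cols, columns="field", values="value",
                            aggfunc="sum", fill_value=0)


# Hypothetical stand-ins for two files with different value columns.
df1 = pd.DataFrame({"region": ["N", "N", "S"], "product": ["a", "a", "b"],
                    "jan": [1, 2, 3], "feb": [4, 5, 6]})
df2 = pd.DataFrame({"region": ["S", "S"], "product": ["b", "b"], "mar": [7, 8]})

wide = long_sum([df1, df2], n_keys=2)
print(wide)
```

Melting multiplies the row count by the number of value columns. For very wide sheets, this uses more memory than grouping the wide frames directly. pivot_table also typically returns the value columns sorted by name rather than in file order.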

Edit:

df.groupby(list(df.columns[:11])).agg(sum)

is working, but for some reason it takes 25-30 minutes, whereas MS Access does the same thing in 1-2 minutes. Am I doing something wrong here, or is there another way to do this in Python itself?
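The question does not show where the 25-30 minutes go, so what follows is a sketch of common suspects, not a diagnosis.

- Parsing large .xlsx files is often slow in itself, so it is worth timing pd.read_excel separately from the groupby.
- Value columns read from Excel can end up with object dtype when some cells contain text. Summing object columns works on Python objects and is typically much slower than summing a numeric dtype.
- If the key columns were converted to categorical dtype, groupby with observed=False builds every combination of categories, which can explode with 10 keys. Passing observed=True avoids this.

The hypothetical helper fast_group_sum addresses the dtype point. It also passes the string "sum", which recent pandas versions suggest in place of the built-in sum.

```python
import pandas as pd


def fast_group_sum(df, n_keys=10):
    """Group by the first `n_keys` columns and sum the rest, after forcing numeric value columns."""
    key_cols = df.columns[:n_keys].to_list()
    value_cols = df.columns[n_keys:].to_list()
    out = df.copy()
    for col in value_cols:
        # Object-dtype columns (e.g. numbers mixed with stray text from Excel) are slow to sum
        # and can misbehave; to_numeric converts them and turns unparseable cells into NaN.
        out[col] = pd.to_numeric(out[col], errors="coerce")
    # The string "sum" selects pandas' groupby sum; sort=False skips sorting the group keys.
    return out.groupby(key_cols, sort=False).agg("sum")


# Hypothetical sample: 1 key column, and one value column stored as text with a bad cell.
sample = pd.DataFrame({"k": ["a", "a", "b"], "v": ["1", "2", "x"], "w": [1.5, 2.5, 4.0]})
summed = fast_group_sum(sample, n_keys=1)
print(summed)
print(summed.dtypes)
```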

Answer 1

Just use df.columns, which holds the list of column names; you can then slice that list to get the 10 leftmost columns.

This should work:

df.groupby(df.columns[:10].to_list()).sum()
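The one-liner above can be illustrated on a toy frame. It has 10 key columns, named k0 to k9 purely for illustration, followed by two value columns with arbitrary names. The grouped result is indexed by a 10-level MultiIndex. reset_index() is an optional extra step that turns the keys back into ordinary columns, which is convenient if the result is written back to Excel.

```python
import pandas as pd

# Toy frame: 10 key columns (k0..k9) followed by value columns whose names are arbitrary.
keys = {f"k{i}": ["x", "x", "y"] for i in range(10)}
df = pd.DataFrame({**keys, "sales": [1, 2, 3], "units": [10, 20, 30]})

# The answer's one-liner: the 10 leftmost columns become the group keys.
grouped = df.groupby(df.columns[:10].to_list()).sum()

# Turn the 10 index levels back into ordinary columns, e.g. before calling to_excel().
flat = grouped.reset_index()
print(flat)
```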


Answer 1 (score 0), posted 2020-02-08 05:05:12
