How to summarize only certain columns of a dataframe (Python pandas)

I want to build a new dataframe that shows the sum of certain columns for rows that share the same values in the 'index' columns (campaign_id and group_name in my example). Here is a sample of my dataframe:

campaign_id  group_name  clicks    conversions   cost    label    city_id 
101          blue        40        15            100     foo      15
102          red         20        5             50      bar      12
102          red         7         3             25      bar      12
102          brown       5         0             18      bar      12

This is what I want to get:

campaign_id  group_name  clicks    conversions   cost    label    city_id 
101          blue        40        15            100     foo      15
102          red         27        8             75      bar      12
102          brown       5         0             18      bar      12

I tried:

df = df.groupby(['campaign_id','group_name'])[['clicks','conversions','cost']].sum().reset_index()

but this returns only the summed columns (plus the index columns), like this:

campaign_id  group_name  clicks    conversions   cost    
101          blue        40        15            100
102          red         27        8             75
102          brown       5         0             18

I could add the leftover columns back after this operation, but I'm not sure that would be an optimal or adequate way to solve the problem.
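For reference, the "add the leftover columns back afterwards" idea could look roughly like the sketch below. It rebuilds the sample dataframe from the question. The names keys, sum_cols, sums, leftovers and result are made up for illustration, and sort=False is only there so the groups keep the row order shown in the desired output.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "campaign_id": [101, 102, 102, 102],
    "group_name": ["blue", "red", "red", "brown"],
    "clicks": [40, 20, 7, 5],
    "conversions": [15, 5, 3, 0],
    "cost": [100, 50, 25, 18],
    "label": ["foo", "bar", "bar", "bar"],
    "city_id": [15, 12, 12, 12],
})

keys = ["campaign_id", "group_name"]        # the 'index' columns
sum_cols = ["clicks", "conversions", "cost"]  # columns to sum

# Sum the numeric columns per group; sort=False keeps first-appearance order.
sums = df.groupby(keys, sort=False)[sum_cols].sum().reset_index()

# One row per group holding the columns that were not summed.
leftovers = df.drop(columns=sum_cols).drop_duplicates(subset=keys)

# Attach the leftover columns to the summed rows.
result = sums.merge(leftovers, on=keys, how="left")
print(result)
```

This works, but it needs two passes over the data and an extra merge, which is why a single groupby is usually nicer.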

Is there a simple way to sum certain columns and leave the other columns untouched? I don't care what happens if they differ, because in my data all leftover columns hold the same values for rows that share the same values in the 'index' columns (campaign_id and group_name).

When I finished my post I saw the answer right away: since all columns except the ones I want to sum have matching values within each group, I just need to include all of those columns in the multi-index for this operation. Like this:

df = df.groupby(['campaign_id','group_name','label','city_id'])[['clicks','conversions','cost']].sum().reset_index()

In this case I got exactly what I wanted.
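A self-contained version of this, run against the sample data from the question, might look like the sketch below. The variable names (sum_cols, key_cols, by_keys, id_cols, agg_spec, by_first) are hypothetical. Two details are my own additions rather than part of the original one-liner: sort=False, so that 'red' stays ahead of 'brown' as in the desired output, and the final column reordering. The second half shows an alternative I would consider when the leftover columns are not guaranteed to match: group only on the two ID columns and take the first value of everything else.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "campaign_id": [101, 102, 102, 102],
    "group_name": ["blue", "red", "red", "brown"],
    "clicks": [40, 20, 7, 5],
    "conversions": [15, 5, 3, 0],
    "cost": [100, 50, 25, 18],
    "label": ["foo", "bar", "bar", "bar"],
    "city_id": [15, 12, 12, 12],
})

sum_cols = ["clicks", "conversions", "cost"]

# Approach from the post: every column that is not summed becomes a group key.
key_cols = [c for c in df.columns if c not in sum_cols]
by_keys = (
    df.groupby(key_cols, sort=False)[sum_cols]
      .sum()
      .reset_index()[df.columns]  # put columns back in their original order
)

# Alternative sketch: group on the ID columns only, sum the numeric columns
# and keep the first value seen for each leftover column.
id_cols = ["campaign_id", "group_name"]
agg_spec = {
    c: ("sum" if c in sum_cols else "first")
    for c in df.columns
    if c not in id_cols
}
by_first = df.groupby(id_cols, sort=False, as_index=False).agg(agg_spec)

print(by_keys)
print(by_first)
```

Both give the same table for this sample. One caveat with the first approach: by default groupby drops rows whose key columns contain NaN, so if label or city_id can be missing, the 'first' variant (or dropna=False in newer pandas versions) is the safer choice.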


 