Python Pandas Group Dataframe by Column / Sum Integer Column by String Column

I have been stuck all day. I have been through numerous SO answers and am still stuck on the very last piece. I imported a CSV into a large DataFrame and eventually reduced it to the smaller DataFrame below. (Note: my df is currently indexed on 'Name', which is the column I need to group and sum by.)

Name       Classification          Value 1  Value 2
Company 1  Classification Code 1      5000     8000
Company 1  Classification Code 1      6000     2000
Company 2  Classification Code 1      2000     3000
Company 2  Classification Code 1      1000     4500
Company 3  Classification Code 2     15000    10000
Company 3  Classification Code 2     20000    32000
Company 4  Classification Code 3      7500    10000
Company 4  Classification Code 3      7000     1500
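To experiment with the answers that follow, the sample table can be rebuilt in memory instead of re-reading the CSV. This is a minimal sketch that assumes the values are already integers (the real CSV import may produce strings; see the comments further down) and that 'Name' is the index, as the question states.

```python
import pandas as pd

# Sample data from the question; 'Name' becomes the index, as in the asker's df.
df = pd.DataFrame(
    {
        "Name": ["Company 1", "Company 1", "Company 2", "Company 2",
                 "Company 3", "Company 3", "Company 4", "Company 4"],
        "Classification": ["Classification Code 1"] * 4
        + ["Classification Code 2"] * 2
        + ["Classification Code 3"] * 2,
        "Value 1": [5000, 6000, 2000, 1000, 15000, 20000, 7500, 7000],
        "Value 2": [8000, 2000, 3000, 4500, 10000, 32000, 10000, 1500],
    }
).set_index("Name")

# Both value columns should be integer dtypes here.
print(df.dtypes)
```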

What I am struggling with now is how to sum the two value columns for each company. I have mainly been using groupby and sum(), but I have been stuck for hours. I know there are many SO answers about summing things in pandas, but none of them has worked for me. Any help would be greatly appreciated. Thanks so much.

Edit: The output I am looking for is the following:

Name       Classification          Value 1  Value 2
Company 1  Classification Code 1    11,000   10,000
Company 2  Classification Code 1     3,000    7,500
Company 3  Classification Code 2    35,000   42,000
Company 4  Classification Code 3    14,500   11,500

Option 1
set_index then groupby
This assumes that the 'Classification' value is the same for every row of a given Company.

df.set_index('Classification', append=True) \
    .groupby(level=[0, 1]).sum().reset_index(1)

                  Classification  Value 1  Value 2
Name                                              
Company 1  Classification Code 1    11000    10000
Company 2  Classification Code 1     3000     7500
Company 3  Classification Code 2    35000    42000
Company 4  Classification Code 3    14500    11500
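Because Option 1 only holds when each company has a single 'Classification', it can be worth checking that assumption before grouping. The sketch below rebuilds the sample data, verifies the assumption with `nunique`, and then runs Option 1 unchanged apart from the layout of the method chain.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Classification": ["Classification Code 1"] * 4
        + ["Classification Code 2"] * 2
        + ["Classification Code 3"] * 2,
        "Value 1": [5000, 6000, 2000, 1000, 15000, 20000, 7500, 7000],
        "Value 2": [8000, 2000, 3000, 4500, 10000, 32000, 10000, 1500],
    },
    index=pd.Index(["Company 1"] * 2 + ["Company 2"] * 2
                   + ["Company 3"] * 2 + ["Company 4"] * 2, name="Name"),
)

# Option 1 relies on each company having exactly one Classification; check it first.
assert df.groupby(level=0)["Classification"].nunique().eq(1).all()

# Move 'Classification' into the index so only the value columns get summed,
# then move it back out as an ordinary column.
result = (
    df.set_index("Classification", append=True)
    .groupby(level=[0, 1])
    .sum()
    .reset_index(level=1)
)
print(result)
```

If the check fails, Option 1 would produce one row per (Company, Classification) pair instead of one row per company.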

Option 2
groupby then agg
This makes no assumption that 'Classification' is unique within each 'Company'; it simply takes the first 'Classification' for each 'Company'.

df.groupby(level=0).agg(
    {'Classification': 'first', 'Value 1': 'sum', 'Value 2': 'sum'})

                  Classification  Value 1  Value 2
Name                                              
Company 1  Classification Code 1    11000    10000
Company 2  Classification Code 1     3000     7500
Company 3  Classification Code 2    35000    42000
Company 4  Classification Code 3    14500    11500
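To see what 'first' does when a company's rows disagree, here is a small sketch on a hypothetical variant of the data in which Company 1's second row carries a made-up "Classification Code 9". The values are still summed per company, and only the first Classification encountered is kept.

```python
import pandas as pd

# Hypothetical variant: Company 1 has two different Classification values.
df = pd.DataFrame(
    {
        "Classification": ["Classification Code 1", "Classification Code 9",
                           "Classification Code 2", "Classification Code 2"],
        "Value 1": [5000, 6000, 15000, 20000],
        "Value 2": [8000, 2000, 10000, 32000],
    },
    index=pd.Index(["Company 1", "Company 1", "Company 3", "Company 3"], name="Name"),
)

# Option 2: one row per company; 'first' silently drops "Classification Code 9".
result = df.groupby(level=0).agg(
    {"Classification": "first", "Value 1": "sum", "Value 2": "sum"}
)
print(result)
```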

Response to Comments
Regarding concatenation
Check the dtypes with df.dtypes. If the value columns show object instead of int, they hold strings, so sum() concatenates them instead of adding them. In that case, yes, you need to convert them to numeric first.

You can do this simply with:

cols = ['Value 1', 'Value 2']
df[cols] = df[cols].apply(pd.to_numeric)  # errors='ignore' is deprecated since pandas 2.2
df.groupby(level=0).agg(
    {'Classification': 'first', 'Value 1': 'sum', 'Value 2': 'sum'})

Or, more manually:

df['Value 1'] = df['Value 1'].astype(int)
df['Value 2'] = df['Value 2'].astype(int)

Then proceed with either of the options above.
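A small sketch of the concatenation problem, assuming the value columns arrived as text (simulated here with `dtype=object`): summing strings glues them together, and converting only the value columns with `pd.to_numeric` restores real addition.

```python
import pandas as pd

# dtype=object: every column holds Python strings, as when a CSV column is read as text.
df = pd.DataFrame(
    {
        "Classification": ["Classification Code 1"] * 2,
        "Value 1": ["5000", "6000"],
        "Value 2": ["8000", "2000"],
    },
    index=pd.Index(["Company 1", "Company 1"], name="Name"),
    dtype=object,
)
print(df.dtypes)  # all three columns are object

# Summing strings concatenates them instead of adding.
concatenated = df.groupby(level=0)["Value 1"].sum()
print(concatenated)

# Convert only the value columns to numbers, then aggregate as in Option 2.
cols = ["Value 1", "Value 2"]
df[cols] = df[cols].apply(pd.to_numeric)
result = df.groupby(level=0).agg(
    {"Classification": "first", "Value 1": "sum", "Value 2": "sum"}
)
print(result)
```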

Regarding column order
You can always reorder the columns to match the original DataFrame:

cols = ['Value 1', 'Value 2']
d1 = df.assign(**{c: pd.to_numeric(df[c]) for c in cols}).groupby(level=0).agg(
    {'Classification': 'first', 'Value 1': 'sum', 'Value 2': 'sum'})

d1[df.columns]

Or

d1.reindex(columns=df.columns)  # reindex_axis was removed in pandas 1.0
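A quick sketch of both reordering forms, assuming a hypothetical agg spec whose keys are listed in a different order than df's columns. Selecting with `d1[df.columns]` and calling `d1.reindex(columns=df.columns)` should give the same frame.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Classification": ["Classification Code 1"] * 2 + ["Classification Code 2"] * 2,
        "Value 1": [5000, 6000, 15000, 20000],
        "Value 2": [8000, 2000, 10000, 32000],
    },
    index=pd.Index(["Company 1", "Company 1", "Company 3", "Company 3"], name="Name"),
)

# Hypothetical spec in a different order than df's columns; d1 may not match df's order.
d1 = df.groupby(level=0).agg(
    {"Value 2": "sum", "Classification": "first", "Value 1": "sum"}
)

# Either form puts the columns back in df's order.
by_selection = d1[df.columns]
by_reindex = d1.reindex(columns=df.columns)
print(by_selection)
```

Plain selection raises a KeyError if a column is missing, whereas `reindex` fills it with NaN, so selection is the stricter choice when every column is expected to be present.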

Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site or the original URL. For any questions, contact: yoyou2525@163.com.

© 2020-2024 STACKOOM.COM