I have been stuck all day, have been through numerous SO posts, and am still stuck on the last piece. I imported a CSV into a massive DataFrame and eventually reduced it to the smaller DataFrame below. (Note: my df is currently indexed on 'Name', which is what I need to group and sum on.)
Name       Classification         Value 1  Value 2
Company 1  Classification Code 1  5000     8000
Company 1  Classification Code 1  6000     2000
Company 2  Classification Code 1  2000     3000
Company 2  Classification Code 1  1000     4500
Company 3  Classification Code 2  15000    10000
Company 3  Classification Code 2  20000    32000
Company 4  Classification Code 3  7500     10000
Company 4  Classification Code 3  7000     1500
What I am struggling with now is how to sum the two value columns per company. I have mainly been trying groupby and sum(), without success. I know there are a lot of SO posts about summing in pandas, but none of them has worked for me. Any help would be greatly appreciated. Thanks so much.
Edit: The output I am looking for is the following:
Company 1  Classification Code 1  11,000  10,000
Company 2  Classification Code 1  3,000   7,500
Company 3  Classification Code 2  35,000  42,000
Company 4  Classification Code 3  14,500  11,500
Option 1
set_index then groupby
This assumes that the 'Classification' value is the same for every row of a given company.
df.set_index('Classification', append=True) \
.groupby(level=[0, 1]).sum().reset_index(1)
                  Classification  Value 1  Value 2
Name
Company 1  Classification Code 1    11000    10000
Company 2  Classification Code 1     3000     7500
Company 3  Classification Code 2    35000    42000
Company 4  Classification Code 3    14500    11500
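For anyone who wants to reproduce this, here is a minimal, self-contained sketch of Option 1. The frame is rebuilt by hand from the question's sample rows (the asker's real data comes from a CSV, so the construction below is only an assumption about its shape), and the name `result` is just illustrative.

```python
import pandas as pd

# Sample rows copied from the question; 'Name' is set as the index,
# matching the asker's description of their frame.
df = pd.DataFrame(
    {
        "Name": ["Company 1", "Company 1", "Company 2", "Company 2",
                 "Company 3", "Company 3", "Company 4", "Company 4"],
        "Classification": ["Classification Code 1"] * 4
                          + ["Classification Code 2"] * 2
                          + ["Classification Code 3"] * 2,
        "Value 1": [5000, 6000, 2000, 1000, 15000, 20000, 7500, 7000],
        "Value 2": [8000, 2000, 3000, 4500, 10000, 32000, 10000, 1500],
    }
).set_index("Name")

# Option 1: append 'Classification' to the index, sum per
# (Name, Classification) pair, then move 'Classification' back to a column.
result = (
    df.set_index("Classification", append=True)
      .groupby(level=[0, 1])
      .sum()
      .reset_index(1)
)
print(result)
```

Because 'Classification' is part of the grouping key here, a company that appeared with two different classifications would produce two rows rather than one.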
Option 2
groupby then agg
This makes no assumption about 'Classification' being unique within a company; it just takes the first 'Classification' per company.
df.groupby(level=0).agg(
{'Classification': 'first', 'Value 1': 'sum', 'Value 2': 'sum'})
                  Classification  Value 1  Value 2
Name
Company 1  Classification Code 1    11000    10000
Company 2  Classification Code 1     3000     7500
Company 3  Classification Code 2    35000    42000
Company 4  Classification Code 3    14500    11500
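The two options only differ when a company carries more than one classification. If you are not sure which situation you are in, something like the following sketch can tell you. The small frame is hypothetical (it is not the asker's data): 'Company 2' has deliberately been given two codes, and `n_codes`, `mixed`, `by_pair` and `by_first` are illustrative names.

```python
import pandas as pd

# Hypothetical frame in which 'Company 2' carries two different classifications.
df = pd.DataFrame(
    {
        "Name": ["Company 1", "Company 1", "Company 2", "Company 2"],
        "Classification": ["Code 1", "Code 1", "Code 1", "Code 2"],
        "Value 1": [5000, 6000, 2000, 1000],
        "Value 2": [8000, 2000, 3000, 4500],
    }
).set_index("Name")

# Count the distinct classifications per company (index level 0 is 'Name').
n_codes = df.groupby(level=0)["Classification"].nunique()
mixed = n_codes[n_codes > 1].index.tolist()
print("Companies with more than one classification:", mixed)

# Option 1 style: one row per (Name, Classification) pair.
by_pair = df.set_index("Classification", append=True).groupby(level=[0, 1]).sum()

# Option 2 style: one row per Name, keeping the first classification seen.
by_first = df.groupby(level=0).agg(
    {"Classification": "first", "Value 1": "sum", "Value 2": "sum"}
)
print(by_pair)
print(by_first)
```

If `mixed` is empty, both options give the same totals; otherwise pick the one whose row granularity you actually want.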
Response to Comments
In regards to concatenation
Check the dtypes with df.dtypes. If you see object instead of int, then yes, you need to convert to numeric.
You can do this simply with
df.apply(pd.to_numeric, errors='ignore').groupby(level=0).agg(
{'Classification': 'first', 'Value 1': 'sum', 'Value 2': 'sum'})
Or more manually
df['Value 1'] = df['Value 1'].astype(int)
df['Value 2'] = df['Value 2'].astype(int)
Then proceed to prior suggestions.
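As a side note, newer pandas releases have deprecated `errors='ignore'` in `pd.to_numeric`, so a variant that converts only the value columns may hold up better. The sketch below assumes the two value columns were read in as strings; the frame and the `value_cols` list are made up for illustration.

```python
import pandas as pd

# Hypothetical frame whose value columns arrived as strings (object dtype),
# which is the situation where summing ends up concatenating text.
df = pd.DataFrame(
    {
        "Name": ["Company 1", "Company 1", "Company 2", "Company 2"],
        "Classification": ["Classification Code 1"] * 4,
        "Value 1": ["5000", "6000", "2000", "1000"],
        "Value 2": ["8000", "2000", "3000", "4500"],
    }
).set_index("Name")

# Convert only the columns that should be numeric; 'Classification' is left alone.
value_cols = ["Value 1", "Value 2"]
df[value_cols] = df[value_cols].apply(pd.to_numeric)
print(df.dtypes)

# With numeric dtypes in place, the earlier groupby/agg sums as expected.
d1 = df.groupby(level=0).agg(
    {"Classification": "first", "Value 1": "sum", "Value 2": "sum"}
)
print(d1)
```

If a value column contains entries that cannot be parsed, `pd.to_numeric` raises by default; passing `errors='coerce'` turns those entries into NaN instead.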
In regards to placement of columns
You can always reorder your columns
d1 = df.apply(pd.to_numeric, errors='ignore').groupby(level=0).agg(
{'Classification': 'first', 'Value 1': 'sum', 'Value 2': 'sum'})
d1[df.columns]
Or
d1 = df.apply(pd.to_numeric, errors='ignore').groupby(level=0).agg(
{'Classification': 'first', 'Value 1': 'sum', 'Value 2': 'sum'})
d1.reindex(columns=df.columns)