
Grouping pandas dataframe by two columns without summarizing it

I have a pandas DataFrame covering the different states in America. I would like to group it by the two columns Year and State in order to run statistical tests on some things (e.g. cause of death, newborns, etc.) and also to plot it. The only approach I can come up with is the pandas groupby function, where I have to specify a statistical summary at the end, such as:

import pandas as pd
df = pd.read_csv(path + 'csvfile.csv')
grouped_df = df.groupby(['Year', 'State']).mean()

However, I would like to group by year and state alone, without any summary. When I do that with groupby, I get this:

import pandas as pd
df = pd.read_csv(path + 'csvfile.csv')
grouped_df = df.groupby(['Year', 'State'])

<pandas.core.groupby.generic.DataFrameGroupBy object at 0x0000025720134688>

How can I do this?

First, a groupby object on its own is simply something like an iterator over the groups, so what matters is what you do with it afterwards: an aggregate function, a custom function, ...?
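To illustrate the iterator-like behaviour, here is a minimal sketch. The toy DataFrame and its Deaths column are hypothetical stand-ins for the CSV file in the question; only the Year and State column names come from the original.

```python
import pandas as pd

# Hypothetical toy data standing in for the question's CSV file.
df = pd.DataFrame({
    "Year": [2019, 2019, 2020, 2020],
    "State": ["Ohio", "Texas", "Ohio", "Ohio"],
    "Deaths": [10, 20, 30, 40],
})

grouped = df.groupby(["Year", "State"])

# Iterating yields (key, sub-DataFrame) pairs; nothing is aggregated.
group_sizes = {}
for (year, state), group in grouped:
    group_sizes[(year, state)] = len(group)

# A single group can also be pulled out by its key.
ohio_2020 = grouped.get_group((2020, "Ohio"))
```

Each group here is an ordinary DataFrame holding the original rows, so it can be passed to a test or a plotting call without summarizing first.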


I'm not sure what you mean by grouping by year and state alone. If you need a MultiIndex built from the two columns, use:

grouped_df = df.set_index(['Year', 'State'])
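As a rough sketch of how such a MultiIndex might then be used, assuming the same hypothetical toy data as a stand-in for the real CSV (the Deaths column is invented for illustration):

```python
import pandas as pd

# Hypothetical toy data standing in for the question's CSV file.
df = pd.DataFrame({
    "Year": [2019, 2019, 2020, 2020],
    "State": ["Ohio", "Texas", "Ohio", "Ohio"],
    "Deaths": [10, 20, 30, 40],
})

# Move both columns into the index; sorting keeps lookups efficient.
indexed = df.set_index(["Year", "State"]).sort_index()

# All rows for one (Year, State) pair; the list keeps the result a DataFrame.
ohio_2020 = indexed.loc[[(2020, "Ohio")]]

# All states for a single year, selected on the first index level.
year_2019 = indexed.xs(2019, level="Year")
```

No rows are dropped or combined by set_index, so every original observation remains available under its (Year, State) key.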
