I have a pandas DataFrame covering the different states in America. I would like to group by the two columns Year and State in order to statistically test some things, e.g. cause of death, newborns etc., and also plot them. The only approach I can come up with is the pandas groupby function, where I have to specify a statistical summary at the end, such as:
import pandas as pd
df = pd.read_csv(path + 'csvfile.csv')
grouped_df = df.groupby(['Year', 'State']).mean()
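For reference, here is a small self-contained sketch of what that aggregated version returns. The toy data and the columns 'Deaths' and 'Births' are hypothetical stand-ins for the CSV, and numeric_only=True is an assumption to keep mean() from failing on any text columns in pandas 2.x:

```python
import pandas as pd

# Hypothetical toy data standing in for csvfile.csv; the column names
# 'Deaths' and 'Births' are assumptions, not taken from the real file.
df = pd.DataFrame({
    "Year":   [2019, 2019, 2019, 2020, 2020],
    "State":  ["Texas", "Texas", "Ohio", "Texas", "Ohio"],
    "Deaths": [10, 20, 5, 30, 7],
    "Births": [100, 110, 50, 120, 55],
})

# Group on both keys and reduce each group to its mean. The result is a
# DataFrame indexed by a (Year, State) MultiIndex, one row per group.
# numeric_only=True skips non-numeric columns (otherwise an error in pandas 2.x).
grouped_df = df.groupby(["Year", "State"]).mean(numeric_only=True)
print(grouped_df)
```

Each (Year, State) pair collapses to a single row, which is why the per-row detail needed for statistical tests is lost here.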
However, I would like to just group by year and state alone, without any summary, but when I use groupby on its own I get this:
import pandas as pd
df = pd.read_csv(path + 'csvfile.csv')
grouped_df = df.groupby(['Year', 'State'])
<pandas.core.groupby.generic.DataFrameGroupBy object at 0x0000025720134688>
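That printed object is not an error. It is the GroupBy object itself, which has not computed anything yet. A minimal sketch of how it can be inspected, assuming hypothetical toy data with a 'Deaths' column:

```python
import pandas as pd

# Hypothetical toy data; the 'Deaths' column name is an assumption.
df = pd.DataFrame({
    "Year":   [2019, 2019, 2020],
    "State":  ["Texas", "Ohio", "Texas"],
    "Deaths": [10, 5, 30],
})

grouped = df.groupby(["Year", "State"])

# The GroupBy object only records which rows belong to which key.
print(grouped.ngroups)   # number of distinct (Year, State) pairs
print(grouped.size())    # number of rows in each group, as a Series

# Retrieve the raw rows of a single group as an ordinary DataFrame.
texas_2019 = grouped.get_group((2019, "Texas"))
print(texas_2019)
```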
How can I do this?
First, groupby on its own is lazy and behaves like an iterator, so what matters is what you specify after it: an aggregate function, a custom function, etc.
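The options after groupby can be sketched as follows: iterating over the groups, applying several built-in aggregations at once, or passing a custom function. The toy data and the 'Deaths' column are hypothetical, and the max-minus-min spread is only an illustrative custom statistic:

```python
import pandas as pd

# Hypothetical toy data; the 'Deaths' column name is an assumption.
df = pd.DataFrame({
    "Year":   [2019, 2019, 2019, 2020, 2020],
    "State":  ["Texas", "Texas", "Ohio", "Texas", "Ohio"],
    "Deaths": [10, 20, 5, 30, 7],
})

grouped = df.groupby(["Year", "State"])

# 1) Iterate: each step yields the (Year, State) key and that group's rows,
#    which is where a per-group statistical test could be run.
for (year, state), rows in grouped:
    print(year, state, len(rows))

# 2) Several built-in statistics per group at once.
summary = grouped["Deaths"].agg(["mean", "std", "count"])

# 3) A custom reduction: any callable mapping a Series to a scalar.
spread = grouped["Deaths"].agg(lambda s: s.max() - s.min())
print(summary)
print(spread)
```

Groups with a single row get NaN for std, since the sample standard deviation needs at least two values.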
I am not sure what you mean by grouping by the year and state alone; if you need a MultiIndex built from the 2 columns, use:
grouped_df = df.set_index(['Year', 'State'])
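A short sketch of working with that MultiIndex: every original row is kept, and rows can be selected by key or reshaped for plotting. The toy data and the 'Deaths' column are hypothetical, and plotting itself would additionally require matplotlib:

```python
import pandas as pd

# Hypothetical toy data; the 'Deaths' column name is an assumption.
df = pd.DataFrame({
    "Year":   [2019, 2019, 2020, 2020],
    "State":  ["Texas", "Ohio", "Texas", "Ohio"],
    "Deaths": [10, 5, 30, 7],
})

# Move both columns into a sorted (Year, State) MultiIndex. Nothing is
# aggregated, so every original row survives.
indexed = df.set_index(["Year", "State"]).sort_index()

# Select a single (Year, State) entry, or all states for one year.
texas_2019_deaths = indexed.loc[(2019, "Texas"), "Deaths"]
year_2020 = indexed.xs(2020, level="Year")

# Pivot State into columns: one series per state, indexed by Year.
wide = indexed["Deaths"].unstack("State")
# wide.plot()  # draws one line per state; needs matplotlib installed
print(wide)
```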