I have a dataframe of median sales values for locations, with one column per month going back to the 60s (roughly 260 columns). I'd like to create one column per year containing the mean of that year's monthly values, but I'm unsure how to go about it. An example of the data:
RegionID ZIP City State Metro CountyName SizeRank 1996-04 \
0 61639 10025 New York NY New York New York 1 NaN
2 61637 10023 New York NY New York New York 3 NaN
13 61703 10128 New York NY New York New York 14 NaN
14 61625 10011 New York NY New York New York 15 NaN
20 61617 10003 New York NY New York New York 21 NaN
1996-05 1996-06 ... 2016-09 2016-10 2016-11 2016-12 2017-01 \
0 NaN NaN ... 1374400 1364100 1366300 1354800.0 1327500
2 NaN NaN ... 1993500 1980700 1960900 1951300.0 1937800
13 NaN NaN ... 1526000 1523700 1527200 1541600.0 1557800
14 NaN NaN ... 2354000 2355500 2352200 2332100.0 2313300
20 NaN NaN ... 1932800 1930400 1937500 1935100.0 1915700
Most of the columns are int64, but there are also region code and location columns (RegionID, ZIP, City and so on).
# note: this assumes df holds only the YYYY-MM columns;
# text columns such as City would fail the datetime conversion
(df
    # stack the columns on top of one another (wide to long)
    .stack()
    .reset_index(level=1)
    # give the columns sensible names
    .rename(columns={'level_1': 'year_month', 0: 'sales'})
    # convert to datetime format
    .assign(year_month=lambda x: x.year_month.astype('datetime64[ns]'),
            # pull out the year part of the date
            year_col=lambda x: x['year_month'].dt.year)
    # group by year and calculate the average sales
    .groupby('year_col', as_index=False)['sales'].mean()
)
year_col
2017 854096
Name: 0, dtype: int64
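Since the frame in the question also carries ID and location columns, the chain above needs the month columns picked out first. A minimal sketch of that, on made-up toy data and assuming every month column is named in YYYY-MM form (the names `month_cols`, `long_df` and `yearly` are only illustrative):

```python
import re

import pandas as pd

# Toy frame shaped like the question's data; the values are invented.
df = pd.DataFrame({
    "RegionID": [61639, 61637],
    "City": ["New York", "New York"],
    "2016-11": [100.0, 200.0],
    "2016-12": [110.0, 210.0],
    "2017-01": [120.0, 220.0],
})

# Assumption: month columns are exactly the ones named like YYYY-MM.
month_cols = [c for c in df.columns if re.fullmatch(r"\d{4}-\d{2}", c)]

# Same wide-to-long reshaping as the chain above, restricted to month columns.
long_df = (
    df[month_cols]
    .stack()
    .reset_index(level=1)
    .rename(columns={"level_1": "year_month", 0: "sales"})
)

# Extract the year and average all sales values that fall in it.
long_df["year"] = pd.to_datetime(long_df["year_month"], format="%Y-%m").dt.year
yearly = long_df.groupby("year", as_index=False)["sales"].mean()
print(yearly)
```

This gives one overall mean per year across all locations, not one value per location.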
Here is one approach:
df.stack().groupby(level=1).mean()
2017-02 824100
2017-03 832480
2017-04 850160
2017-05 872880
2017-06 890860
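The result above is one mean per month across all rows. If the goal is what the question describes, a new column per year holding each location's own yearly mean, one possible sketch is to group the month columns by the year prefix of their names. This again uses invented toy data and assumes YYYY-MM column names; the `mean_` prefix is a hypothetical naming choice:

```python
import re

import pandas as pd

# Toy frame; values are invented, with one missing month to show NaN handling.
df = pd.DataFrame({
    "RegionID": [61639, 61637],
    "2016-11": [100.0, 200.0],
    "2016-12": [110.0, 210.0],
    "2017-01": [120.0, float("nan")],
})

# Assumption: month columns are exactly the ones named like YYYY-MM.
month_cols = [c for c in df.columns if re.fullmatch(r"\d{4}-\d{2}", c)]

# Transpose so the month names become the index, group them by their
# four-character year prefix, average, and transpose back.
yearly = df[month_cols].T.groupby(lambda col: col[:4]).mean().T

# Prefix the new columns and attach them to the original frame by row index.
result = df.join(yearly.add_prefix("mean_"))
print(result)
```

`mean()` skips NaN values by default, so a year with some missing months is averaged over the months that are present, and a year with no data at all stays NaN.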