Working with housing value data from Zillow, my dataframe has a series of column names in the form yyyy-mm, example
2001-01|2000-02|2000-03|2000-04|...|2016-08
Each of these columns contains the average home value for that period for a number of city, state combinations
RegionName State 2000-01 2000-02 2000-03
Philadelphia PA 53100 53200 53400
I need to group the columns but their respective quarter, provide a mean value and return this to a new dataframe. So for the example provided I would have something like the following
RegionName State 2000q1 2000q2 2000q3 2000q4
Philadelphia PA 53233.33 ... ... ...
I have no idea how to begin to tackle this. I am able to pull off the column names, get the last two characters, and map those to a dictionary containing the quarter number, but that is it. Not sure how to reassemble the data
Any help is greatly appreciated
This is similar to MaxU's answer , but shows a way to groupby both the date columns and the non-date columns ( RegionName
, State
).
import numpy as np
import pandas as pd
df = pd.DataFrame({'2000-01': [53100], '2000-02': [53200], '2000-03': [53400], 'RegionName': ['Philadelphia'], 'State': ['PA']})
melted = pd.melt(df, id_vars=['RegionName', 'State'], var_name='date')
melted['date'] = pd.PeriodIndex(melted['date'], freq='Q')
result = melted.groupby(['RegionName', 'State', 'date']).mean()
result = result['value'].unstack('date')
yields
date 2000Q1
RegionName State
Philadelphia PA 53233
First, use pd.melt
to coalesce all the date columns into a single column:
import numpy as np
import pandas as pd
df = pd.DataFrame({'2000-01': [53100], '2000-02': [53200], '2000-03': [53400], 'RegionName': ['Philadelphia'], 'State': ['PA']})
melted = pd.melt(df, id_vars=['RegionName', 'State'], var_name='date')
# RegionName State date value
# 0 Philadelphia PA 2000-01 53100
# 1 Philadelphia PA 2000-02 53200
# 2 Philadelphia PA 2000-03 53400
Next, use pd.PeriodIndex
to convert the date (strings?) to pd.Period
s. Notice that pd.PeriodIndex
can convert different date strings to the same Period:
melted['date'] = pd.PeriodIndex(melted['date'], freq='Q')
# RegionName State date value
# 0 Philadelphia PA 2000Q1 53100
# 1 Philadelphia PA 2000Q1 53200
# 2 Philadelphia PA 2000Q1 53400
Finally, use groupby/mean
to group together rows with the same RegionName
, State
and date
and compute the mean for each group:
result = melted.groupby(['RegionName', 'State', 'date']).mean()
# value
# RegionName State date
# Philadelphia PA 2000Q1 53233
If you stop here, your DataFrame will be tidy (PDF) -- each row represents one "observation". This is often the best form for further computation. (It isn't by accident that first step in this solution was to convert the original DataFrame to tidy format -- notice that melted
also has the property that each row represents one observation.)
However, if you'd like, you could move the date
index level into separate columns:
result = result['value'].unstack('date')
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.