I have two dataframes that are identical in size (rows: date index; columns: firms). What I want to do now is calculate time-series statistics for the observations in Dataframe1 based on the logic contained in Dataframe2. For example, I want to calculate, for each date, the average observation in Dataframe1 per rank given in Dataframe2.
So it is some sort of groupby procedure, except that the grouping condition comes from a second dataframe.
I'd be glad for any input, as I was not able to find a similar problem!
Dataframe1
----------------------------------
A B C D E F G H
31.12.2009 30 66 NaN NaN NaN NaN 393 57
01.01.2010 30 66 NaN NaN NaN NaN 393 57
04.01.2010 31 66 NaN NaN NaN NaN 404 57
05.01.2010 33 66 NaN NaN NaN NaN 400 58
06.01.2010 33 66 NaN NaN NaN NaN 400 58
Dataframe2
----------------------------------
A B C D E F G H
31.12.2009 1.0 2.0 NaN NaN NaN NaN 2.0 1.0
01.01.2010 1.0 2.0 NaN NaN NaN NaN 2.0 1.0
04.01.2010 1.0 1.0 NaN NaN NaN NaN 2.0 2.0
05.01.2010 1.0 2.0 NaN NaN NaN NaN 1.0 2.0
06.01.2010 2.0 2.0 NaN NaN NaN NaN 1.0 1.0
Desired output
----------------------------------
1.0 2.0
31.12.2009 43.5 229.5
01.01.2010 43.5 229.5
04.01.2010 48.5 230.5
05.01.2010 216.5 62.0
06.01.2010 229.0 49.5
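To make the example reproducible, here is a sketch that builds the two frames from the tables above. Keeping the dates as plain strings (rather than parsing them into a DatetimeIndex) is an assumption; the names df1 and df2 match the ones used in the answers below.

```python
import numpy as np
import pandas as pd

nan = np.nan
dates = ["31.12.2009", "01.01.2010", "04.01.2010", "05.01.2010", "06.01.2010"]
firms = list("ABCDEFGH")

# Dataframe1: one observation per date (row) and firm (column)
df1 = pd.DataFrame(
    [[30, 66, nan, nan, nan, nan, 393, 57],
     [30, 66, nan, nan, nan, nan, 393, 57],
     [31, 66, nan, nan, nan, nan, 404, 57],
     [33, 66, nan, nan, nan, nan, 400, 58],
     [33, 66, nan, nan, nan, nan, 400, 58]],
    index=dates, columns=firms,
)

# Dataframe2: the rank belonging to each observation, same shape as df1
df2 = pd.DataFrame(
    [[1.0, 2.0, nan, nan, nan, nan, 2.0, 1.0],
     [1.0, 2.0, nan, nan, nan, nan, 2.0, 1.0],
     [1.0, 1.0, nan, nan, nan, nan, 2.0, 2.0],
     [1.0, 2.0, nan, nan, nan, nan, 1.0, 2.0],
     [2.0, 2.0, nan, nan, nan, nan, 1.0, 1.0]],
    index=dates, columns=firms,
)

print(df1.shape, df2.shape)  # (5, 8) (5, 8)
```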
You can use a dictionary comprehension to create the result dataframe. Each column is generated using where to replace values in df1 by NaN wherever df2 does not hold that specific value, so that mean over axis=1 averages only the remaining values. One column is built for each unique value of df2.
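import pandas as pd
# one column per distinct rank; dropna() keeps NaN ranks from becoming a column
df_res = pd.DataFrame({col: df1.where(df2.eq(col)).mean(axis=1) for col in sorted(df2.stack().dropna().unique())})
print(df_res)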
1.0 2.0
31.12.2009 43.5 229.5
01.01.2010 43.5 229.5
04.01.2010 48.5 230.5
05.01.2010 216.5 62.0
06.01.2010 229.0 49.5
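The question describes this as "some sort of a groupby". As an alternative sketch (not part of the answer above), both frames can be flattened cell by cell into one long table and then grouped with an ordinary groupby. This assumes, as stated in the question, that df1 and df2 have identical index and columns; the column names date, value and rank are made up for illustration.

```python
import numpy as np
import pandas as pd

nan = np.nan
dates = ["31.12.2009", "01.01.2010", "04.01.2010", "05.01.2010", "06.01.2010"]
firms = list("ABCDEFGH")
df1 = pd.DataFrame([[30, 66, nan, nan, nan, nan, 393, 57],
                    [30, 66, nan, nan, nan, nan, 393, 57],
                    [31, 66, nan, nan, nan, nan, 404, 57],
                    [33, 66, nan, nan, nan, nan, 400, 58],
                    [33, 66, nan, nan, nan, nan, 400, 58]], index=dates, columns=firms)
df2 = pd.DataFrame([[1.0, 2.0, nan, nan, nan, nan, 2.0, 1.0],
                    [1.0, 2.0, nan, nan, nan, nan, 2.0, 1.0],
                    [1.0, 1.0, nan, nan, nan, nan, 2.0, 2.0],
                    [1.0, 2.0, nan, nan, nan, nan, 1.0, 2.0],
                    [2.0, 2.0, nan, nan, nan, nan, 1.0, 1.0]], index=dates, columns=firms)

# Flatten both frames cell by cell (row-major). The cells only line up
# because df1 and df2 share the same index and columns.
long = pd.DataFrame({
    "date": np.repeat(df1.index.to_numpy(), df1.shape[1]),
    "value": df1.to_numpy().ravel(),
    "rank": df2.to_numpy().ravel(),
}).dropna()  # drop cells that have no observation or no rank

# A plain groupby on (date, rank); unstack turns the ranks into columns.
# groupby sorts the string dates lexicographically, so reindex restores
# the original row order.
res = (long.groupby(["date", "rank"])["value"].mean()
           .unstack("rank")
           .reindex(df1.index))
print(res)
```

The long table costs one row per non-missing cell, but in exchange any groupby aggregation (for example .agg(["mean", "count"]) on the value column) is available without further masking.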
Doing each value one at a time:
(1)
df1.where(df2 == 1).mean(axis=1)
Output:
31.12.2009 43.5
01.01.2010 43.5
04.01.2010 48.5
05.01.2010 216.5
06.01.2010 229.0
(2)
df1.where(df2 == 2).mean(axis=1)
Output:
31.12.2009 229.5
01.01.2010 229.5
04.01.2010 230.5
05.01.2010 62.0
06.01.2010 49.5
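Both versions rely on two pandas defaults: where keeps a cell of df1 only where the condition holds and puts NaN everywhere else, and mean skips NaN (skipna=True), so each row is averaged over the surviving cells only. A small sketch making the intermediate step visible for rank 2, on two rows copied from the question's data:

```python
import numpy as np
import pandas as pd

nan = np.nan
firms = list("ABCDEFGH")
# Two rows copied from the question's Dataframe1 and Dataframe2
df1 = pd.DataFrame([[30, 66, nan, nan, nan, nan, 393, 57],
                    [33, 66, nan, nan, nan, nan, 400, 58]],
                   index=["31.12.2009", "05.01.2010"], columns=firms)
df2 = pd.DataFrame([[1.0, 2.0, nan, nan, nan, nan, 2.0, 1.0],
                    [1.0, 2.0, nan, nan, nan, nan, 1.0, 2.0]],
                   index=df1.index, columns=firms)

# Keep only cells whose rank is 2; everything else (including NaN ranks,
# since NaN == 2 is False) becomes NaN.
masked = df1.where(df2 == 2)
print(masked)

kept = masked.count(axis=1)   # non-NaN cells per row
total = masked.sum(axis=1)    # sum skips NaN
print(total / kept)           # same values as masked.mean(axis=1)
print(masked.mean(axis=1))
```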
Combining both into your desired output:
output = pd.DataFrame({'1':df1.where(df2 == 1).mean(axis=1),
'2':df1.where(df2 == 2).mean(axis=1)})
1 2
31.12.2009 43.5 229.5
01.01.2010 43.5 229.5
04.01.2010 48.5 230.5
05.01.2010 216.5 62.0
06.01.2010 229.0 49.5
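The question asks for time-series statistics in general, not only the mean. The same mask-then-aggregate idea extends to other statistics; below is a sketch with a hypothetical helper stat_by_rank (not a pandas function and not part of either answer) that passes any aggregation name accepted by DataFrame.agg along axis=1.

```python
import numpy as np
import pandas as pd

def stat_by_rank(values: pd.DataFrame, ranks: pd.DataFrame, stat: str = "mean") -> pd.DataFrame:
    """Hypothetical helper: for each distinct rank in `ranks`, apply `stat`
    across each row of `values`, using only the cells whose matching cell
    in `ranks` holds that rank."""
    flat = ranks.to_numpy().ravel()
    levels = sorted(pd.unique(flat[~pd.isna(flat)]))  # distinct ranks, NaN excluded
    return pd.DataFrame({
        lvl: values.where(ranks == lvl).agg(stat, axis=1)  # mask, then aggregate row-wise
        for lvl in levels
    })

nan = np.nan
dates = ["31.12.2009", "01.01.2010", "04.01.2010", "05.01.2010", "06.01.2010"]
firms = list("ABCDEFGH")
df1 = pd.DataFrame([[30, 66, nan, nan, nan, nan, 393, 57],
                    [30, 66, nan, nan, nan, nan, 393, 57],
                    [31, 66, nan, nan, nan, nan, 404, 57],
                    [33, 66, nan, nan, nan, nan, 400, 58],
                    [33, 66, nan, nan, nan, nan, 400, 58]], index=dates, columns=firms)
df2 = pd.DataFrame([[1.0, 2.0, nan, nan, nan, nan, 2.0, 1.0],
                    [1.0, 2.0, nan, nan, nan, nan, 2.0, 1.0],
                    [1.0, 1.0, nan, nan, nan, nan, 2.0, 2.0],
                    [1.0, 2.0, nan, nan, nan, nan, 1.0, 2.0],
                    [2.0, 2.0, nan, nan, nan, nan, 1.0, 1.0]], index=dates, columns=firms)

print(stat_by_rank(df1, df2, "mean"))
print(stat_by_rank(df1, df2, "max"))
```

The rank values stay as column labels (1.0, 2.0), as in the first answer; "median", "std" or "count" can be passed the same way.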