I want to calculate a rolling correlation on grouped data. How can I do this in pandas? I have created dummy data and computed it with PySpark using SQL below:
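import pandas as pd
import numpy as np
from datetime import datetime, timedelta
from pyspark.sql import SparkSession

# reuse the active Spark session (e.g. in a notebook) or start one
spark = SparkSession.builder.getOrCreate()

# 30 rows of random floats in columns a, b, c; building the frame from the
# numeric array keeps them float (appending the string labels to the array
# would turn every column into strings)
df = pd.DataFrame(np.random.random((30, 3)), columns=list('abc'))
# group label d: 10 consecutive rows each of 'a', 'b', 'c'
df['d'] = np.repeat(['a', 'b', 'c'], 10)
df['date'] = pd.to_datetime([datetime.today() + timedelta(i) for i in range(30)])
spark.createDataFrame(df).createOrReplaceTempView('df_tbl')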
spark.sql("""
select *,
corr(a,b) over (partition by d order by date rows between 8 preceding and current row) as cor1,
corr(a,b) over (partition by d order by date rows between 8 preceding and current row) as cor2
from df_tbl
""").toPandas().head(10)
Use date as the index and apply a grouped rolling window to calculate corr on a and b. Then call reset_index to turn the index levels back into columns, since the timestamp is hard to access while it is part of the index. Like this:
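# a and b must be numeric for corr; cast in case they were built as strings
df[['a', 'b']] = df[['a', 'b']].astype(float)
df.set_index('date', inplace=True)
# pairwise correlation of a and b over an 8-row window within each group d
result = df.groupby(['d'])[['a','b']].rolling(8).corr()
result.reset_index(inplace=True)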
Output would look like this:
d date level_2 a b
0 a 2020-03-03 21:21:29.512854 a NaN NaN
1 a 2020-03-03 21:21:29.512854 b NaN NaN
2 a 2020-03-04 21:21:29.512866 a NaN NaN
3 a 2020-03-04 21:21:29.512866 b NaN NaN
4 a 2020-03-05 21:21:29.512869 a NaN NaN
5 a 2020-03-05 21:21:29.512869 b NaN NaN
6 a 2020-03-06 21:21:29.512871 a NaN NaN
7 a 2020-03-06 21:21:29.512871 b NaN NaN
8 a 2020-03-07 21:21:29.512872 a NaN NaN
9 a 2020-03-07 21:21:29.512872 b NaN NaN
10 a 2020-03-08 21:21:29.512874 a NaN NaN
11 a 2020-03-08 21:21:29.512874 b NaN NaN
12 a 2020-03-09 21:21:29.512876 a NaN NaN
13 a 2020-03-09 21:21:29.512876 b NaN NaN
14 a 2020-03-10 21:21:29.512878 a 1.000000 -0.166854
15 a 2020-03-10 21:21:29.512878 b -0.166854 1.000000
16 a 2020-03-11 21:21:29.512880 a 1.000000 -0.095549
17 a 2020-03-11 21:21:29.512880 b -0.095549 1.000000
...
...
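If you only need the a-b correlation as a single column (as in the SQL version) rather than the pairwise matrix above, something along these lines might work. This is a minimal sketch, not a definitive implementation: the helper name rolling_group_corr and the demo frame are made up for illustration, and it assumes a unique row index and rows already sorted by date within each group. Note also that the SQL frame "8 preceding and current row" spans 9 rows and still produces values for shorter windows at the start of each group, so window=9 with min_periods=2 is my assumption for the closest pandas equivalent.

```python
import numpy as np
import pandas as pd


def rolling_group_corr(df, group_col, x, y, window, min_periods=None):
    """Rolling correlation of columns x and y within each group.

    Hypothetical helper: returns one Series aligned to df's index.
    Assumes df has a unique index and is sorted by date within groups.
    """
    parts = [
        g[x].rolling(window, min_periods=min_periods).corr(g[y])
        for _, g in df.groupby(group_col, sort=False)
    ]
    # Stitch the per-group results together and restore the original row order
    return pd.concat(parts).reindex(df.index)


# Demo data shaped like the question's frame (seeded so it is repeatable)
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.random((30, 3)), columns=list("abc"))
demo["d"] = np.repeat(["a", "b", "c"], 10)
demo["date"] = pd.date_range("2020-03-03", periods=30, freq="D")

# Same window as the answer above: NaN until 8 rows are available per group
demo["cor1"] = rolling_group_corr(demo, "d", "a", "b", window=8)

# Assumed closer match to the SQL frame: 9-row window, partial windows allowed
demo["cor_sql_like"] = rolling_group_corr(
    demo, "d", "a", "b", window=9, min_periods=2
)
```

Using Series.rolling(...).corr(other) per group gives one value per row directly, so there is no level_2 column to filter afterwards.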