How can I do computations on DataFrames or Series that have different indexes in pandas?
I have two Series of the same length and dtype; both are float64. The only difference is the indexes: both hold dates, but one has dates at the beginning of the month and the other at the end of the month. How can I do computations like correlation or covariance on Series or DataFrames that have different indexes?
import numpy as np
from pandas import Series, DataFrame
import pandas as pd
import Quandl
IPO=Quandl.get("RITTER/US_IPO_STATS", authtoken="api key")
ir=Quandl.get("FRBC/REALRT", authtoken="api key")
ipo_splice=IPO[264:662]
new_ipo=ipo_splice['Gross Number of IPOs'];
new_ipo=new_ipo.T
ir_splice=ir[0:398]
new_ir=ir_splice['RR 1 Month']
new_ir=new_ir.T
new_ipo.corr(new_ir)
Call reset_index(drop=True) on each of the objects you want to correlate, then concat them. This drops the original indexes, so the values are lined up by position instead of by label.
s1 = pd.DataFrame(np.random.rand(10), list('abcdefghij'), columns=['s1'])
s2 = pd.DataFrame(np.random.rand(10), list('ABCDEFGHIJ'), columns=['s2'])
print(pd.concat([s.reset_index(drop=True) for s in [s1, s2]], axis=1).corr())
s1 s2
s1 1.000000 -0.437945
s2 -0.437945 1.000000
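The same idea can be applied directly to two Series with date indexes, as in the question. The sketch below is only an illustration: the data is random, the names bom and eom are made up for the example, and it assumes both Series are already in the same row order, since positional pairing is only meaningful in that case. It also shows why the direct call fails: Series.corr aligns on index labels first, and beginning-of-month and end-of-month dates share no labels.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Illustrative data: one Series indexed by month starts, one by month ends.
month_starts = pd.date_range("2015-01-01", periods=10, freq="MS")
month_ends = month_starts + pd.offsets.MonthEnd(0)  # roll each date forward to its month end
bom = pd.Series(rng.random(10), index=month_starts)
eom = pd.Series(rng.random(10), index=month_ends)

# Label-based alignment finds no common dates, so the result is NaN.
label_corr = bom.corr(eom)

# Dropping both indexes pairs the values by position instead.
bom_pos = bom.reset_index(drop=True)
eom_pos = eom.reset_index(drop=True)
pos_corr = bom_pos.corr(eom_pos)
pos_cov = bom_pos.cov(eom_pos)

# Cross-check against NumPy, which never looks at the index.
np_corr = np.corrcoef(bom.to_numpy(), eom.to_numpy())[0, 1]

print(label_corr, pos_corr, pos_cov)
```

If the two Series might be sorted differently, sorting both by index before resetting is a safer starting point.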
You can use the resample() function to resample one of your indexes, so that both indexes are either beginning-of-month (BoM) or end-of-month (EoM):
Data:
In [63]: df_bom
Out[63]:
val
2015-01-01 76
2015-02-01 27
2015-03-01 65
2015-04-01 71
2015-05-01 9
2015-06-01 23
2015-07-01 52
2015-08-01 10
2015-09-01 62
2015-10-01 25
In [64]: df_eom
Out[64]:
val
2015-01-31 87
2015-02-28 16
2015-03-31 85
2015-04-30 4
2015-05-31 37
2015-06-30 63
2015-07-31 3
2015-08-31 73
2015-09-30 81
2015-10-31 69
Solution:
In [61]: df_eom.resample('MS') + df_bom
C:\envs\py35\Scripts\ipython:1: FutureWarning: .resample() is now a deferred operation
use .resample(...).mean() instead of .resample(...)
Out[61]:
val
2015-01-01 163
2015-02-01 43
2015-03-01 150
2015-04-01 75
2015-05-01 46
2015-06-01 86
2015-07-01 55
2015-08-01 83
2015-09-01 143
2015-10-01 94
In [62]: df_eom.resample('MS').join(df_bom, lsuffix='_lft')
C:\envs\py35\Scripts\ipython:1: FutureWarning: .resample() is now a deferred operation
use .resample(...).mean() instead of .resample(...)
Out[62]:
val_lft val
2015-01-01 87 76
2015-02-01 16 27
2015-03-01 85 65
2015-04-01 4 71
2015-05-01 37 9
2015-06-01 63 23
2015-07-01 3 52
2015-08-01 73 10
2015-09-01 81 62
2015-10-01 69 25
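The FutureWarning in the transcripts above says that resample() became a deferred operation and needs an explicit aggregation. A minimal sketch of the same two steps with the aggregation spelled out, followed by the correlation and covariance the question asks about, might look like this. The data is random, and the choice of mean() is an assumption that works here because each month holds exactly one row, so the aggregation only relabels the dates.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Illustrative frames shaped like df_bom and df_eom above.
month_starts = pd.date_range("2015-01-01", periods=10, freq="MS")
month_ends = month_starts + pd.offsets.MonthEnd(0)  # roll forward to month end
df_bom = pd.DataFrame({"val": rng.integers(0, 100, 10)}, index=month_starts)
df_eom = pd.DataFrame({"val": rng.integers(0, 100, 10)}, index=month_ends)

# Relabel the end-of-month rows with month-start dates.
eom_as_bom = df_eom.resample("MS").mean()

# Both frames now share an index, so arithmetic and joins align row by row.
total = eom_as_bom + df_bom
joined = eom_as_bom.join(df_bom, lsuffix="_lft")

corr = joined["val_lft"].corr(joined["val"])
cov = joined["val_lft"].cov(joined["val"])
print(joined)
print(corr, cov)
```

Note that mean() returns floats, so the summed values print as 163.0 rather than 163.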
Alternative approach: merge the DataFrames on the year and month parts of their indexes:
In [69]: %paste
(pd.merge(df_bom, df_eom,
left_on=[df_bom.index.year, df_bom.index.month],
right_on=[df_eom.index.year, df_eom.index.month],
suffixes=('_bom','_eom')))
## -- End pasted text --
Out[69]:
key_0 key_1 val_bom val_eom
0 2015 1 76 87
1 2015 2 27 16
2 2015 3 65 85
3 2015 4 71 4
4 2015 5 9 37
5 2015 6 23 63
6 2015 7 52 3
7 2015 8 10 73
8 2015 9 62 81
9 2015 10 25 69
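Once the rows are paired by year and month, the correlation and covariance can be read off the merged columns. The sketch below is a variation rather than the exact call above: it first materializes the year and month as ordinary columns through a hypothetical helper, with_year_month, and merges on those, which gives named key columns instead of key_0 and key_1. The data is random and purely illustrative.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Illustrative frames shaped like df_bom and df_eom above.
month_starts = pd.date_range("2015-01-01", periods=10, freq="MS")
month_ends = month_starts + pd.offsets.MonthEnd(0)  # roll forward to month end
df_bom = pd.DataFrame({"val": rng.integers(0, 100, 10)}, index=month_starts)
df_eom = pd.DataFrame({"val": rng.integers(0, 100, 10)}, index=month_ends)


def with_year_month(df):
    """Hypothetical helper: copy df with year and month columns taken from its index."""
    return df.assign(year=df.index.year, month=df.index.month)


# Pair the rows that fall in the same calendar month.
merged = pd.merge(
    with_year_month(df_bom),
    with_year_month(df_eom),
    on=["year", "month"],
    suffixes=("_bom", "_eom"),
)

corr = merged["val_bom"].corr(merged["val_eom"])
cov = merged["val_bom"].cov(merged["val_eom"])
print(merged)
print(corr, cov)
```

Unlike the positional approach, this pairing does not depend on row order, and months present in only one frame are dropped by the default inner join.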
Setup:
In [59]: df_bom = pd.DataFrame({'val':np.random.randint(0,100, 10)}, index=pd.date_range('2015-01-01', periods=10, freq='MS'))
In [60]: df_eom = pd.DataFrame({'val':np.random.randint(0,100, 10)}, index=pd.date_range('2015-01-01', periods=10, freq='M'))