How can I do computations on DataFrames or Series that have different indexes in pandas?

I have two Series of the same length and datatype; both are float64. The only difference is the indexes: both are dates, but one uses the beginning of each month and the other uses the end of each month. How can I do computations such as correlation or covariance on Series or DataFrames that have different indexes?

import numpy as np
from pandas import Series, DataFrame
import pandas as pd
import Quandl

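IPO = Quandl.get("RITTER/US_IPO_STATS", authtoken="api key")
ir = Quandl.get("FRBC/REALRT", authtoken="api key")

# ipo_splice and ir_splice are not defined in this snippet;
# presumably they are date-range slices of IPO and ir.
new_ipo = ipo_splice['Gross Number of IPOs']

new_ir = ir_splice['RR 1 Month']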


Call reset_index(drop=True) on the objects you want to correlate, then concat them. Dropping the index means the values are paired by position rather than by label.

s1 = pd.DataFrame(np.random.rand(10), list('abcdefghij'), columns=['s1'])
s2 = pd.DataFrame(np.random.rand(10), list('ABCDEFGHIJ'), columns=['s2'])

print(pd.concat([s.reset_index(drop=True) for s in [s1, s2]], axis=1).corr())

          s1        s2
s1  1.000000 -0.437945
s2 -0.437945  1.000000
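
The same positional pairing works for two plain Series. The following is a minimal sketch, not part of the original answer: the names `bom` and `eom` and the random data are hypothetical, and it assumes both Series have equal length and are already sorted in the same order.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly data: one Series stamped at month start, one at month end.
rng = np.random.default_rng(0)
idx = pd.date_range('2015-01-01', periods=10, freq='MS')
bom = pd.Series(rng.random(10), index=idx)
eom = pd.Series(rng.random(10), index=idx + pd.offsets.MonthEnd(1))

# Series.corr aligns on index labels first; no labels match here, so this is NaN.
label_aligned = bom.corr(eom)
print(label_aligned)

# Dropping both indexes pairs the values by position instead.
bom_pos = bom.reset_index(drop=True)
eom_pos = eom.reset_index(drop=True)
r = bom_pos.corr(eom_pos)
c = bom_pos.cov(eom_pos)
print(r, c)
```

Positional pairing is only meaningful if row i of one Series really corresponds to row i of the other, so a missing month in either Series would silently shift the pairs.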

You can use the resample() function to resample one of your indices. The goal is for both indices to be either beginning-of-month (BoM) or end-of-month (EoM):

Data:

In [63]: df_bom
2015-01-01   76
2015-02-01   27
2015-03-01   65
2015-04-01   71
2015-05-01    9
2015-06-01   23
2015-07-01   52
2015-08-01   10
2015-09-01   62
2015-10-01   25

In [64]: df_eom
2015-01-31   87
2015-02-28   16
2015-03-31   85
2015-04-30    4
2015-05-31   37
2015-06-30   63
2015-07-31    3
2015-08-31   73
2015-09-30   81
2015-10-31   69

Solution:

In [61]: df_eom.resample('MS').mean() + df_bom
2015-01-01  163
2015-02-01   43
2015-03-01  150
2015-04-01   75
2015-05-01   46
2015-06-01   86
2015-07-01   55
2015-08-01   83
2015-09-01  143
2015-10-01   94

In [62]: df_eom.resample('MS').mean().join(df_bom, lsuffix='_lft')
            val_lft  val
2015-01-01       87   76
2015-02-01       16   27
2015-03-01       85   65
2015-04-01        4   71
2015-05-01       37    9
2015-06-01       63   23
2015-07-01        3   52
2015-08-01       73   10
2015-09-01       81   62
2015-10-01       69   25
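
Once both frames share an index, as in the join above, correlation and covariance follow directly. The sketch below rebuilds comparable random data so that it runs on its own; the suffixes `_eom` and `_bom` and the name `aligned` are choices made here for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Random monthly data with month-start and month-end indexes.
rng = np.random.default_rng(1)
idx = pd.date_range('2015-01-01', periods=10, freq='MS')
df_bom = pd.DataFrame({'val': rng.integers(0, 100, 10)}, index=idx)
df_eom = pd.DataFrame({'val': rng.integers(0, 100, 10)},
                      index=idx + pd.offsets.MonthEnd(1))

# Move the end-of-month stamps to month start, then join on the shared index.
aligned = df_eom.resample('MS').mean().join(df_bom, lsuffix='_eom', rsuffix='_bom')

# Pairwise correlation and covariance matrices of the aligned columns.
print(aligned.corr())
print(aligned.cov())
```

Each monthly bin holds exactly one observation here, so `.mean()` only relabels the rows; with several observations per month it would average them.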

Alternative approach: merge the DataFrames on the year and month parts of their indexes:

In [69]: %paste
(pd.merge(df_bom, df_eom,
          left_on=[df_bom.index.year, df_bom.index.month],
          right_on=[df_eom.index.year, df_eom.index.month],
          suffixes=('_bom', '_eom')))
## -- End pasted text --
   key_0  key_1  val_bom  val_eom
0   2015      1       76       87
1   2015      2       27       16
2   2015      3       65       85
3   2015      4       71        4
4   2015      5        9       37
5   2015      6       23       63
6   2015      7       52        3
7   2015      8       10       73
8   2015      9       62       81
9   2015     10       25       69
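
A year and month key pair carries the same information as a monthly period, so a compact variant of this idea is to convert both indexes with `to_period('M')`. This is a sketch of a swapped-in technique rather than the answer's own method; the Series names and random data are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical Series indexed at month start and month end.
rng = np.random.default_rng(2)
idx = pd.date_range('2015-01-01', periods=10, freq='MS')
s_bom = pd.Series(rng.random(10), index=idx, name='bom')
s_eom = pd.Series(rng.random(10), index=idx + pd.offsets.MonthEnd(1), name='eom')

# Replace each timestamp with its month (for example 2015-01), so both indexes match.
bom_m = s_bom.to_period('M')
eom_m = s_eom.to_period('M')

# Label alignment now pairs values from the same month.
r = bom_m.corr(eom_m)
c = bom_m.cov(eom_m)
print(r, c)

# The same data as a two-column frame, for a full correlation matrix.
print(pd.concat([bom_m, eom_m], axis=1).corr())
```

Unlike positional pairing, this still works when a month is missing from one Series: unmatched months are simply left out of the calculation.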

Setup:

In [59]: df_bom = pd.DataFrame({'val':np.random.randint(0,100, 10)}, index=pd.date_range('2015-01-01', periods=10, freq='MS'))

In [60]: df_eom = pd.DataFrame({'val':np.random.randint(0,100, 10)}, index=pd.date_range('2015-01-01', periods=10, freq='M'))


