How can I do computations on DataFrames or Series that have different indexes in pandas?

Question

I have two Series of the same length and datatype; both are float64. The only difference is the indexes: both are dates, but one falls at the beginning of the month and the other at the end of the month. How can I do computations like correlation or covariance on Series or DataFrames that have different indexes?

import numpy as np
import pandas as pd
from pandas import Series, DataFrame
import Quandl

IPO = Quandl.get("RITTER/US_IPO_STATS", authtoken="api key")
ir = Quandl.get("FRBC/REALRT", authtoken="api key")

# Slice both datasets down to the same number of rows (398)
ipo_splice = IPO[264:662]
new_ipo = ipo_splice['Gross Number of IPOs']

ir_splice = ir[0:398]
new_ir = ir_splice['RR 1 Month']

# Both Series are indexed by date, one at month start and one at month end
new_ipo.corr(new_ir)

Answer 1

Call reset_index(drop=True) on the objects you want to correlate, then concat them. Dropping the index discards the mismatched labels, so the rows are paired by position instead.

s1 = pd.DataFrame(np.random.rand(10), list('abcdefghij'), columns=['s1'])
s2 = pd.DataFrame(np.random.rand(10), list('ABCDEFGHIJ'), columns=['s2'])

print(pd.concat([s.reset_index(drop=True) for s in [s1, s2]], axis=1).corr())


          s1        s2
s1  1.000000 -0.437945
s2 -0.437945  1.000000
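
For the month-start versus month-end case in the question, the same idea can be sketched with two Series. This is only a rough illustration on made-up random data: the names s_bom and s_eom are hypothetical stand-ins for new_ipo and new_ir, and it assumes the rows are already in matching order, because reset_index(drop=True) pairs values purely by position.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the two series in the question
bom_index = pd.date_range("2015-01-01", periods=10, freq="MS")
s_bom = pd.Series(rng.random(10), index=bom_index)
# Roll each month-start date forward to the end of the same month
s_eom = pd.Series(rng.random(10), index=bom_index + pd.offsets.MonthEnd(0))

# Series.corr aligns on index labels first; the two indexes share no dates,
# so nothing is left to correlate and the result is NaN
direct = s_bom.corr(s_eom)

# Dropping the indexes pairs the values by position instead
positional_corr = s_bom.reset_index(drop=True).corr(s_eom.reset_index(drop=True))
positional_cov = s_bom.reset_index(drop=True).cov(s_eom.reset_index(drop=True))

print(direct)
print(positional_corr, positional_cov)
```

The positional result matches what np.corrcoef gives on the raw values, so this is a reasonable choice when the dates themselves carry no extra information beyond row order.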

Answer 2

You can use the resample() function to resample one of your indexes; the goal is for both indexes to be either beginning-of-month (BoM) or end-of-month (EoM):

Data:

In [63]: df_bom
Out[63]:
            val
2015-01-01   76
2015-02-01   27
2015-03-01   65
2015-04-01   71
2015-05-01    9
2015-06-01   23
2015-07-01   52
2015-08-01   10
2015-09-01   62
2015-10-01   25

In [64]: df_eom
Out[64]:
            val
2015-01-31   87
2015-02-28   16
2015-03-31   85
2015-04-30    4
2015-05-31   37
2015-06-30   63
2015-07-31    3
2015-08-31   73
2015-09-30   81
2015-10-31   69

Solution:

In [61]: df_eom.resample('MS').mean() + df_bom
Out[61]:
            val
2015-01-01  163
2015-02-01   43
2015-03-01  150
2015-04-01   75
2015-05-01   46
2015-06-01   86
2015-07-01   55
2015-08-01   83
2015-09-01  143
2015-10-01   94

In [62]: df_eom.resample('MS').mean().join(df_bom, lsuffix='_lft')
Out[62]:
            val_lft  val
2015-01-01       87   76
2015-02-01       16   27
2015-03-01       85   65
2015-04-01        4   71
2015-05-01       37    9
2015-06-01       63   23
2015-07-01        3   52
2015-08-01       73   10
2015-09-01       81   62
2015-10-01       69   25
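
To tie this back to the correlation and covariance asked about in the question, here is a minimal sketch that resamples, joins, and then computes both statistics. It reuses the sample values shown above; the column suffixes _eom and _bom are my own choice, and the explicit .mean() assumes a pandas version where resample() must be followed by an aggregation.

```python
import pandas as pd

# Same sample values as in the data shown above
bom_index = pd.date_range("2015-01-01", periods=10, freq="MS")
df_bom = pd.DataFrame({"val": [76, 27, 65, 71, 9, 23, 52, 10, 62, 25]},
                      index=bom_index)
df_eom = pd.DataFrame({"val": [87, 16, 85, 4, 37, 63, 3, 73, 81, 69]},
                      index=bom_index + pd.offsets.MonthEnd(0))

# Relabel the month-end rows to month start, then join on the shared index
joined = df_eom.resample("MS").mean().join(df_bom, lsuffix="_eom", rsuffix="_bom")

# With both columns on one index, the usual methods work directly
corr = joined["val_eom"].corr(joined["val_bom"])
cov = joined["val_eom"].cov(joined["val_bom"])
print(joined.corr())
print(corr, cov)
```

Because each month holds exactly one observation here, .mean() only relabels the rows; with several observations per month it would average them, which may or may not be what you want.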

Alternative approach: merge the DataFrames on the year and month parts of their indexes:

In [69]: %paste
(pd.merge(df_bom, df_eom,
          left_on=[df_bom.index.year, df_bom.index.month],
          right_on=[df_eom.index.year, df_eom.index.month],
          suffixes=('_bom','_eom')))
## -- End pasted text --
Out[69]:
   key_0  key_1  val_bom  val_eom
0   2015      1       76       87
1   2015      2       27       16
2   2015      3       65       85
3   2015      4       71        4
4   2015      5        9       37
5   2015      6       23       63
6   2015      7       52        3
7   2015      8       10       73
8   2015      9       62       81
9   2015     10       25       69
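
A related variation, which is not part of the answer above, is to convert both indexes to monthly periods so that they carry identical labels. The sketch below assumes each month appears exactly once in each Series; the names s_bom, s_eom, and monthly are made up for illustration, and the values are the sample data from above.

```python
import pandas as pd

# Same sample values as above, held as two named Series
bom_index = pd.date_range("2015-01-01", periods=10, freq="MS")
s_bom = pd.Series([76, 27, 65, 71, 9, 23, 52, 10, 62, 25],
                  index=bom_index, name="bom")
s_eom = pd.Series([87, 16, 85, 4, 37, 63, 3, 73, 81, 69],
                  index=bom_index + pd.offsets.MonthEnd(0), name="eom")

# to_period("M") maps 2015-01-01 and 2015-01-31 to the same label, 2015-01
monthly = pd.concat([s_bom.to_period("M"), s_eom.to_period("M")], axis=1)

corr_matrix = monthly.corr()
cov_matrix = monthly.cov()
print(corr_matrix)
print(cov_matrix)
```

Unlike the merge on year and month, this keeps a single time-like index on the result, which can be handy if you want to slice by date afterwards.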

Setup:

In [59]: df_bom = pd.DataFrame({'val':np.random.randint(0,100, 10)}, index=pd.date_range('2015-01-01', periods=10, freq='MS'))

In [60]: df_eom = pd.DataFrame({'val':np.random.randint(0,100, 10)}, index=pd.date_range('2015-01-01', periods=10, freq='M'))

Answer 1
0 2016-06-17 01:11:33

Answer 2
0 2016-06-17 08:09:55
