How can I do computations on DataFrames or Series that have different indexes in pandas?

I have two Series of the same length and datatype; both are float64. The only difference is the indexes: both are dates, but one falls at the beginning of the month and the other at the end of the month. How can I do computations like correlation or covariance on Series or DataFrames that have different indexes?

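import numpy as np
from pandas import Series, DataFrame
import pandas as pd
import Quandl

# Download the two datasets ("api key" is a placeholder for a real token).
IPO = Quandl.get("RITTER/US_IPO_STATS", authtoken="api key")
ir = Quandl.get("FRBC/REALRT", authtoken="api key")

# Slice 398 rows of IPO data and pick the column of interest.
ipo_splice = IPO[264:662]
new_ipo = ipo_splice['Gross Number of IPOs']
new_ipo = new_ipo.T  # no-op: transposing a Series returns the same Series

# Slice 398 rows of interest-rate data and pick the column of interest.
ir_splice = ir[0:398]
new_ir = ir_splice['RR 1 Month']
new_ir = new_ir.T  # no-op, as above

# The two indexes share no dates, so this aligns on nothing.
new_ipo.corr(new_ir)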

Call reset_index(drop=True) on each of the objects you want to correlate, then concat them. With the original indexes dropped, the values are paired by position:

import numpy as np
import pandas as pd

s1 = pd.DataFrame(np.random.rand(10), list('abcdefghij'), columns=['s1'])
s2 = pd.DataFrame(np.random.rand(10), list('ABCDEFGHIJ'), columns=['s2'])

print(pd.concat([s.reset_index(drop=True) for s in [s1, s2]], axis=1).corr())


          s1        s2
s1  1.000000 -0.437945
s2 -0.437945  1.000000
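
The same idea applies directly to two Series with date indexes, as in the question. The following is a minimal sketch, not the asker's actual data: `s_bom` and `s_eom` are hypothetical stand-ins with random values, indexed at month start and month end respectively. It shows that label-based alignment finds no common dates, while dropping the indexes pairs the values by position.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the two series in the question:
# same length, one indexed at month start, the other at month end.
idx_bom = pd.date_range("2015-01-01", periods=10, freq="MS")
idx_eom = idx_bom + pd.offsets.MonthEnd(0)  # roll each date to its month end
s_bom = pd.Series(rng.random(10), index=idx_bom)
s_eom = pd.Series(rng.random(10), index=idx_eom)

# Series.corr aligns on index labels first; no dates are shared,
# so there is nothing left to correlate and the result is NaN.
by_label = s_bom.corr(s_eom)

# Dropping both indexes makes pandas pair the values by position.
a = s_bom.reset_index(drop=True)
b = s_eom.reset_index(drop=True)
corr_by_position = a.corr(b)
cov_by_position = a.cov(b)

print(by_label, corr_by_position, cov_by_position)
```

Positional pairing is only meaningful when row i of one series really corresponds to row i of the other, so it is worth checking that both cover the same months in the same order.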

You can use the resample() function to resample one of your indexes, so that both use the same convention, either beginning of month (BoM) or end of month (EoM):

Data:

In [63]: df_bom
Out[63]:
            val
2015-01-01   76
2015-02-01   27
2015-03-01   65
2015-04-01   71
2015-05-01    9
2015-06-01   23
2015-07-01   52
2015-08-01   10
2015-09-01   62
2015-10-01   25

In [64]: df_eom
Out[64]:
            val
2015-01-31   87
2015-02-28   16
2015-03-31   85
2015-04-30    4
2015-05-31   37
2015-06-30   63
2015-07-31    3
2015-08-31   73
2015-09-30   81
2015-10-31   69

Solution:

In [61]: df_eom.resample('MS').mean() + df_bom
Out[61]:
            val
2015-01-01  163
2015-02-01   43
2015-03-01  150
2015-04-01   75
2015-05-01   46
2015-06-01   86
2015-07-01   55
2015-08-01   83
2015-09-01  143
2015-10-01   94

In [62]: df_eom.resample('MS').mean().join(df_bom, lsuffix='_lft')
Out[62]:
            val_lft  val
2015-01-01       87   76
2015-02-01       16   27
2015-03-01       85   65
2015-04-01        4   71
2015-05-01       37    9
2015-06-01       63   23
2015-07-01        3   52
2015-08-01       73   10
2015-09-01       81   62
2015-10-01       69   25
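
Once both frames share month-start labels, the correlation and covariance asked about in the question follow directly. Below is a minimal sketch under stated assumptions: the frames have the same shape as in the Setup, but the values are hypothetical (a simple linear relationship chosen so the result is easy to check), and each month holds exactly one row, so taking the mean per month leaves the values unchanged.

```python
import numpy as np
import pandas as pd

# Same layout as the Setup for this answer; the values are hypothetical.
idx_bom = pd.date_range("2015-01-01", periods=10, freq="MS")
idx_eom = idx_bom + pd.offsets.MonthEnd(0)  # month-end dates
df_bom = pd.DataFrame({"val": np.arange(10.0)}, index=idx_bom)
df_eom = pd.DataFrame({"val": np.arange(10.0) * 2 + 1}, index=idx_eom)

# Relabel the month-end rows with month-start dates.
eom_on_ms = df_eom.resample("MS").mean()

# Both Series now share the same index, so label alignment pairs them up.
corr = eom_on_ms["val"].corr(df_bom["val"])
cov = eom_on_ms["val"].cov(df_bom["val"])

print(corr, cov)
```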

Alternative approach: merge the DataFrames on the year and month parts of their indexes:

In [69]: %paste
(pd.merge(df_bom, df_eom,
          left_on=[df_bom.index.year, df_bom.index.month],
          right_on=[df_eom.index.year, df_eom.index.month],
          suffixes=('_bom','_eom')))
## -- End pasted text --
Out[69]:
   key_0  key_1  val_bom  val_eom
0   2015      1       76       87
1   2015      2       27       16
2   2015      3       65       85
3   2015      4       71        4
4   2015      5        9       37
5   2015      6       23       63
6   2015      7       52        3
7   2015      8       10       73
8   2015      9       62       81
9   2015     10       25       69
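
A variation on the same year-and-month matching, which is not part of the original answer, is to convert both indexes to monthly periods and join on those. This is only a sketch with hypothetical values; the names `left`, `right`, and `merged` are illustrative. It produces one row per month with both columns side by side, ready for corr() or cov().

```python
import numpy as np
import pandas as pd

# Same layout as the Setup for this answer; the values are hypothetical.
idx_bom = pd.date_range("2015-01-01", periods=10, freq="MS")
idx_eom = idx_bom + pd.offsets.MonthEnd(0)  # month-end dates
df_bom = pd.DataFrame({"val": np.arange(10.0)}, index=idx_bom)
df_eom = pd.DataFrame({"val": np.arange(10.0)[::-1]}, index=idx_eom)

# Replace each date with its calendar month, so 2015-01-01 and
# 2015-01-31 both become the period 2015-01.
left = df_bom.copy()
left.index = left.index.to_period("M")
right = df_eom.copy()
right.index = right.index.to_period("M")

# Join on the shared monthly index; suffixes tell the columns apart.
merged = left.join(right, lsuffix="_bom", rsuffix="_eom")

corr = merged["val_bom"].corr(merged["val_eom"])
cov_matrix = merged.cov()

print(merged)
print(corr)
```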

Setup:

In [59]: df_bom = pd.DataFrame({'val':np.random.randint(0,100, 10)}, index=pd.date_range('2015-01-01', periods=10, freq='MS'))

In [60]: df_eom = pd.DataFrame({'val':np.random.randint(0,100, 10)}, index=pd.date_range('2015-01-01', periods=10, freq='M'))
