Pandas: Merge data frames on datetime index

I have the following two data frames. I set the date as a DatetimeIndex on each with df.set_index(pd.to_datetime(df['date']), inplace=True) and would like to merge or join them on date:

df.head(5)
           catcode_amt type feccandid_amt  amount
date
1915-12-31       A5000  24K     H6TX08100    1000
1916-12-31       T6100  24K     H8CA52052     500
1954-12-31       H3100  24K     S8AK00090    1000
1985-12-31       J7120  24E     H8OH18088      36
1997-12-31       z9600  24K     S6ND00058    2000


d.head(5)
           catcode_disp disposition           feccandid_disp  bills
date
2007-12-31        A0000     support                S4HI00011      1
2007-12-31        A1000      oppose  S4IA00020', 'P20000741      1
2007-12-31        A1000     support                S8MT00010      1
2007-12-31        A1500     support                S6WI00061      2
2007-12-31        A1600     support  S4IA00020', 'P20000741      3

I tried the following two approaches, but both raise a MemoryError:

df.join(d, how='right')

I ran the code below on the data frames without the date set as the index:

merge=pd.merge(df,d, how='inner', on='date')

If you need to merge on the index with the merge function, you can add the parameters left_index=True and right_index=True:

merge=pd.merge(df,d, how='inner', left_index=True, right_index=True)

Sample (the first index value in d was changed so that it matches a date in df):

print(df)
           catcode_amt type feccandid_amt  amount
date                                             
1915-12-31       A5000  24K     H6TX08100    1000
1916-12-31       T6100  24K     H8CA52052     500
1954-12-31       H3100  24K     S8AK00090    1000
1985-12-31       J7120  24E     H8OH18088      36
1997-12-31       z9600  24K     S6ND00058    2000

print(d)
           catcode_disp disposition            feccandid_disp  bills
date                                                                
1997-12-31        A0000     support                 S4HI00011    1.0
2007-12-31        A1000      oppose  S4IA00020', 'P20000741 1    NaN
2007-12-31        A1000     support                 S8MT00010    1.0
2007-12-31        A1500     support                 S6WI00061    2.0
2007-12-31        A1600     support  S4IA00020', 'P20000741 3    NaN

merge=pd.merge(df,d, how='inner', left_index=True, right_index=True)
print(merge)
           catcode_amt type feccandid_amt  amount catcode_disp disposition  \
date                                                                         
1997-12-31       z9600  24K     S6ND00058    2000        A0000     support   

           feccandid_disp  bills  
date                              
1997-12-31      S4HI00011    1.0  

Or you can use concat:

print(pd.concat([df, d], join='inner', axis=1))
           catcode_amt type feccandid_amt  amount catcode_disp disposition  \
date                                                                         
1997-12-31       z9600  24K     S6ND00058    2000        A0000     support   

           feccandid_disp  bills  
date                              
1997-12-31      S4HI00011    1.0  

EDIT: EdChum is right:

I added duplicates to the DataFrame df (the last 2 values in the index):

print(df)
           catcode_amt type feccandid_amt  amount
date                                             
1915-12-31       A5000  24K     H6TX08100    1000
1916-12-31       T6100  24K     H8CA52052     500
1954-12-31       H3100  24K     S8AK00090    1000
2007-12-31       J7120  24E     H8OH18088      36
2007-12-31       z9600  24K     S6ND00058    2000

print(d)
           catcode_disp disposition            feccandid_disp  bills
date                                                                
1997-12-31        A0000     support                 S4HI00011    1.0
2007-12-31        A1000      oppose  S4IA00020', 'P20000741 1    NaN
2007-12-31        A1000     support                 S8MT00010    1.0
2007-12-31        A1500     support                 S6WI00061    2.0
2007-12-31        A1600     support  S4IA00020', 'P20000741 3    NaN

merge=pd.merge(df,d, how='inner', left_index=True, right_index=True)
print(merge)
           catcode_amt type feccandid_amt  amount catcode_disp disposition  \
date                                                                         
2007-12-31       J7120  24E     H8OH18088      36        A1000      oppose   
2007-12-31       J7120  24E     H8OH18088      36        A1000     support   
2007-12-31       J7120  24E     H8OH18088      36        A1500     support   
2007-12-31       J7120  24E     H8OH18088      36        A1600     support   
2007-12-31       z9600  24K     S6ND00058    2000        A1000      oppose   
2007-12-31       z9600  24K     S6ND00058    2000        A1000     support   
2007-12-31       z9600  24K     S6ND00058    2000        A1500     support   
2007-12-31       z9600  24K     S6ND00058    2000        A1600     support   

                      feccandid_disp  bills  
date                                         
2007-12-31  S4IA00020', 'P20000741 1    NaN  
2007-12-31                 S8MT00010    1.0  
2007-12-31                 S6WI00061    2.0  
2007-12-31  S4IA00020', 'P20000741 3    NaN  
2007-12-31  S4IA00020', 'P20000741 1    NaN  
2007-12-31                 S8MT00010    1.0  
2007-12-31                 S6WI00061    2.0  
2007-12-31  S4IA00020', 'P20000741 3    NaN  
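The output above shows the likely cause of the MemoryError: every date that appears in both indexes contributes (rows in df) x (rows in d) rows to the result, here 2 x 4 = 8. A minimal sketch for estimating the size of the merge before running it follows. The miniature frames and the names left_counts, right_counts and expected_rows are made up for illustration and are not part of the original answer.

```python
import pandas as pd

# Hypothetical miniature frames shaped like the example above:
# two rows of df and four rows of d share the date 2007-12-31.
df = pd.DataFrame(
    {"amount": [1000, 500, 1000, 36, 2000]},
    index=pd.to_datetime(
        ["1915-12-31", "1916-12-31", "1954-12-31", "2007-12-31", "2007-12-31"]
    ),
)
d = pd.DataFrame(
    {"bills": [1, 1, 1, 2, 3]},
    index=pd.to_datetime(
        ["1997-12-31", "2007-12-31", "2007-12-31", "2007-12-31", "2007-12-31"]
    ),
)

# Count how often each date occurs in each index.
left_counts = df.index.value_counts()
right_counts = d.index.value_counts()

# An inner join emits left_count * right_count rows for every shared date.
# Series.mul aligns on the date; fill_value=0 zeroes out dates that are
# missing on one side.
expected_rows = int(left_counts.mul(right_counts, fill_value=0).sum())

merged = pd.merge(df, d, how="inner", left_index=True, right_index=True)
print(expected_rows, len(merged))
```

If the estimate is far larger than either input, the dates are probably not a unique key, and aggregating per date first (or merging on additional columns) may be what is actually wanted.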

It looks like your dates are your index, in which case you want to merge on the index, not on a column. If you have two data frames, df_1 and df_2:

df_1.merge(df_2, left_index=True, right_index=True, how='inner')
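As a self-contained sketch of that call, assuming two small made-up frames with unique dates: the validate="one_to_one" argument is an addition not mentioned in the answer. It asks pandas to raise pandas.errors.MergeError if either index has duplicate keys, rather than silently multiplying rows.

```python
import pandas as pd

# Hypothetical frames with unique dates in each index.
df_1 = pd.DataFrame(
    {"amount": [1000, 500, 2000]},
    index=pd.to_datetime(["1915-12-31", "1916-12-31", "1997-12-31"]),
)
df_2 = pd.DataFrame(
    {"bills": [1, 2]},
    index=pd.to_datetime(["1997-12-31", "2007-12-31"]),
)

# Merge on both indexes; validate="one_to_one" checks that the
# merge keys (here the indexes) are unique on both sides.
merged = df_1.merge(
    df_2, left_index=True, right_index=True, how="inner", validate="one_to_one"
)

# DataFrame.join matches on the index by default, so this is the same join.
joined = df_1.join(df_2, how="inner")

print(merged)
print(merged.equals(joined))
```

Only 1997-12-31 appears in both indexes, so the inner merge keeps a single row.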

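I ran into a similar problem. You most likely have a lot of NaT values in the date column.
I removed all of my NaT values, then ran the join, and it completed.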

# Keep only the rows whose date is not NaT, then index by date.
df = df[df['date'].notnull()].set_index('date')
d = d[d['date'].notnull()].set_index('date')
df.join(d, how='right')
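One way NaT values can end up in the date column is parsing with errors="coerce", which replaces unparseable values with NaT. This is an assumption about where they come from, since the answer does not say. A minimal sketch with made-up data, using dropna(subset=...) as an equivalent of the notnull() filter:

```python
import pandas as pd

# Hypothetical raw data in which one date string cannot be parsed.
raw = pd.DataFrame(
    {
        "date": ["2007-12-31", "not a date", "1997-12-31"],
        "amount": [36, 500, 2000],
    }
)

# errors="coerce" turns unparseable values into NaT instead of raising.
raw["date"] = pd.to_datetime(raw["date"], errors="coerce")

# Drop the NaT rows before the date column becomes the index.
clean = raw.dropna(subset=["date"]).set_index("date")

print(len(raw), len(clean))
```

As far as I know, pandas matches null keys against each other when merging, so many NaT rows on both sides can blow up the result the same way duplicated dates do.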
