
Adding 2 data frames in Python pandas

I want to combine 2 separate data frames of the following shape in Python pandas:

df1 =
       A    B
    1  1    2
    2  3    4
    3  5    6

df2 =
       C    D
    1  a    b
    2  c    d
    3  e    f

I want to get the following result:

df = 
       A    B    C    D
   1   1    2    a    b
   2   3    4    c    d
   3   5    6    e    f

I am using the following code:

dat = df1.join(df2)

The problem is that my actual data frames have more than 2 million rows, so this takes too long and consumes a huge amount of memory.

Is there a faster and more memory-efficient way to do this?

Thank you in advance for your help.

If I've read your question correctly, your indexes align exactly and you just need to combine the columns into a single DataFrame. If that's right, then copying the columns over from one DataFrame to the other turns out to be the fastest way to go (see [92] and [93]). f is my DataFrame in the example below:

In [86]: len(f)
Out[86]: 343720

In [87]: a = f.loc[:, ['date_val', 'price']]
In [88]: b = f.loc[:, ['red_date', 'credit_spread']]

In [89]: %timeit c = pd.concat([a, b], axis=1)
100 loops, best of 3: 7.11 ms per loop

In [90]: %timeit c = pd.concat([a, b], axis=1, ignore_index=True)
100 loops, best of 3: 10.8 ms per loop

In [91]: %timeit c = a.join(b)
100 loops, best of 3: 6.47 ms per loop

In [92]: %timeit a['red_date'] = b['red_date']
1000 loops, best of 3: 1.17 ms per loop

In [93]: %timeit a['credit_spread'] = b['credit_spread']
1000 loops, best of 3: 1.16 ms per loop
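
Applied to the frames from the question, the column-by-column copy might look like the sketch below. The names df1 and df2 and the sample values are taken from the question. Column assignment aligns on the index, so this assumes both frames share the same index; any row label in df1 that is missing from df2 would end up as NaN.

```python
import pandas as pd

# Sample frames from the question, both indexed 1..3.
df1 = pd.DataFrame({"A": [1, 3, 5], "B": [2, 4, 6]}, index=[1, 2, 3])
df2 = pd.DataFrame({"C": ["a", "c", "e"], "D": ["b", "d", "f"]}, index=[1, 2, 3])

# Copy each column of df2 into df1 one at a time.
# This modifies df1 in place instead of building a new frame.
for col in df2.columns:
    df1[col] = df2[col]

print(df1)
```

Because df1 is extended in place, no third frame is created, which may also help with the memory concern, though that was not measured here.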

I also tried copying both columns at once, but for some strange reason that was more than twice as slow as copying each column individually.

In [94]: %timeit a[['red_date', 'credit_spread']] = b[['red_date', 'credit_spread']]
100 loops, best of 3: 5.09 ms per loop
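
Timings like these depend on the pandas version, the dtypes and the machine, so it is worth repeating the comparison on your own data. Below is a rough sketch of how that could be done outside IPython with the standard timeit module. The row count n and the synthetic values are placeholders, not the data used above, and the helper function names are made up for illustration.

```python
import timeit

import numpy as np
import pandas as pd

n = 100_000  # placeholder row count; raise it to match your own data
a = pd.DataFrame({"date_val": np.arange(n), "price": np.random.rand(n)})
b = pd.DataFrame({"red_date": np.arange(n), "credit_spread": np.random.rand(n)})


def by_concat():
    return pd.concat([a, b], axis=1)


def by_join():
    return a.join(b)


def by_column_copy():
    # Work on a copy so `a` is unchanged between runs; the copy itself
    # adds overhead that the in-place timings above did not include.
    out = a.copy()
    for col in b.columns:
        out[col] = b[col]
    return out


results = {}
for fn in (by_concat, by_join, by_column_copy):
    # Best of 3 repeats, 10 calls each, reported per call.
    results[fn.__name__] = min(timeit.repeat(fn, number=10, repeat=3)) / 10
    print(f"{fn.__name__}: {results[fn.__name__] * 1e3:.2f} ms per call")
```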
