Join or merge with overwrite in pandas

I want to perform a join/merge/append operation on a DataFrame with a datetime index.

Let's say I have df1 and I want to add df2 to it. df2 can have fewer or more columns, and overlapping indexes. For all rows where the indexes match, if df2 has the same column as df1, I want the values of df1 to be overwritten with those from df2.

How can I obtain the desired result?

How about df2.combine_first(df1)?

In [33]: df2
Out[33]: 
                   A         B         C         D
2000-01-03  0.638998  1.277361  0.193649  0.345063
2000-01-04 -0.816756 -1.711666 -1.155077 -0.678726
2000-01-05  0.435507 -0.025162 -1.112890  0.324111
2000-01-06 -0.210756 -1.027164  0.036664  0.884715
2000-01-07 -0.821631 -0.700394 -0.706505  1.193341
2000-01-10  1.015447 -0.909930  0.027548  0.258471
2000-01-11 -0.497239 -0.979071 -0.461560  0.447598

In [34]: df1
Out[34]: 
                   A         B         C
2000-01-03  2.288863  0.188175 -0.040928
2000-01-04  0.159107 -0.666861 -0.551628
2000-01-05 -0.356838 -0.231036 -1.211446
2000-01-06 -0.866475  1.113018 -0.001483
2000-01-07  0.303269  0.021034  0.471715
2000-01-10  1.149815  0.686696 -1.230991
2000-01-11 -1.296118 -0.172950 -0.603887
2000-01-12 -1.034574 -0.523238  0.626968
2000-01-13 -0.193280  1.857499 -0.046383
2000-01-14 -1.043492 -0.820525  0.868685

In [35]: df2.combine_first(df1)
Out[35]: 
                   A         B         C         D
2000-01-03  0.638998  1.277361  0.193649  0.345063
2000-01-04 -0.816756 -1.711666 -1.155077 -0.678726
2000-01-05  0.435507 -0.025162 -1.112890  0.324111
2000-01-06 -0.210756 -1.027164  0.036664  0.884715
2000-01-07 -0.821631 -0.700394 -0.706505  1.193341
2000-01-10  1.015447 -0.909930  0.027548  0.258471
2000-01-11 -0.497239 -0.979071 -0.461560  0.447598
2000-01-12 -1.034574 -0.523238  0.626968       NaN
2000-01-13 -0.193280  1.857499 -0.046383       NaN
2000-01-14 -1.043492 -0.820525  0.868685       NaN

Note that it takes the values from df1 for indices that do not overlap with df2 (and, where df2 holds NaN, it falls back to the value from df1). If this doesn't do exactly what you want, I would be willing to improve this function or add options to it.
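A minimal, self-contained sketch of the same call, using small made-up values instead of the random data in the session above. It shows the three behaviours that matter for the question: the result contains the union of both indexes and both column sets, overlapping cells take df2's value, and where df2 holds NaN the value from df1 is kept.

```python
import numpy as np
import pandas as pd

# Illustrative stand-ins for the question's frames (values are made up).
idx1 = pd.date_range("2000-01-03", periods=4, freq="D")
df1 = pd.DataFrame({"A": [1.0, 2.0, 3.0, 4.0],
                    "B": [10.0, 20.0, 30.0, 40.0]}, index=idx1)

# df2 overlaps df1 on two dates, adds a new date (2000-01-07) and a new
# column "C", and has a NaN in one overlapping cell (2000-01-05, "A").
idx2 = pd.DatetimeIndex(["2000-01-04", "2000-01-05", "2000-01-07"])
df2 = pd.DataFrame({"A": [-2.0, np.nan, -7.0],
                    "C": [0.2, 0.5, 0.7]}, index=idx2)

# df2 has priority; its missing cells (and missing rows/columns) come from df1.
result = df2.combine_first(df1)
print(result)
```

In the output, 2000-01-04 shows A = -2.0 (taken from df2), 2000-01-05 keeps A = 3.0 (df2 had NaN there), and column C is NaN on the dates that exist only in df1.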

For a merge like this, the update method of a DataFrame is useful.

Taking the example from the documentation:

import pandas as pd
import numpy as np

df1 = pd.DataFrame([[np.nan, 3., 5.], [-4.6, 2.1, np.nan],
                   [np.nan, 7., np.nan]])
df2 = pd.DataFrame([[-42.6, np.nan, -8.2], [-5., 1.6, 4]],
                   index=[1, 2])

Data before the update:

>>> df1
     0    1    2
0  NaN  3.0  5.0
1 -4.6  2.1  NaN
2  NaN  7.0  NaN
>>>
>>> df2
      0    1    2
1 -42.6  NaN -8.2
2  -5.0  1.6  4.0

Let's update df1 with data from df2:

df1.update(df2)

Data after the update:

>>> df1
      0    1    2
0   NaN  3.0  5.0
1 -42.6  2.1 -8.2
2  -5.0  1.6  4.0

Remarks:

  • It's important to notice that this is an "in place" operation: it modifies the DataFrame that calls update.
  • Also note that non-NaN values in df1 are not overwritten by NaN values in df2.
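One more consequence of how update aligns its argument, shown in a small sketch with made-up frames: rows and columns of df2 that do not already exist in df1 are ignored, and the method returns None rather than a new frame. So update on its own covers the question only when df2's index and columns are a subset of df1's; when df2 brings extra columns or rows, df2.combine_first(df1) keeps them.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"A": [1.0, 2.0, 3.0]}, index=[0, 1, 2])

# df2 has an extra row (label 3), an extra column ("B") and a NaN at row 1.
df2 = pd.DataFrame({"A": [np.nan, 20.0, 30.0],
                    "B": [5.0, 6.0, 7.0]}, index=[1, 2, 3])

returned = df1.update(df2)  # modifies df1 in place
print(returned)             # None: there is no new frame to assign
print(df1)                  # row 3 and column "B" from df2 do not appear
```

Here only row 2 of column A changes (to 20.0): row 1 keeps 2.0 because df2 has NaN there, and row 0 is untouched because df2 has no such label.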