join or merge with overwrite in pandas
I want to perform a join/merge/append operation on a dataframe with a datetime index.
Let's say I have df1 and I want to add df2 to it. df2 can have fewer or more columns, and overlapping indexes. For all rows where the indexes match, if df2 has the same column as df1, I want the values of df1 to be overwritten with those from df2.
How can I obtain the desired result?
How about df2.combine_first(df1)?
In [33]: df2
Out[33]:
A B C D
2000-01-03 0.638998 1.277361 0.193649 0.345063
2000-01-04 -0.816756 -1.711666 -1.155077 -0.678726
2000-01-05 0.435507 -0.025162 -1.112890 0.324111
2000-01-06 -0.210756 -1.027164 0.036664 0.884715
2000-01-07 -0.821631 -0.700394 -0.706505 1.193341
2000-01-10 1.015447 -0.909930 0.027548 0.258471
2000-01-11 -0.497239 -0.979071 -0.461560 0.447598
In [34]: df1
Out[34]:
A B C
2000-01-03 2.288863 0.188175 -0.040928
2000-01-04 0.159107 -0.666861 -0.551628
2000-01-05 -0.356838 -0.231036 -1.211446
2000-01-06 -0.866475 1.113018 -0.001483
2000-01-07 0.303269 0.021034 0.471715
2000-01-10 1.149815 0.686696 -1.230991
2000-01-11 -1.296118 -0.172950 -0.603887
2000-01-12 -1.034574 -0.523238 0.626968
2000-01-13 -0.193280 1.857499 -0.046383
2000-01-14 -1.043492 -0.820525 0.868685
In [35]: df2.comb
df2.combine df2.combineAdd df2.combine_first df2.combineMult
In [35]: df2.combine_first(df1)
Out[35]:
A B C D
2000-01-03 0.638998 1.277361 0.193649 0.345063
2000-01-04 -0.816756 -1.711666 -1.155077 -0.678726
2000-01-05 0.435507 -0.025162 -1.112890 0.324111
2000-01-06 -0.210756 -1.027164 0.036664 0.884715
2000-01-07 -0.821631 -0.700394 -0.706505 1.193341
2000-01-10 1.015447 -0.909930 0.027548 0.258471
2000-01-11 -0.497239 -0.979071 -0.461560 0.447598
2000-01-12 -1.034574 -0.523238 0.626968 NaN
2000-01-13 -0.193280 1.857499 -0.046383 NaN
2000-01-14 -1.043492 -0.820525 0.868685 NaN
Note that it takes the values from df1 for indices that do not overlap with df2. If this doesn't do exactly what you want, I would be willing to improve this function or add options to it.
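For anyone who wants to try this without the random data above, here is a minimal, self-contained sketch. The frames and their values are made up for illustration and are not from the session above. It also shows one behaviour worth knowing: combine_first only fills gaps in the calling frame, so a NaN in df2 does not overwrite a value in df1; the df1 value is kept instead.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: df1 is the base frame, df2 holds the overriding values.
idx1 = pd.date_range("2000-01-03", periods=4, freq="D")  # 01-03 .. 01-06
idx2 = pd.date_range("2000-01-05", periods=3, freq="D")  # 01-05 .. 01-07

df1 = pd.DataFrame(
    {"A": [1.0, 2.0, 3.0, 4.0], "B": [10.0, 20.0, 30.0, 40.0]}, index=idx1
)
df2 = pd.DataFrame(
    {"A": [-3.0, np.nan, -5.0], "D": [0.3, 0.4, 0.5]}, index=idx2
)

# df2 takes priority; df1 fills in whatever df2 lacks
# (missing rows, missing columns, and NaN cells).
result = df2.combine_first(df1)
print(result)
```

The result has the union of both indexes and both sets of columns. Where df2 has a real value it wins, and everything else comes from df1 or stays NaN.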
For a merge like this, the update method of a DataFrame is useful.
Taking the examples from the documentation:
import pandas as pd
import numpy as np
df1 = pd.DataFrame([[np.nan, 3., 5.], [-4.6, 2.1, np.nan],
[np.nan, 7., np.nan]])
df2 = pd.DataFrame([[-42.6, np.nan, -8.2], [-5., 1.6, 4]],
index=[1, 2])
Data before the update:
>>> df1
0 1 2
0 NaN 3.0 5.0
1 -4.6 2.1 NaN
2 NaN 7.0 NaN
>>>
>>> df2
0 1 2
1 -42.6 NaN -8.2
2 -5.0 1.6 4.0
Let's update df1 with data from df2:
df1.update(df2)
Data after the update:
>>> df1
0 1 2
0 NaN 3.0 5.0
1 -42.6 2.1 -8.2
2 -5.0 1.6 4.0
Remarks:
update works in place: it modifies the DataFrame that calls .update.
Non-NaN values in df1 are not overwritten with NaN values in df2.
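The remarks can be checked with a small sketch, again using made-up frames rather than the documentation example. It also illustrates a limitation that matters for the original question: update only touches labels that already exist in df1, so extra rows or columns in df2 are ignored. If those need to be added as well, combine_first may be the better fit.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: df2 overlaps df1 on rows 1 and 2,
# and also carries an extra row (3) and an extra column (D).
df1 = pd.DataFrame(
    {"A": [1.0, 2.0, 3.0], "B": [10.0, 20.0, 30.0]}, index=[0, 1, 2]
)
df2 = pd.DataFrame(
    {"A": [-2.0, np.nan, -4.0], "D": [0.2, 0.3, 0.4]}, index=[1, 2, 3]
)

# update modifies df1 in place and returns None.
ret = df1.update(df2)

print(ret)  # None
print(df1)
# Row 1 of A is overwritten with -2.0.
# Row 2 of A keeps 3.0 because df2 holds NaN there.
# Row 3 and column D from df2 do not appear in df1.
```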