Pandas DataFrame update from another dataframe only if a column does not have a value
I am trying to update a dataframe using the values in another dataframe, but I would like the update to happen only if a particular column does not have a value.
from datetime import datetime
import numpy as np
import pandas as pd
dr = pd.bdate_range(periods=3, end=datetime.now().date())
df1 = pd.DataFrame([1, 2], columns=['myid'])
for d in dr:
    df1[d.to_pydatetime()] = np.nan  # pd.np is deprecated and was removed in pandas 2.0; use numpy directly
df1.loc[df1['myid'] == 1, dr[2]] = 4.0
df1 = df1.set_index('myid')
df1
2019-11-13 00:00:00 2019-11-14 00:00:00 2019-11-15 00:00:00
myid
1 NaN NaN 4.0
2 NaN NaN NaN
df2 = pd.DataFrame([1, 2], columns=['myid'])
for d in dr:
    df2[d.to_pydatetime()] = np.nan
df2.loc[df2['myid'] == 2, dr[2]] = 4.0
df2.loc[df2['myid'] == 1, dr[0]] = 6.0
df2 = df2.set_index('myid')
df2
2019-11-13 00:00:00 2019-11-14 00:00:00 2019-11-15 00:00:00
myid
1 6.0 NaN NaN
2 NaN NaN 4.0
I would like to update df1 with values from df2 if df1 does not have a value for dr[2] (the current date).
So in the above example, only the second row in df1 should get updated.
I tried update as follows, but I'm not sure how to filter based on whether the column has a value or not:
df1.update(df2, overwrite=False)
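To see why overwrite=False alone is not enough, here is a minimal sketch that rebuilds the example frames with fixed dates (2019-11-13 to 2019-11-15, hard-coded only so the output is reproducible). overwrite=False fills every NaN cell of df1 from df2, so row 1 also picks up 6.0 in its first column even though it already has a value for dr[2].

```python
import numpy as np
import pandas as pd

# Rebuild the question's frames with a fixed end date (assumption: 2019-11-15) for reproducibility
dr = pd.bdate_range(end="2019-11-15", periods=3)
idx = pd.Index([1, 2], name="myid")

df1 = pd.DataFrame(np.nan, index=idx, columns=dr)
df1.loc[1, dr[2]] = 4.0

df2 = pd.DataFrame(np.nan, index=idx, columns=dr)
df2.loc[2, dr[2]] = 4.0
df2.loc[1, dr[0]] = 6.0

# overwrite=False only protects cells that are already non-NA; it has no notion of a row condition
naive = df1.copy()
naive.update(df2, overwrite=False)
print(naive)
```

Row 1 now holds 6.0 in the 2019-11-13 column, which is exactly the change the question wants to avoid.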
I did look at the filter_func argument that update takes, but again I was unable to make it work. Any help is much appreciated. Thanks
EDIT:
Expected output:
Row 1 should not be touched because it already has a value in column 2019-11-15 00:00:00
df1
2019-11-13 00:00:00 2019-11-14 00:00:00 2019-11-15 00:00:00
myid
1 NaN NaN 4.0
2 NaN NaN 4.0
Update: This seems to be an obvious use for the filter_func argument, which update calls once for each column it shares with the other frame and which should return True for the cells that may be updated. Update only rows where all columns of df1 are null:
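all_null = df1.isnull().all(axis=1)  # compute the row mask once: update modifies df1 column by column
df1.update(df2, filter_func=lambda values: all_null.to_numpy())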
# 2019-11-13 00:00:00 2019-11-14 00:00:00 2019-11-15 00:00:00
#myid
#1 NaN NaN 4.0
#2 NaN NaN 4.0
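The filter above keys on whole rows being empty. If the condition should be exactly the one in the question (df1 has no value for dr[2]), the same filter_func mechanism can be pointed at that single column. filter_func receives one column's values as a 1-D array and returns a boolean array marking the cells that may be updated, so a row mask computed once up front can simply be returned for every column. A sketch on a fixed-date reconstruction of the example (the hard-coded dates are an assumption for reproducibility):

```python
import numpy as np
import pandas as pd

dr = pd.bdate_range(end="2019-11-15", periods=3)  # fixed dates, assumed for reproducibility
idx = pd.Index([1, 2], name="myid")
df1 = pd.DataFrame(np.nan, index=idx, columns=dr)
df1.loc[1, dr[2]] = 4.0
df2 = pd.DataFrame(np.nan, index=idx, columns=dr)
df2.loc[2, dr[2]] = 4.0
df2.loc[1, dr[0]] = 6.0

# Rows where df1 has no value for the latest date; computed once, before update mutates anything
target_missing = df1[dr[2]].isna().to_numpy()

result = df1.copy()
# The lambda ignores the column values it receives and returns the same row mask for every column
result.update(df2, filter_func=lambda values: target_missing)
print(result)
```

Row 1 keeps its NaN in 2019-11-13 (df2's 6.0 is not copied), while row 2 receives 4.0 for 2019-11-15.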
Old answer, more hands-on:
You can separate out the rows to update, update only those rows, and then combine the pieces again. update operates in place, so we need to split things out.
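m = df1.notnull().any(axis=1)  # rows that already have a value; leave these alone
# These get updated
u = df1[~m].copy()
u.update(df2)
df1 = pd.concat([df1[m], u]).reindex(df1.index)  # restore the original row order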
# 2019-11-13 00:00:00 2019-11-14 00:00:00 2019-11-15 00:00:00
#myid
#1 NaN NaN 4.0
#2 NaN NaN 4.0
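A variant of the same split-then-update idea, sketched here with the question's narrower condition (dr[2] missing) instead of the any-value test above: update a copy of just the selected rows, then write them back with .loc. Assigning through .loc aligns on the index, so the rows stay in their original order and no pd.concat is needed. The fixed dates are again an assumption for reproducibility.

```python
import numpy as np
import pandas as pd

dr = pd.bdate_range(end="2019-11-15", periods=3)  # fixed dates, assumed for reproducibility
idx = pd.Index([1, 2], name="myid")
df1 = pd.DataFrame(np.nan, index=idx, columns=dr)
df1.loc[1, dr[2]] = 4.0
df2 = pd.DataFrame(np.nan, index=idx, columns=dr)
df2.loc[2, dr[2]] = 4.0
df2.loc[1, dr[0]] = 6.0

rows = df1[dr[2]].isna()       # boolean Series: True where the latest date is empty
subset = df1.loc[rows].copy()  # work on a copy of only those rows
subset.update(df2)             # update reindexes df2 to subset's rows (left join)
df1.loc[rows] = subset         # write back; .loc aligns on index and columns
print(df1)
```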
Alternatively, you could use combine_first, then mask the rows that shouldn't have been updated and reset them back to their original values in df1:
df1.combine_first(df2).mask(df1.notnull().any(axis=1)).fillna(df1)
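The one-liner above chains three steps. Spelling them out with intermediate names (the names, and the fixed-date reconstruction of the example, are only for illustration) makes it easier to see what each step contributes:

```python
import numpy as np
import pandas as pd

dr = pd.bdate_range(end="2019-11-15", periods=3)  # fixed dates, assumed for reproducibility
idx = pd.Index([1, 2], name="myid")
df1 = pd.DataFrame(np.nan, index=idx, columns=dr)
df1.loc[1, dr[2]] = 4.0
df2 = pd.DataFrame(np.nan, index=idx, columns=dr)
df2.loc[2, dr[2]] = 4.0
df2.loc[1, dr[0]] = 6.0

# 1. Fill every NaN in df1 from df2, for all rows (row 1 picks up 6.0 here)
filled = df1.combine_first(df2)
# 2. Blank out whole rows where df1 already had any value; the row mask is broadcast across columns
has_value = df1.notnull().any(axis=1)
blanked = filled.mask(has_value)
# 3. Put df1's original values back into the blanked rows
result = blanked.fillna(df1)
print(result)
```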