A clean and efficient way to update cells in pandas DataFrames
I am looking for a cleaner way to achieve the following:
I have a DataFrame with certain columns that I want to update if new information arrives. This "new information", in the form of a pandas DataFrame (from a CSV file), can have more or fewer rows; however, I am only interested in adding
(Note the missing name "c" here and the change in "status" for name "a".)
Now, I wrote the following "inconvenient" code to update the original DataFrame with the new information:
for idx, row in df_base.iterrows():
    if not df_upd[df_upd['name'] == row['name']].empty:
        df_base.loc[idx, 'status'] = df_upd.loc[df_upd['name'] == row['name'], 'status'].values
It achieves exactly what I want, but it looks neither nice nor efficient, and I hope there is a cleaner way. I tried the pd.merge method; however, the problem is that it adds new columns instead of "updating" the cells in that column.
pd.merge(left=df_base, right=df_upd, on=['name'], how='left')
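To make the problem with the merge concrete, here is a minimal sketch. The sample frames are made up for illustration (the data from the question is not reproduced here); only the column names "name" and "status" come from the question. Because both frames have a "status" column, the merge keeps both and gives them suffixes rather than overwriting the original:

```python
import pandas as pd

# Hypothetical sample data, assumed for illustration only.
df_base = pd.DataFrame({"name": ["a", "b", "c", "d"], "status": [1, 1, 0, 1]})
df_upd = pd.DataFrame({"name": ["a", "b", "d"], "status": [0, 1, 1]})  # no row for "c"

merged = pd.merge(left=df_base, right=df_upd, on=["name"], how="left")

# The overlapping "status" column is duplicated with the default suffixes
# "_x" (left frame) and "_y" (right frame) instead of being updated.
print(list(merged.columns))

# Names missing from df_upd (here "c") get NaN in the right-hand column.
print(merged)
```

So after the merge the old and new values still have to be combined by hand, which is what the question is trying to avoid.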
I am looking forward to your tips and ideas.
You could set_index("name") and then call .update:
>>> df_base = df_base.set_index("name")
>>> df_upd = df_upd.set_index("name")
>>> df_base.update(df_upd)
>>> df_base
      status
name
a          0
b          1
c          0
d          1
More generally, you can set the index to whatever seems appropriate, update, and then reset as needed.
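As a rough sketch of that full round trip (set the index, update, reset), assuming made-up sample data since the frames from the question are not shown here: the extra name "e" in the hypothetical update frame illustrates that .update only touches labels already present in the base frame.

```python
import pandas as pd

# Hypothetical sample data, assumed for illustration only.
df_base = pd.DataFrame({"name": ["a", "b", "c", "d"], "status": [1, 1, 0, 1]})
df_upd = pd.DataFrame({"name": ["a", "b", "d", "e"], "status": [0, 1, 1, 1]})

# Align both frames on the key column.
indexed = df_base.set_index("name")

# update() works in place and returns None. It overwrites values for
# matching index labels with the non-NA values from the other frame;
# "c" (absent from df_upd) keeps its old value, "e" (absent from
# df_base) is ignored.
indexed.update(df_upd.set_index("name"))

# Turn "name" back into a regular column.
df_result = indexed.reset_index()
print(df_result)
```

Depending on the pandas version, the updated column may come back as float rather than int; if that matters, something like df_result["status"].astype(int) should restore it, provided no missing values remain.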