Pandas left merge keeping data in right dataframe on duplicate columns
Merge right dataframe into left dataframe, preferring values from right dataframe and keeping new rows
How can I write the pandas equivalent of this pure-Python code:
left: dict[str, dict] = ...  # some rows keyed by KEY
right: dict[str, dict] = ...  # more rows keyed by KEY
merge_cols: list[str] = ...  # the columns that should be written into left from right
for key, row in right.items():
    if key not in left:
        left[key] = row
    else:
        for col in merge_cols:
            left[key][col] = row[col]
so that, given:
merge_cols = ['col']
ldf = pd.DataFrame({'col': [3, 4, 5], 'no': ['foo', 'foo', 'bar']}, index=[1, 2, 3])
   col   no
1    3  foo
2    4  foo
3    5  bar
rdf = pd.DataFrame({'col': [-2, -4, -7]}, index=[3, 4, 5])
   col
3   -2
4   -4
5   -7
the result is this DataFrame:
   col   no
1  3.0  foo
2  4.0  foo
3 -2.0  bar
4 -4.0  NaN
5 -7.0  NaN
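For reference, the loop from the question can be run on the sample data as a plain function. This is only a sketch: the name `merge_preferring_right` is made up for illustration, the keys are the integer index values rather than strings, and it copies the rows so the inputs are left untouched (the loop in the question mutates `left` in place).

```python
def merge_preferring_right(left, right, merge_cols):
    """Return new rows: right's merge_cols overwrite left's, unseen keys are added."""
    # Copy each row so neither input dict is modified
    merged = {key: dict(row) for key, row in left.items()}
    for key, row in right.items():
        if key not in merged:
            merged[key] = dict(row)  # new row, taken whole from right
        else:
            for col in merge_cols:
                merged[key][col] = row[col]  # right wins on the merge columns
    return merged


left = {
    1: {"col": 3, "no": "foo"},
    2: {"col": 4, "no": "foo"},
    3: {"col": 5, "no": "bar"},
}
right = {3: {"col": -2}, 4: {"col": -4}, 5: {"col": -7}}

expected = merge_preferring_right(left, right, ["col"])
```

Row 3 keeps its `no` value from the left side but takes `col` from the right, while rows 4 and 5 have no `no` key at all, which corresponds to the NaN entries in the DataFrame above.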
There may be other ways to do this, but I found one that seems to work well.
First, update the left dataframe in place with the merge columns of the matching rows:
ldf.update(rdf[merge_cols])  # a mutating operation; only rows whose index exists in ldf are touched
Then, find the difference between the indexes so you can append the remaining rows:
new_row_indices = rdf.index.difference(ldf.index)
# DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
ldf = pd.concat([ldf, rdf.loc[new_row_indices]])
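Put together on the sample data from the question, the two steps might look like the sketch below. It works on a copy (the name `result` is mine) so that `ldf` is not mutated, and it assumes a pandas version that has `pd.concat` and `Index.difference`. Whether `col` ends up as integers or floats can depend on the pandas version, so the sketch does not rely on the dtype.

```python
import pandas as pd

merge_cols = ["col"]
ldf = pd.DataFrame({"col": [3, 4, 5], "no": ["foo", "foo", "bar"]}, index=[1, 2, 3])
rdf = pd.DataFrame({"col": [-2, -4, -7]}, index=[3, 4, 5])

# Step 1: overwrite the merge columns for rows present in both frames.
result = ldf.copy()
result.update(rdf[merge_cols])

# Step 2: add the rows that exist only in the right frame.
new_row_indices = rdf.index.difference(result.index)
result = pd.concat([result, rdf.loc[new_row_indices]])

print(result)
```

One thing to keep in mind: by default `update` skips NaN values in the right frame, so a NaN in `rdf` will not overwrite an existing value in `ldf`.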
Another option is combine_first, replacing the matching index positions in ldf with NaN before combining:
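# An integer column cannot hold NaN, so cast the merge columns to float first
ldf = ldf.astype({col: float for col in merge_cols})
ldf.loc[ldf.index.intersection(rdf.index), merge_cols] = np.nan
ldf.combine_first(rdf)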
   col   no
1  3.0  foo
2  4.0  foo
3 -2.0  bar
4 -4.0  NaN
5 -7.0  NaN
The update option does the same thing, so this is just an alternative.
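As a self-contained sketch of the combine_first variant on the same sample data: the names `masked` and `overlap` are mine, and the explicit cast to float is an assumption I add so that NaN can be stored in the merge columns without relying on implicit upcasting.

```python
import numpy as np
import pandas as pd

merge_cols = ["col"]
ldf = pd.DataFrame({"col": [3, 4, 5], "no": ["foo", "foo", "bar"]}, index=[1, 2, 3])
rdf = pd.DataFrame({"col": [-2, -4, -7]}, index=[3, 4, 5])

# Work on a float copy so the merge columns can hold NaN.
masked = ldf.astype({col: "float64" for col in merge_cols})

# Blank out the cells that the right frame should win.
overlap = masked.index.intersection(rdf.index)
masked.loc[overlap, merge_cols] = np.nan

# combine_first fills the NaN cells (and the missing rows) from rdf.
result = masked.combine_first(rdf)

print(result)
```

Unlike `update`, this does not modify `ldf`, and `combine_first` returns a frame whose index and columns are the union of both inputs.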