Add rows from one dataframe to another based on missing values in a given column in pandas

Question

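I've been looking for an answer but couldn't find one. I have two dataframes, target and backup, which both have the same columns. What I want to do is look at a given column and add to target all the rows from backup whose value in that column does not appear in target. The most straightforward solution is: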

import pandas as pd
import numpy as np

target = pd.DataFrame({
         "key1": ["K1", "K2", "K3", "K5"],
         "A": ["A1", "A2", "A3", np.nan],
         "B": ["B1", "B2", "B3", "B5"],
     })

backup = pd.DataFrame({
         "key1": ["K1", "K2", "K3", "K4", "K5"],
         "A": ["A1", "A", "A3", "A4", "A5"],
         "B": ["B1", "B2", "B3", "B4", "B5"],
     })

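# Start from a copy so target itself is left untouched
merged = target.copy()

# Append every backup row whose key1 does not already appear in target
for item in backup.key1.unique():
    if item not in target.key1.unique():
        merged = pd.concat([merged, backup.loc[backup.key1 == item]])

merged.reset_index(drop=True, inplace=True)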

which gives

  key1    A   B
0   K1   A1  B1
1   K2   A2  B2
2   K3   A3  B3
3   K5  NaN  B5
4   K4   A4  B4

I have tried several things using only pandas, but none of them work.

pandas concat

# Does not work: concat creates duplicate rows, and drop_duplicates() only removes rows that are identical in every column, so rows whose values differ (compare A2 vs A, and NaN vs A5) are kept

pd.concat([target, backup]).drop_duplicates()

  key1  A   B
0   K1  A1  B1
1   K2  A2  B2
2   K3  A3  B3
3   K5  NaN B5
1   K2  A   B2
3   K4  A4  B4
4   K5  A5  B5

pandas merge

# Does not work: the values from backup overwrite those in target (A2 becomes A, and the NaN becomes A5)

pd.merge(target, backup, how="right")

  key1  A   B
0   K1  A1  B1
1   K2  A   B2
2   K3  A3  B3
3   K4  A4  B4
4   K5  A5  B5
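For comparison, merge can express the "rows of backup not in target" filter if it joins on key1 only and is given indicator=True, which adds a _merge column recording where each row came from (an anti-join). This is a minimal sketch assuming the same target and backup frames as above; the names flagged and missing are illustrative, not from the question:

```python
import numpy as np
import pandas as pd

target = pd.DataFrame({
    "key1": ["K1", "K2", "K3", "K5"],
    "A": ["A1", "A2", "A3", np.nan],
    "B": ["B1", "B2", "B3", "B5"],
})
backup = pd.DataFrame({
    "key1": ["K1", "K2", "K3", "K4", "K5"],
    "A": ["A1", "A", "A3", "A4", "A5"],
    "B": ["B1", "B2", "B3", "B4", "B5"],
})

# Left-merge backup against target's (deduplicated) keys only;
# indicator=True marks keys absent from target as "left_only".
flagged = backup.merge(target[["key1"]].drop_duplicates(), on="key1",
                       how="left", indicator=True)
missing = flagged.loc[flagged["_merge"] == "left_only", backup.columns]

# target's rows (including the NaN) are kept as-is; only new keys are appended.
merged = pd.concat([target, missing], ignore_index=True)
print(merged)
```

Merging on key1 alone, instead of on all shared columns as in the attempt above, is what keeps the backup values from replacing anything in target.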

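Importantly, this is not a duplicate of this post, because I don't want a new column and, more importantly, the missing values in target are not NaN; those rows simply don't exist. Moreover, if I then used the suggested combined column, the NaN in target would be replaced by unwanted values from backup.
Nor is it a duplicate of this post using pandas combine_first, because there the NaN is filled with values from backup, which is wrong: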

target.combine_first(backup)

   key1 A   B
0   K1  A1  B1
1   K2  A2  B2
2   K3  A3  B3
3   K5  A4  B5
4   K5  A5  B5

Finally,

target.join(backup, on=["key1"])

gives me

ValueError: You are trying to merge on object and int64 columns. If you wish to proceed you should use pd.concat

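which I really don't understand, because both columns contain plain strings, and the suggested solution doesn't work either.

So I'd like to ask: what am I missing? How can I do this with pandas methods? Thanks a lot.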
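A likely explanation for the join error: with on="key1", DataFrame.join matches the caller's key1 column against the index of the other frame, and backup still has its default integer index, hence "object and int64". The sketch below assumes backup is re-indexed by key1 first and uses rsuffix="_backup" (an arbitrary choice) for the overlapping A and B columns. Note that this left join still returns only the keys already in target, so on its own it does not add the K4 row:

```python
import numpy as np
import pandas as pd

target = pd.DataFrame({
    "key1": ["K1", "K2", "K3", "K5"],
    "A": ["A1", "A2", "A3", np.nan],
    "B": ["B1", "B2", "B3", "B5"],
})
backup = pd.DataFrame({
    "key1": ["K1", "K2", "K3", "K4", "K5"],
    "A": ["A1", "A", "A3", "A4", "A5"],
    "B": ["B1", "B2", "B3", "B4", "B5"],
})

# join(on=...) looks target["key1"] up in backup's *index*, so make key1
# the index of backup first. rsuffix renames backup's overlapping columns
# to A_backup and B_backup.
joined = target.join(backup.set_index("key1"), on="key1", rsuffix="_backup")
print(joined)
```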

Answer 1

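Use concat with the rows of backup selected by boolean indexing: Series.isin checks which values of backup.key1 are present in target.key1, and ~ keeps only the rows whose keys are not: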

merged = pd.concat([target, backup[~backup.key1.isin(target.key1)]])
print (merged)
  key1    A   B
0   K1   A1  B1
1   K2   A2  B2
2   K3   A3  B3
3   K5  NaN  B5
3   K4   A4  B4

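To see what the filter does, it can help to print the boolean mask on its own. Passing ignore_index=True to pd.concat is an optional tweak, not part of the answer above; it renumbers the result and avoids the repeated index label 3. The intermediate names in_target and new_rows are illustrative:

```python
import numpy as np
import pandas as pd

target = pd.DataFrame({
    "key1": ["K1", "K2", "K3", "K5"],
    "A": ["A1", "A2", "A3", np.nan],
    "B": ["B1", "B2", "B3", "B5"],
})
backup = pd.DataFrame({
    "key1": ["K1", "K2", "K3", "K4", "K5"],
    "A": ["A1", "A", "A3", "A4", "A5"],
    "B": ["B1", "B2", "B3", "B4", "B5"],
})

# True where a backup key already exists in target
in_target = backup.key1.isin(target.key1)
print(in_target.tolist())  # [True, True, True, False, True]

# ~ inverts the mask, keeping only backup rows with new keys (here K4)
new_rows = backup[~in_target]

# ignore_index=True numbers the result 0..n-1 instead of repeating label 3
merged = pd.concat([target, new_rows], ignore_index=True)
print(merged)
```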
Answer 2

Maybe you could try the subset parameter of df.drop_duplicates()?

pd.concat([target, backup]).drop_duplicates(subset = "key1")

This gives the output:

  key1    A   B
0   K1   A1  B1
1   K2   A2  B2
2   K3   A3  B3
3   K5  NaN  B5
3   K4   A4  B4

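One caveat on this approach: drop_duplicates(subset="key1") keeps the first occurrence of each key (the default keep="first"), so it relies on target coming before backup inside pd.concat; swapping the order would keep backup's rows instead. It also collapses keys that are already duplicated inside target, which the isin filter from Answer 1 leaves alone. A small sketch with hypothetical frames where target contains K1 twice:

```python
import pandas as pd

# Hypothetical frames: target has two different rows for key K1
target = pd.DataFrame({"key1": ["K1", "K1"], "A": ["A1", "A1b"]})
backup = pd.DataFrame({"key1": ["K1", "K2"], "A": ["X", "A2"]})

# drop_duplicates keeps the first row per key, so target's second K1 row is lost
via_subset = pd.concat([target, backup]).drop_duplicates(subset="key1")

# The isin filter only removes backup rows, so both K1 rows from target survive
via_isin = pd.concat([target, backup[~backup.key1.isin(target.key1)]])

print(len(via_subset), len(via_isin))  # 2 3
```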
2 answers

Answer 1
Score 2, accepted, 2021-01-18 12:50:42

Answer 2
Score 0, 2021-01-18 12:55:50