How to join two DataFrames with Pandas, updating some of the rows?

Question

I'm new to pandas and I would like to know how to join two files and update the existing rows, using specific columns to decide which rows match. The files have thousands of lines. For example:

Df_1:
```
 A  B  C  D
 1  2  5  4
 2  2  6  8
 9  2  2  1
```

My second table has exactly the same columns. I want to join the two tables so that rows that appear in both tables, but whose value in column C has changed, are replaced by the version from the second table, and rows that exist only in the second table (df_2) are added. For example:

Df_2:
```
 A  B  C  D
 2  2  7  8
 9  2  3  1
 3  4  6  7
 1  2  3  4
```

So the result I want is the union of the two tables, with a few rows updated in one specific column, like this:

Df_result:

```
 A  B  C  D
 1  2  5  4
 2  2  7  8
 9  2  3  1
 3  4  6  7
 1  2  3  4
```

How can I do this with the merge or concatenate function? Or is there another way to get the result I want?

Thank you!

Answer 1

You need at least one column to act as a reference (a key), so that it is clear which rows of the first table have to be replaced by rows of the second one.

Assuming that in your case the reference columns are "A" and "B":

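```
import pandas as pd

# Columns that identify a row
ref = ['A', 'B']
# Stack df_1 first and df_2 second, so the df_2 rows come last
df_result = pd.concat([df_1, df_2], ignore_index=True)
# For each key keep only the last occurrence, i.e. the df_2 version
df_result = df_result.drop_duplicates(subset=ref, keep='last')
```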
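
As a rough sketch, here are the same two steps applied to the tables from the question. The data is copied from the question; treating "A" and "B" as the key is the assumption stated above, and the final `reset_index(drop=True)` is an optional extra that only renumbers the rows.

```python
import pandas as pd

# Tables copied from the question
df_1 = pd.DataFrame({"A": [1, 2, 9], "B": [2, 2, 2], "C": [5, 6, 2], "D": [4, 8, 1]})
df_2 = pd.DataFrame({"A": [2, 9, 3, 1], "B": [2, 2, 4, 2], "C": [7, 3, 6, 3], "D": [8, 1, 7, 4]})

ref = ["A", "B"]  # assumed key columns

df_result = (
    pd.concat([df_1, df_2], ignore_index=True)   # df_2 rows come after df_1 rows
    .drop_duplicates(subset=ref, keep="last")    # per key, keep the df_2 version
    .reset_index(drop=True)                      # optional: renumber rows from 0
)
print(df_result)
```

With this key choice the output has four rows: the Df_1 row `1 2 5 4` is replaced by `1 2 3 4` from Df_2, because both have A=1 and B=2. If such rows should stay separate, the key would need a column that tells them apart.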

Here is a complete example:

```
# First table
d = {'col1': [1, 2, 3], 'col2': ["a", "b", "c"], 'col3': ["aa", "bb", "cc"]}
df1 = pd.DataFrame(data=d)
# Second table: (1, "a") updates an existing row, the other two rows are new
d = {'col1': [1, 4, 5], 'col2': ["a", "d", "f"], 'col3': ["dd", "ee", "ff"]}
df2 = pd.DataFrame(data=d)

# Stack both tables, then keep the last row for each (col1, col2) key
df_result = pd.concat([df1, df2], ignore_index=True)
df_result = df_result.drop_duplicates(subset=['col1', 'col2'], keep='last')
df_result
```
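
Since the question also asks about other ways, one possible alternative is `DataFrame.combine_first` on frames indexed by the key columns. This is only a sketch on the same example data (redefined so the block runs on its own); `keys` and `df_alt` are names picked for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({"col1": [1, 2, 3], "col2": ["a", "b", "c"], "col3": ["aa", "bb", "cc"]})
df2 = pd.DataFrame({"col1": [1, 4, 5], "col2": ["a", "d", "f"], "col3": ["dd", "ee", "ff"]})

keys = ["col1", "col2"]  # assumed key columns, as in the example above

df_alt = (
    df2.set_index(keys)                   # df2 is the caller, so its values win
    .combine_first(df1.set_index(keys))   # fall back to df1 where df2 has no value
    .sort_index()                         # explicit ordering by key
    .reset_index()                        # turn the key back into regular columns
)
print(df_alt)
```

Two differences from the concat approach are worth keeping in mind: the rows come out ordered by key instead of with the updated rows at the end, and a missing value (NaN) in `df2` does not overwrite an existing value from `df1`. Whether that is desirable depends on the data.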

Solution 1
0 · Accepted · 2020-05-22 11:38:09
