How can I join two dataframes with update in some rows, using Pandas?
I'm new to pandas and I would like to know how I can join two files and update existing rows, taking a specific column into account. The files have thousands of lines. For example:
Df_1:
A  B  C  D
1  2  5  4
2  2  6  8
9  2  2  1
Now, my table 2 has exactly the same columns. I want to join the two tables, replacing the rows that appear in both tables but whose column C has changed, and adding the new rows that exist only in the second table (df_2). For example:
Df_2:
A  B  C  D
2  2  7  8
9  2  3  1
3  4  6  7
1  2  3  4
So, the result I want is the union of the two tables, with a few rows updated in a specific column, like this:
Df_result:
A  B  C  D
1  2  5  4
2  2  7  8
9  2  3  1
3  4  6  7
1  2  3  4
How can I do this with the merge or concatenate function? Or is there another way to get the result I want?
Thank you!
You need at least one column as a reference, that is, a key that tells you which rows have to be updated.
Assuming that in your case the key is "A" and "B":
import pandas as pd
ref = ['A', 'B']  # key columns that identify a row
# Stack df_2 under df_1, then keep only the last row for each key
df_result = pd.concat([df_1, df_2], ignore_index=True)
df_result = df_result.drop_duplicates(subset=ref, keep='last')
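To see what this does on the data from the question, here is a minimal sketch that rebuilds the two tables by hand and runs the same two steps. It still assumes that "A" and "B" form the key. The final `reset_index(drop=True)` is an addition for tidiness and is not part of the answer's snippet.

```python
import pandas as pd

# Frames rebuilt from the question's tables (columns A, B, C, D).
df_1 = pd.DataFrame({"A": [1, 2, 9], "B": [2, 2, 2], "C": [5, 6, 2], "D": [4, 8, 1]})
df_2 = pd.DataFrame({"A": [2, 9, 3, 1], "B": [2, 2, 4, 2], "C": [7, 3, 6, 3], "D": [8, 1, 7, 4]})

ref = ["A", "B"]  # assumed key columns

# Stack df_2 under df_1 so that, for a repeated key, the df_2 row comes last.
df_result = pd.concat([df_1, df_2], ignore_index=True)
# Keep only the last row per key, then renumber the index from 0.
df_result = df_result.drop_duplicates(subset=ref, keep="last").reset_index(drop=True)

print(df_result)
# Rows kept, in order (A, B, C, D):
# (2, 2, 7, 8), (9, 2, 3, 1), (3, 4, 6, 7), (1, 2, 3, 4)
```

Note that under this key the df_1 row with A=1, B=2 is also replaced by the df_2 row with the same key, so the output has four rows, one fewer than the five listed in the question, which keeps both versions of that row.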
Here is a real example.
d = {'col1': [1, 2, 3], 'col2': ["a", "b", "c"], 'col3': ["aa", "bb", "cc"]}
df1 = pd.DataFrame(data=d)
d = {'col1': [1, 4, 5], 'col2': ["a", "d", "f"], 'col3': ["dd","ee", "ff"]}
df2 = pd.DataFrame(data=d)
df_result = pd.concat([df1, df2], ignore_index=True)
df_result = df_result.drop_duplicates(subset=['col1','col2'], keep='last')
df_result
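Since the question also asks whether there is another way, one possible alternative is `DataFrame.combine_first` on frames indexed by the key. This is a different technique from the `concat` / `drop_duplicates` approach above, sketched here on the same example data. It assumes the key is unique within each frame, and the name `df_alt` is only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({"col1": [1, 2, 3], "col2": ["a", "b", "c"], "col3": ["aa", "bb", "cc"]})
df2 = pd.DataFrame({"col1": [1, 4, 5], "col2": ["a", "d", "f"], "col3": ["dd", "ee", "ff"]})

ref = ["col1", "col2"]  # key columns, as in the example above

# Index both frames by the key; df2's non-null values win, df1 fills the rest.
df_alt = df2.set_index(ref).combine_first(df1.set_index(ref))
# Move the key back into columns and sort by it for a predictable row order.
df_alt = df_alt.reset_index().sort_values(ref).reset_index(drop=True)

print(df_alt)
# Rows, in order (col1, col2, col3):
# (1, a, dd), (2, b, bb), (3, c, cc), (4, d, ee), (5, f, ff)
```

The row set is the same as with `drop_duplicates(..., keep='last')` on this data. The difference shows up when df2 contains missing values: `combine_first` falls back to the df1 value for a NaN cell, whereas the `concat` approach replaces the whole row.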