Replace specific column values with another dataframe column value using Pandas

There are two dataframes. The first contains three columns: emp_id, emp_name and the old email_id. The second contains two columns: emp_id and the new email_id.

The task is to look up the second dataframe and replace the matching values in the first dataframe.

Input Dataframe:

import pandas as pd
data = { 
'emp_id': [111, 222, 333, 444, 555],
'emp_name': ['Sam','Joe','Jake','Rob','Matt'],
'email_id': ['one@gmail.com','two@gmail.com','three@gmail.com','four@gmail.com','five@gmail.com']
}
data = pd.DataFrame(data)

Current Output:

     emp_id  emp_name email_id
0     111    Sam      one@gmail.com
1     222    Joe      two@gmail.com
2     333    Jake     three@gmail.com
3     444    Rob      four@gmail.com
4     555    Matt     five@gmail.com

#replace certain values with new values by looking up another dataframe
data1 = { 
'emp_id': [111, 333, 555],
'email_id': ['one@yahoo.com','three@yahoo.com','five@yahoo.com']
}
data1 = pd.DataFrame(data1)

Desired Output:

 data = 

     emp_id  emp_name email_id
0     111    Sam      one@yahoo.com
1     222    Joe      two@gmail.com
2     333    Jake     three@yahoo.com
3     444    Rob      four@gmail.com
4     555    Matt     five@yahoo.com

The original data contains 50k+ rows, so merging them doesn't seem like the correct option. Any help would be appreciated.

Cheers!

Let us try update:

out = data.set_index('emp_id')
out.update(data1.set_index('emp_id')[['email_id']])
out.reset_index(inplace=True)
out
   emp_id  emp_name         email_id
0     111       Sam    one@yahoo.com
1     222       Joe    two@gmail.com
2     333      Jake  three@yahoo.com
3     444       Rob   four@gmail.com
4     555      Matt   five@yahoo.com
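To make the behaviour of update easier to see, here is a minimal sketch that wraps the same steps in a function. The function name apply_email_updates and the extra emp_id 999 are made up for illustration and are not part of the original answer. The sketch assumes emp_id is unique in both dataframes. update aligns on the index, works in place, and skips index labels that do not exist in the frame being updated.

```python
import pandas as pd


def apply_email_updates(employees, updates, key='emp_id', column='email_id'):
    """Return a new frame where `column` is overwritten from `updates`, matched on `key`.

    Hypothetical helper for illustration; assumes `key` is unique in both frames.
    """
    out = employees.set_index(key)  # set_index returns a new DataFrame
    # update() aligns on the index and modifies `out` in place (it returns None).
    out.update(updates.set_index(key)[[column]])
    return out.reset_index()


data = pd.DataFrame({
    'emp_id': [111, 222, 333, 444, 555],
    'emp_name': ['Sam', 'Joe', 'Jake', 'Rob', 'Matt'],
    'email_id': ['one@gmail.com', 'two@gmail.com', 'three@gmail.com',
                 'four@gmail.com', 'five@gmail.com'],
})
data1 = pd.DataFrame({
    # 999 is a made-up id with no match in `data`; update() ignores it.
    'emp_id': [111, 333, 555, 999],
    'email_id': ['one@yahoo.com', 'three@yahoo.com', 'five@yahoo.com',
                 'nobody@yahoo.com'],
})

result = apply_email_updates(data, data1)
print(result)
```

Rows whose emp_id is missing from the second dataframe (222 and 444 here) keep their old address, and all other columns such as emp_name are carried through unchanged.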

The solution I will provide makes use of pandas' Boolean indexing.

# Build a Boolean mask: True for the rows of data whose emp_id also appears in data1.
to_change = data['emp_id'].isin(data1['emp_id'])

# Lookup Series that maps each emp_id in data1 to its new email (emp_id must be unique in data1).
new_email = data1.set_index('emp_id')['email_id']

# Replace the email only in the masked rows of the first DataFrame.
data.loc[to_change, 'email_id'] = data.loc[to_change, 'emp_id'].map(new_email)
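The question rules out merging because of the 50k+ rows. For comparison, here is a minimal sketch of a left merge on emp_id followed by a fallback to the old address. This is not taken from either answer. It assumes emp_id is unique in the second dataframe, and the suffix _new is an arbitrary choice. A merge of this size is usually not a problem for pandas, so it may still be worth timing on the real data.

```python
import pandas as pd

data = pd.DataFrame({
    'emp_id': [111, 222, 333, 444, 555],
    'emp_name': ['Sam', 'Joe', 'Jake', 'Rob', 'Matt'],
    'email_id': ['one@gmail.com', 'two@gmail.com', 'three@gmail.com',
                 'four@gmail.com', 'five@gmail.com'],
})
data1 = pd.DataFrame({
    'emp_id': [111, 333, 555],
    'email_id': ['one@yahoo.com', 'three@yahoo.com', 'five@yahoo.com'],
})

# A left merge keeps every row of `data`; the new address lands in
# 'email_id_new' and is NaN where data1 has no matching emp_id.
merged = data.merge(data1, on='emp_id', how='left', suffixes=('', '_new'))

# Prefer the new address, fall back to the old one where it is missing.
merged['email_id'] = merged['email_id_new'].fillna(merged['email_id'])

# Drop the helper column so the result has the original three columns.
result = merged.drop(columns='email_id_new')
print(result)
```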
