
How to update a column in pandas DataFrame based on column from another DataFrame

Let's say I have 2 DataFrames:

import pandas as pd

df1 = pd.DataFrame({'name': ['Jack', 'Lucy', 'Mark'], 'age': [1, 2, 3]})
df2 = pd.DataFrame({'name': ['Jack', 'Mark'], 'age': [10, 11], 'address': ['addr1', 'addr2']})

What operation should I use to make df1 become:

| name   |   age | address   |
|:-------|------:|:----------|
| Jack   |    10 | addr1     |
| Lucy   |     2 | NaN       |
| Mark   |    11 | addr2     |

You could merge both DataFrames and then replace the missing values:

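# left join keeps every row of df1; the overlapping age column becomes age_x (df1) and age_y (df2)
df_out = df1.merge(df2, on=['name'], how='left')
# take df2's age where a match exists, otherwise keep df1's age
# (fillna instead of testing age_y > 0, which would also discard a genuine age of 0)
df_out['age'] = df_out['age_y'].fillna(df_out['age_x'])
df_out[['name', 'age', 'address']]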

Output:

| name   |   age | address   |
|:-------|------:|:----------|
| Jack   |    10 | addr1     |
| Lucy   |     2 | nan       |
| Mark   |    11 | addr2     |
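The same merge idea extends to every column the two frames share, not just age. A minimal sketch, assuming name uniquely identifies rows in df2; the helper name `update_from` and the `_left`/`_right` suffixes are illustrative choices, not part of pandas:

```python
import pandas as pd


def update_from(left: pd.DataFrame, right: pd.DataFrame, key: str) -> pd.DataFrame:
    """Hypothetical helper: left-join `right` onto `left` by `key`, preferring
    right's non-null values in shared columns and keeping right's extra columns."""
    merged = left.merge(right, on=key, how="left", suffixes=("_left", "_right"))
    shared = [c for c in left.columns if c in right.columns and c != key]
    for col in shared:
        # rows without a match in `right` hold NaN here, so fall back to left's value
        merged[col] = merged[f"{col}_right"].fillna(merged[f"{col}_left"])
    suffixed = [f"{col}_{side}" for col in shared for side in ("left", "right")]
    merged = merged.drop(columns=suffixed)
    # left's columns first (in their original order), then columns only right has
    extra = [c for c in right.columns if c not in left.columns]
    return merged[list(left.columns) + extra]


df1 = pd.DataFrame({"name": ["Jack", "Lucy", "Mark"], "age": [1, 2, 3]})
df2 = pd.DataFrame({"name": ["Jack", "Mark"], "age": [10, 11],
                    "address": ["addr1", "addr2"]})

result = update_from(df1, df2, "name")
print(result)
```

Because fillna is used per column, a name that does match in df2 but has NaN in some shared column also keeps df1's value for that column.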

Use DataFrame.combine_first, with the name column converted to the index in both DataFrames:

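# move the join key into the index so both frames align on name
df1 = df1.set_index('name')
df2 = df2.set_index('name')

# take df2's values and fill its gaps from df1 (the column order may differ from df1's)
df1 = df2.combine_first(df1).reset_index()
print(df1)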
   name address   age
0  Jack   addr1  10.0
1  Lucy     NaN   2.0
2  Mark   addr2  11.0
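combine_first returns the union of both indexes and both column sets, so the columns can come back in a different order than df1's (the output above lists address before age). A sketch that restores df1's row and column order, assuming that names present only in df2 should be dropped, as in a left join:

```python
import pandas as pd

df1 = pd.DataFrame({"name": ["Jack", "Lucy", "Mark"], "age": [1, 2, 3]})
df2 = pd.DataFrame({"name": ["Jack", "Mark"], "age": [10, 11],
                    "address": ["addr1", "addr2"]})

left = df1.set_index("name")
right = df2.set_index("name")

# right's values win; its gaps (missing rows or NaN cells) are filled from left
combined = right.combine_first(left)

# df1's columns first, then the columns only df2 has
cols = list(left.columns) + [c for c in right.columns if c not in left.columns]
# keep exactly df1's rows, in df1's order (drops any names that only df2 has)
result = combined.reindex(index=left.index, columns=cols).reset_index()
print(result)
```

Depending on the pandas version, age may come back as float (as in the output above) or as the original integer dtype; the values are the same either way.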

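The original update-based solution needs one change: DataFrame.update only fills columns that df1 already has and never adds new ones, so df2's extra columns (here address) must first be added to df1 with reindex: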

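# starting again from the original df1 and df2
df1 = df1.set_index('name')
df2 = df2.set_index('name')
# add df2's extra columns (filled with NaN) so that update has somewhere to write them
df1 = df1.reindex(df1.columns.union(df2.columns, sort=False), axis=1)

# overwrite df1 in place with df2's non-NaN values, aligned on index and columns
df1.update(df2)
df1 = df1.reset_index()
print(df1)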
   name   age address
0  Jack  10.0   addr1
1  Lucy   2.0     NaN
2  Mark  11.0   addr2
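The reindex step is needed because DataFrame.update modifies the caller in place, returns None, and only touches columns the caller already has. A quick check on the question's data, calling update without the reindex:

```python
import pandas as pd

df1 = pd.DataFrame({"name": ["Jack", "Lucy", "Mark"], "age": [1, 2, 3]}).set_index("name")
df2 = pd.DataFrame({"name": ["Jack", "Mark"], "age": [10, 11],
                    "address": ["addr1", "addr2"]}).set_index("name")

result = df1.copy()
ret = result.update(df2)  # modifies `result` in place

print(ret)                      # None
print(result.columns.tolist())  # ['age']: df2's address column was not added
print(result["age"].tolist())   # Jack and Mark overwritten from df2, Lucy unchanged
```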

Or use a left join with DataFrame.merge followed by DataFrame.combine_first:

# left join df2; overlapping columns coming from df2 get a trailing _
df = df1.merge(df2, on='name', how='left', suffixes=('', '_'))

# select the suffixed (df2) columns
new_cols = df.columns[df.columns.str.endswith('_')]

# remove the trailing _ to get the matching df1 column names
orig_cols = new_cols.str[:-1]
# mapping for rename
d = dict(zip(new_cols, orig_cols))

# prefer df2's values and fill their NaNs (unmatched names) from the original columns
df[orig_cols] = df[new_cols].rename(columns=d).combine_first(df[orig_cols])
# drop the suffixed columns
df = df.drop(new_cols, axis=1)
print(df)
   name   age address
0  Jack  10.0   addr1
1  Lucy   2.0     NaN
2  Mark  11.0   addr2
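Both merge-based answers assume each name appears at most once in df2; a duplicated key makes the left join return extra rows for that name. A sketch using merge's validate argument to catch this early; the duplicated frame `df2_dup` is hypothetical:

```python
import pandas as pd
from pandas.errors import MergeError

df1 = pd.DataFrame({"name": ["Jack", "Lucy", "Mark"], "age": [1, 2, 3]})
# hypothetical df2 in which Jack appears twice
df2_dup = pd.DataFrame({"name": ["Jack", "Jack", "Mark"], "age": [10, 12, 11],
                        "address": ["addr1", "addr9", "addr2"]})

# without validation the left join silently yields two rows for Jack
unchecked = df1.merge(df2_dup, on="name", how="left", suffixes=("", "_"))
print(len(unchecked))  # 4 rows instead of 3

try:
    # "many_to_one" requires the join key to be unique in the right frame
    df1.merge(df2_dup, on="name", how="left", suffixes=("", "_"),
              validate="many_to_one")
    duplicate_detected = False
except MergeError as exc:
    duplicate_detected = True
    print("df2 has duplicate names:", exc)
```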

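You can also use concat and drop_duplicates: stack df2 under df1, keep the last (df2) row for each name, then put the rows back in df1's order. Note that this replaces whole rows, so any column that only df1 has would become NaN for names found in df2:

# sorting by the old integer index here would only work if df2's index happened to line up with df1's,
# so the rows are reordered by df1's name column instead
df = (pd.concat([df1, df2], sort=False)
        .drop_duplicates(['name'], keep='last')
        .set_index('name')
        .reindex(df1['name'])
        .reset_index())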
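Sorting by the integer index after concat (rather than by name) only restores df1's row order when df2's index labels happen to line up with df1's. A check with a hypothetical df2 that lists Mark before Jack, comparing a stable index sort with reordering by df1's name column:

```python
import pandas as pd

df1 = pd.DataFrame({"name": ["Jack", "Lucy", "Mark"], "age": [1, 2, 3]})
# hypothetical df2 listing Mark before Jack (index labels 0 and 1)
df2 = pd.DataFrame({"name": ["Mark", "Jack"], "age": [11, 10],
                    "address": ["addr2", "addr1"]})

# for each name keep the row that came last, i.e. df2's row when both have it
stacked = pd.concat([df1, df2], sort=False).drop_duplicates(["name"], keep="last")

# order by the leftover integer labels (mergesort keeps ties stable)
by_index = stacked.sort_index(kind="mergesort").reset_index(drop=True)
# order by df1's names instead
by_name = stacked.set_index("name").reindex(df1["name"]).reset_index()

print(by_index["name"].tolist())  # ['Mark', 'Lucy', 'Jack']
print(by_name["name"].tolist())   # ['Jack', 'Lucy', 'Mark']
```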

