Append row values from one df to another if no duplicates in pandas
I have these two dfs:
import pandas as pd

df1 = pd.DataFrame({'pupil': ["sarah", "john", "fred"],
                    'class': ["1a", "1a", "1a"]})
df2 = pd.DataFrame({'pupil_mixed': ["sarah", "john", "lex"],
                    'class': ["1a", "1c", "1a"]})
I want to append the row values from the column "pupil_mixed" in df2 to the column "pupil" in df1 if the values are not duplicates.
Desired outcome:
import numpy as np

df1 = pd.DataFrame({'pupil': ["sarah", "john", "fred", "lex"],
                    'class': ["1a", "1a", "1a", np.nan]})
I used append with loc:
df1 = df1.append(df2.loc[df2['pupil_mixed'] != df1['pupil'] ])
which just added "pupil_mixed" as an extra column and filled the non-matching cells with NaN:
pupil class pupil_mixed
0 sarah 1a NaN
1 john 1a NaN
2 fred 1a NaN
2 NaN 1a lex
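Two things go wrong in that attempt. First, append (like concat) aligns on column names, so "pupil_mixed" is not matched to "pupil" and becomes a separate column, which is where the NaN cells come from. Second, `!=` between two Series compares them position by position after aligning on index labels (and raises "Can only compare identically-labeled Series objects" if the labels differ); it does not ask whether a name occurs anywhere in the other column. On the sample data it picks out "lex" by coincidence, because sarah and john sit at the same positions in both frames. The sketch below uses a hypothetical reordered copy of df2 (`df2_shuffled`, not from the question) to show the difference from an `isin` membership test:

```python
import pandas as pd

df1 = pd.DataFrame({'pupil': ["sarah", "john", "fred"],
                    'class': ["1a", "1a", "1a"]})
# Hypothetical variant of df2: same names, different order
df2_shuffled = pd.DataFrame({'pupil_mixed': ["john", "sarah", "lex"],
                             'class': ["1c", "1a", "1a"]})

# != aligns on index labels and compares row 0 with row 0, row 1 with row 1, ...
# so it flags every row whose name differs from the name at the same position
pairwise = df2_shuffled['pupil_mixed'] != df1['pupil']

# isin asks "does this name appear anywhere in df1['pupil']?"
is_new = ~df2_shuffled['pupil_mixed'].isin(df1['pupil'])

print(pairwise.tolist())  # [True, True, True]
print(is_new.tolist())    # [False, False, True]
```

Only the `isin` mask correctly keeps john and sarah out.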
You could use concat + drop_duplicates:
res = pd.concat((df1, df2['pupil_mixed'].to_frame('pupil'))).drop_duplicates('pupil')
print(res)
Output
pupil class
0 sarah 1a
1 john 1a
2 fred 1a
2 lex NaN
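The index label 2 appears twice in that output because concat keeps each frame's original labels. If a clean 0..n-1 index is preferred, a minimal variant (assuming the same df1/df2) passes `ignore_index=True` and resets the index after dropping duplicates. `keep='first'` is the default and is spelled out here only to make explicit that the df1 row wins when a pupil occurs in both frames:

```python
import pandas as pd

df1 = pd.DataFrame({'pupil': ["sarah", "john", "fred"],
                    'class': ["1a", "1a", "1a"]})
df2 = pd.DataFrame({'pupil_mixed': ["sarah", "john", "lex"],
                    'class': ["1a", "1c", "1a"]})

# Single-column frame whose column is named 'pupil', matching df1
incoming = df2['pupil_mixed'].to_frame('pupil')

res = (pd.concat([df1, incoming], ignore_index=True)  # labels 0..5
         .drop_duplicates('pupil', keep='first')      # df1's rows win
         .reset_index(drop=True))                     # relabel 0..3
print(res)
```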
As an alternative, you could filter first (with isin) and then concat:
# keep only the rows of df2 whose pupil_mixed is not already in df1['pupil']
filtered = df2.loc[~df2['pupil_mixed'].isin(df1['pupil'])]
# turn pupil_mixed into a single-column DataFrame named pupil, then concat
res = pd.concat((df1, filtered['pupil_mixed'].to_frame('pupil')))
print(res)
Both solutions use to_frame with the name parameter, which effectively renames the column.
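The two approaches give the same result on the sample data, but they differ when the data itself contains repeats. drop_duplicates deduplicates the whole concatenated frame, including repeats that were already in df1, while the isin filter only removes df2 names found in df1 and leaves df1 (and any repeats within df2) untouched. A sketch with hypothetical data chosen to show this:

```python
import pandas as pd

# Hypothetical data: df1 already repeats "sarah", df2 repeats "lex"
df1 = pd.DataFrame({'pupil': ["sarah", "sarah", "fred"],
                    'class': ["1a", "1b", "1a"]})
df2 = pd.DataFrame({'pupil_mixed': ["sarah", "lex", "lex"],
                    'class': ["1a", "1c", "1a"]})

# Approach 1: concat + drop_duplicates dedupes everything, df1's own repeats included
via_dedupe = pd.concat((df1, df2['pupil_mixed'].to_frame('pupil'))).drop_duplicates('pupil')

# Approach 2: isin only drops df2 names already present in df1
filtered = df2.loc[~df2['pupil_mixed'].isin(df1['pupil'])]
via_filter = pd.concat((df1, filtered['pupil_mixed'].to_frame('pupil')))

print(via_dedupe['pupil'].tolist())  # ['sarah', 'fred', 'lex']
print(via_filter['pupil'].tolist())  # ['sarah', 'sarah', 'fred', 'lex', 'lex']
```

Which behavior is right depends on whether df1 is allowed to contain repeats in the first place.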
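import numpy as np
import pandas as pd

# tag the rows so df1 and df2 can be told apart later
df1['tag'] = 1
df2['tag'] = 2
# give df2 the same column names as df1 (pupil_mixed -> pupil)
df2.columns = df1.columns
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
df1 = pd.concat([df1, df2])
# drop duplicate pupils, keeping the first occurrence (the df1 row)
df1 = df1.drop_duplicates('pupil', keep='first')
# rows that came from df2 (tag == 2) get no class
cond = df1['tag'] == 2
df1.loc[cond, 'class'] = np.nan
del df1['tag']
print(df1)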
Output:
pupil class
0 sarah 1a
1 john 1a
2 fred 1a
2 lex NaN
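The tag approach modifies its inputs: it adds a tag column to both frames, renames df2's columns, and rebinds df1. If the original frames should stay intact, a sketch of the same idea can build tagged copies with `assign` and `rename` instead. The names `left`, `right` and `combined` are illustrative, and `.copy()` is there so the filtered frame is written to as an independent object:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'pupil': ["sarah", "john", "fred"],
                    'class': ["1a", "1a", "1a"]})
df2 = pd.DataFrame({'pupil_mixed': ["sarah", "john", "lex"],
                    'class': ["1a", "1c", "1a"]})

# assign/rename return new frames, so df1 and df2 are not modified
left = df1.assign(tag=1)
right = df2.rename(columns={'pupil_mixed': 'pupil'}).assign(tag=2)

# keep='first' lets the df1 row win whenever a pupil appears in both frames
combined = (pd.concat([left, right], ignore_index=True)
              .drop_duplicates('pupil', keep='first')
              .copy())

# pupils that only came from df2 get no class, as in the desired outcome
combined.loc[combined['tag'] == 2, 'class'] = np.nan
res = combined.drop(columns='tag').reset_index(drop=True)
print(res)
```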
You could use a merge, after renaming pupil_mixed in df2:
df1.merge(df2["pupil_mixed"].rename("pupil"), how="outer")
pupil class
0 sarah 1a
1 john 1a
2 fred 1a
3 lex NaN
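One caveat, as far as I know: since pandas 2.2 an outer merge sorts the join keys, so the rows may come back alphabetically (fred, john, lex, sarah) instead of in df1's order. If the order matters, a sketch that stays with merge is a left anti-join: merge df2's names against df1's with `indicator=True`, keep the names marked `left_only`, and concat them under df1. The frame names `incoming`, `marked` and `new_pupils` are illustrative only:

```python
import pandas as pd

df1 = pd.DataFrame({'pupil': ["sarah", "john", "fred"],
                    'class': ["1a", "1a", "1a"]})
df2 = pd.DataFrame({'pupil_mixed': ["sarah", "john", "lex"],
                    'class': ["1a", "1c", "1a"]})

incoming = df2[["pupil_mixed"]].rename(columns={"pupil_mixed": "pupil"})

# how="left" keeps incoming's row order; indicator=True adds a '_merge' column
# with 'both' for names found in df1 and 'left_only' for names that are new
marked = incoming.merge(df1[["pupil"]], on="pupil", how="left", indicator=True)
new_pupils = marked.loc[marked["_merge"] == "left_only", ["pupil"]]

# append the new names below df1; their class is NaN because they have none
res = pd.concat([df1, new_pupils], ignore_index=True)
print(res)
```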