How to update a data frame based on a condition in another data frame in pandas
I have two data frames and I want to update one column of df_source based on a condition that involves both data frames:
import pandas as pd
df_source = pd.DataFrame({'Sentiment': ['neg', 'neg', 'pos'], 'text': ['hello ', '12where', 'here [null]'], 'pred': ['neu', 'neg', 'pos']})
df2 = pd.DataFrame({'Sentiment': ['pos', 'neg', 'pos', 'neu'], 'text': ['hello ', '12 where I', 'hello g* ', 'here [null]'], 'pred': ['neu', 'neg', 'neu', 'neu']})
I want to update the Sentiment column in df_source based on this condition: if the text in both data frames is exactly the same and the pred column is also the same, replace the Sentiment in df_source with the Sentiment from df2.
So the output would look like this (only one sample, "hello ", meets both conditions):
Sentiment  text         pred
pos        hello        neu
neg        12where      neg
pos        here [null]  pos
What I have tried:
df_source['Sentiment'] = df2.where((df2['text'] == df_source['text']) & (df2['pred'] == df_source['pred']), df2['Sentiment'])
I expected this to work, but it raises an error (ValueError: Can only compare identically-labeled Series objects).
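The error comes from the comparison itself: pandas compares two Series with `==` only when their indexes are identical, and here one data frame has three rows while the other has four. A minimal sketch of that behaviour, using two small Series built from the text columns above (the names `a`, `b`, `raised` and `message` are illustrative, not from the question):

```python
import pandas as pd

# Two Series with different indexes: 0..2 versus 0..3.
a = pd.Series(['hello ', '12where', 'here [null]'])
b = pd.Series(['hello ', '12 where I', 'hello g* ', 'here [null]'])

try:
    a == b  # element-wise comparison needs identical labels
    raised = False
    message = ''
except ValueError as exc:
    raised = True
    message = str(exc)

print(raised)
print(message)
```

Because the rows have to be matched by value rather than by position, a key-based operation such as a merge is a better fit than a row-by-row comparison.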
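First, do a left merge on the two key columns, text and pred, with suffixes chosen so that the original Sentiment column of df_source is renamed to Sentiment_x and the one coming from df2 keeps the name Sentiment.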
df_source = df_source.merge(df2, how='left', on=['text', 'pred'], suffixes=('_x', ''))
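To see what the merge produces before anything is filled in, here is a small self-contained sketch on the question's data. It stores the result in a separate variable, `merged`, which is an illustrative name and not part of the original answer:

```python
import pandas as pd

df_source = pd.DataFrame({'Sentiment': ['neg', 'neg', 'pos'],
                          'text': ['hello ', '12where', 'here [null]'],
                          'pred': ['neu', 'neg', 'pos']})
df2 = pd.DataFrame({'Sentiment': ['pos', 'neg', 'pos', 'neu'],
                    'text': ['hello ', '12 where I', 'hello g* ', 'here [null]'],
                    'pred': ['neu', 'neg', 'neu', 'neu']})

# Left merge keeps every row of df_source; unmatched rows get NaN from df2.
merged = df_source.merge(df2, how='left', on=['text', 'pred'], suffixes=('_x', ''))

# The left-hand Sentiment is renamed, the right-hand one keeps its name.
print(list(merged.columns))                 # ['Sentiment_x', 'text', 'pred', 'Sentiment']
# True marks rows of df_source that had no (text, pred) match in df2.
print(merged['Sentiment'].isna().tolist())  # [False, True, True]
```

Only the first row ("hello ", "neu") finds a partner in df2, so it is the only row whose Sentiment is not NaN at this point.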
Then use combine_first to fill the NaNs (rows where there was no match) with the original values, and drop the extra merge column:
df_source = df_source.assign(Sentiment=df_source['Sentiment'].combine_first(df_source['Sentiment_x'])).drop(columns='Sentiment_x')
   text         pred  Sentiment
0  hello        neu   pos
1  12where      neg   neg
2  here [null]  pos   pos
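Putting the two steps together, the approach can be sketched as one small function. The function name `update_sentiment` and the `drop_duplicates` guard are additions for illustration, not part of the original answer: the guard assumes that if df2 repeats a (text, pred) pair, taking its first row is acceptable, since otherwise the left merge would duplicate rows of df_source.

```python
import pandas as pd


def update_sentiment(source: pd.DataFrame, other: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `source` with Sentiment taken from `other`
    wherever both `text` and `pred` match exactly."""
    keys = ['text', 'pred']
    # Assumption: keep the first row of `other` for each repeated key pair,
    # so the left merge cannot multiply rows of `source`.
    lookup = other.drop_duplicates(subset=keys)
    merged = source.merge(lookup, how='left', on=keys, suffixes=('_x', ''))
    # Fall back to the original sentiment where no match was found.
    merged['Sentiment'] = merged['Sentiment'].combine_first(merged['Sentiment_x'])
    return merged.drop(columns='Sentiment_x')


df_source = pd.DataFrame({'Sentiment': ['neg', 'neg', 'pos'],
                          'text': ['hello ', '12where', 'here [null]'],
                          'pred': ['neu', 'neg', 'pos']})
df2 = pd.DataFrame({'Sentiment': ['pos', 'neg', 'pos', 'neu'],
                    'text': ['hello ', '12 where I', 'hello g* ', 'here [null]'],
                    'pred': ['neu', 'neg', 'neu', 'neu']})

result = update_sentiment(df_source, df2)
print(result['Sentiment'].tolist())  # ['pos', 'neg', 'pos']
```

Note that the match is exact, including whitespace: 'hello ' with a trailing space only matches 'hello ' in df2. If that is too strict for real data, the text columns could be normalised (for example with `.str.strip()`) before merging.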