How to filter out rows of one Python pandas DataFrame from another DataFrame by comparing columns?
I'm trying to exclude rows from one dataframe that also occur in another dataframe:
import pandas
df = pandas.DataFrame({'A': ['Chr1', 'Chr1', 'Chr1','Chr1', 'Chr1', 'Chr1','Chr2','Chr2'], 'B': [10,20,30,40,50,60,15,20]})
errors = pandas.DataFrame({'A': ['Chr1', 'Chr1'], 'B': [20,50]})
As a result, the rows in df that match a row in errors should be left out:
df:
'A' 'B'
Chr1 10
Chr1 30
Chr1 40
Chr1 60
Chr2 15
Chr2 20
It doesn't seem to work with df.merge, and I don't want to iterate over all rows, since the dataframes get pretty large.
Best,
David
Add an extra column to errors:
errors['temp'] = 1
Merge the two dataframes:
merged_df = pandas.merge(df,errors,how='outer')
Now keep only those rows in which 'temp' is NaN, i.e. rows that had no match in errors:
merged_df = merged_df[ merged_df['temp'] != 1 ]
del merged_df['temp']
print(merged_df)
A B
0 Chr1 10
2 Chr1 30
3 Chr1 40
5 Chr1 60
6 Chr2 15
7 Chr2 20
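The same idea can probably be written without the helper column. The following is only a sketch, assuming a pandas version whose merge accepts the indicator parameter; the names merged and filtered are introduced here for illustration and are not part of the answer above.

```python
import pandas

df = pandas.DataFrame({'A': ['Chr1', 'Chr1', 'Chr1', 'Chr1', 'Chr1', 'Chr1', 'Chr2', 'Chr2'],
                       'B': [10, 20, 30, 40, 50, 60, 15, 20]})
errors = pandas.DataFrame({'A': ['Chr1', 'Chr1'], 'B': [20, 50]})

# Left merge on all shared columns (A and B). indicator=True adds a
# '_merge' column recording whether each row was found in both frames
# or only in the left one. drop_duplicates() keeps repeated error rows
# from multiplying rows of df.
merged = df.merge(errors.drop_duplicates(), how='left', indicator=True)

# Keep the rows that exist only in df, then drop the bookkeeping column.
filtered = merged[merged['_merge'] == 'left_only'].drop(columns='_merge')
print(filtered)
```

A left merge is used here instead of an outer merge so that rows present only in errors never enter the result in the first place.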
You can do the following for the two columns:
print(df[~df['A'].isin(errors['A']) | ~df['B'].isin(errors['B'])])
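Note that this check tests A and B independently, so it would also drop a row whose A value and B value each appear somewhere in errors but not in the same row. It gives the expected result for the example data, but if that matters for your data, a row-wise comparison of (A, B) pairs may be safer. Below is a minimal sketch, assuming a pandas version that provides MultiIndex.from_frame; the names keys, error_keys and filtered are made up for illustration.

```python
import pandas

df = pandas.DataFrame({'A': ['Chr1', 'Chr1', 'Chr1', 'Chr1', 'Chr1', 'Chr1', 'Chr2', 'Chr2'],
                       'B': [10, 20, 30, 40, 50, 60, 15, 20]})
errors = pandas.DataFrame({'A': ['Chr1', 'Chr1'], 'B': [20, 50]})

# Build one (A, B) key per row in each frame.
keys = pandas.MultiIndex.from_frame(df[['A', 'B']])
error_keys = pandas.MultiIndex.from_frame(errors[['A', 'B']])

# isin() returns a boolean array that is True where the whole (A, B)
# pair occurs in errors; invert it to keep the non-matching rows.
filtered = df[~keys.isin(error_keys)]
print(filtered)
```

Because df is indexed directly with a boolean mask, the original index labels of the surviving rows are preserved.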