Python/Pandas: Compare multiple columns in two dataframes and remove row if no matches found
I am learning Python with Pandas and trying to work out the most efficient way to compare multiple selected columns across two dataframes to find a match. For example, if I have the following two dataframes:
Frame 1
A B C D E F
001 10 0 0 10 0 10
Frame 2
A B C D E F
200 10 0 10 0 10 0
201 0 10 10 0 0 10
202 0 10 0 0 0 0
203 0 0 0 10 0 10
I'm looking for a way to compare columns A, B, C, D in the two dataframes in order to drop the rows of Frame 2 which do not match a 10 from Frame 1 in any of those columns.
In this case, I would expect it to drop rows 201 and 202 because they have no matches, whereas rows 200 and 203 each have 1 match (even though row 200 has 1 column that does not match).
I've tried looping through all the rows in Frame 2 and comparing each column:
letters = ['A', 'B', 'C', 'D']
for ix, row in frame_2.iterrows():
    for letter in letters:
        if frame_1[letter].values[0] != frame_2.loc[ix, letter]:
            frame_2.drop(ix, inplace=True)
            break
This removed some rows but not all.
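For comparison, here is a minimal sketch of the loop with the test inverted: instead of dropping a row at the first column that differs, it drops a row only when no column holds a 10 in both frames. It assumes Frame 1 has a single row, rebuilds the sample frames shown above (the integer index 1 for Frame 1 is an assumption), and introduces the names target, to_drop and result purely for illustration.

```python
import pandas as pd

cols = ["A", "B", "C", "D", "E", "F"]
# Sample data from the question; the index label 1 for Frame 1 is assumed
frame_1 = pd.DataFrame([[10, 0, 0, 10, 0, 10]], index=[1], columns=cols)
frame_2 = pd.DataFrame(
    [[10, 0, 10, 0, 10, 0],
     [0, 10, 10, 0, 0, 10],
     [0, 10, 0, 0, 0, 0],
     [0, 0, 0, 10, 0, 10]],
    index=[200, 201, 202, 203],
    columns=cols,
)

letters = ["A", "B", "C", "D"]
target = 10  # hypothetical name for the value that counts as a match

to_drop = []
for ix, row in frame_2.iterrows():
    # A row is kept if at least one column is 10 in both frames
    matched = any(
        row[letter] == target and frame_1[letter].iloc[0] == target
        for letter in letters
    )
    if not matched:
        to_drop.append(ix)

# Drop after the loop rather than mutating frame_2 while iterating over it
result = frame_2.drop(index=to_drop)
print(result.index.tolist())  # [200, 203]
```

Collecting the labels first and dropping once at the end avoids modifying the frame during iteration.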
Is there an efficient way to loop through all the rows and check if there's a single match in any of the columns of another dataframe?
Thanks in advance for the help!
I think the simplest solution is to replace the non-10 values with one value in df1 and with a different value in df2, so that only 10s can match (here df1 is Frame 1 and df2 is Frame 2). Then compare each column with isin, which also handles more values if df1 has more rows, collect the boolean masks, concat them into a boolean DataFrame, and filter with any to test for at least one True per row:
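import pandas as pd

letters = ['A', 'B', 'C', 'D']
out = []
for letter in letters:
    # non-10 values become 0 in df2 and 1 in df1, so only 10s can match
    m = df2[letter].mask(lambda x: x!=10, 0).isin(df1[letter].mask(lambda x: x!=10, 1))
    out.append(m)
# keep rows with at least one True across the compared columns
df = df2[pd.concat(out, axis=1).any(axis=1)]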
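To check the approach against the question's data, the following sketch runs the same steps end to end. The frame construction is reconstructed from the tables in the question (the integer index 1 for df1 is an assumption), and the intermediate names left and right are added only for readability.

```python
import pandas as pd

cols = ["A", "B", "C", "D", "E", "F"]
# Frames rebuilt from the question; df1 is Frame 1, df2 is Frame 2
df1 = pd.DataFrame([[10, 0, 0, 10, 0, 10]], index=[1], columns=cols)
df2 = pd.DataFrame(
    [[10, 0, 10, 0, 10, 0],
     [0, 10, 10, 0, 0, 10],
     [0, 10, 0, 0, 0, 0],
     [0, 0, 0, 10, 0, 10]],
    index=[200, 201, 202, 203],
    columns=cols,
)

letters = ["A", "B", "C", "D"]
out = []
for letter in letters:
    # Non-10 values become 0 in df2 and 1 in df1, so they can never match
    left = df2[letter].mask(lambda x: x != 10, 0)
    right = df1[letter].mask(lambda x: x != 10, 1)
    out.append(left.isin(right))

# One boolean column per letter; keep rows with at least one True
df = df2[pd.concat(out, axis=1).any(axis=1)]
print(df.index.tolist())  # [200, 203]
```

Rows 201 and 202 are removed because their 10s fall in columns B and C, where Frame 1 holds 0.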
Alternative solution, reducing the list of masks with NumPy instead of concat (requires import numpy as np):
df = df2[np.logical_or.reduce(out)]
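If the only value of interest is 10, the per-column loop can arguably be avoided altogether. The sketch below is not the method from the answer above but a different formulation under that assumption: it first finds which of the selected columns contain a 10 anywhere in df1, then keeps the df2 rows that have a 10 in at least one of those columns. The names cols_with_target and keep are hypothetical, and the frames are rebuilt from the question's tables.

```python
import pandas as pd

cols = ["A", "B", "C", "D", "E", "F"]
df1 = pd.DataFrame([[10, 0, 0, 10, 0, 10]], index=[1], columns=cols)
df2 = pd.DataFrame(
    [[10, 0, 10, 0, 10, 0],
     [0, 10, 10, 0, 0, 10],
     [0, 10, 0, 0, 0, 0],
     [0, 0, 0, 10, 0, 10]],
    index=[200, 201, 202, 203],
    columns=cols,
)

letters = ["A", "B", "C", "D"]

# Columns (among the selected ones) where df1 has a 10 in any row
cols_with_target = [c for c in letters if df1[c].eq(10).any()]

# Keep df2 rows that have a 10 in at least one of those columns
keep = df2[cols_with_target].eq(10).any(axis=1)
df = df2[keep]
print(df.index.tolist())  # [200, 203]
```

This gives the same rows as the isin version on the sample data, but it only handles a single target value, whereas the isin approach generalizes to several.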