Selecting rows from DF1 where column values match values from a column from DF2
This problem has been solved (I think). Excel was the problem, not Python after all. The code below should work for my needs and doesn't seem to be dropping rows after all.
The rows highlighted in yellow are the rows I want to select in DF1. The selection should be based on the values in column_2 of DF1 that match the values in column_1 of DF2.
Here was my preferred solution using the pandas package in Python, after a lot of trial and error and searching:
NEW_MATCHED_DF1 = DF1.loc[DF1['column_2'].isin(DF2['column_1'])]
The problem I am seeing is that when I compare my results to what happens in Excel when I do the same thing, the two result sets differ by almost a factor of two, and I think my Python technique is dropping duplicates. Of course, it is possible that I am doing something wrong in Excel, or that Excel is incorrect for some other reason, but it is something I have verified in the past and I am much more familiar with Excel, so I suspect it is more likely that I am doing something wrong in Python. EXCEL IS THE PROBLEM AFTER ALL! :/
Ultimately, I would like to use Python to select any and all rows in DF1 where column_2 of DF1 matches column_1 of DF2. Excel is absurdly slow, and I would like to move away from using Excel for manipulating large dataframes.
I appreciate any help or pointers. I really haven't been able to figure out whether my code is in fact dropping duplicates, and/or whether there is another solution that I can be confident won't do this.
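One way to settle the duplicate question is to run the same isin filter on a tiny table where the expected answer can be counted by hand. The sketch below is only an illustration: the toy values and the names mask, keys and expected_rows are made up for the example, since the real DF1 and DF2 are not shown in the post.

```python
import pandas as pd

# Hypothetical toy data; the real DF1/DF2 are not shown in the question.
DF1 = pd.DataFrame({
    "column_1": ["a", "b", "c", "d", "e"],
    "column_2": [10, 20, 20, 30, 40],  # 20 appears twice on purpose
})
# DF2 has a repeated key (20) and a key that matches nothing (99).
DF2 = pd.DataFrame({"column_1": [20, 20, 40, 99]})

# Same filter as in the question: one boolean per row of DF1.
mask = DF1["column_2"].isin(DF2["column_1"])
NEW_MATCHED_DF1 = DF1.loc[mask]

# Independent cross-check that does not use isin: count matching rows one by one.
keys = set(DF2["column_1"])
expected_rows = sum(value in keys for value in DF1["column_2"])

print(len(NEW_MATCHED_DF1), expected_rows)
```

Because isin only builds a True/False mask over the rows of DF1, both rows holding 20 are kept, and repeated keys in DF2 do not add extra rows. Comparing len(NEW_MATCHED_DF1) with an independent count like expected_rows on the real data is a quick sanity check before trusting either tool.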
Try this using np.where:
import numpy as np
list_df2 = df2['column1'].unique().tolist()
df1['matching_rows'] = np.where(df1['column2'].isin(list_df2),'Match','No Match')
And then create a new dataframe with the matches:
matched_df = df1[df1['matching_rows']=='Match']
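For comparison, here is a small sketch that runs the np.where route next to a direct boolean-mask filter on made-up data. The toy values and the names direct_df and same_rows are assumptions for illustration only; the column names follow the snippet above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data using the answer's column names.
df1 = pd.DataFrame({"column2": ["x", "y", "y", "z"]})
df2 = pd.DataFrame({"column1": ["y", "z", "z"]})

# Route from the answer: label every row, then filter on the label.
list_df2 = df2["column1"].unique().tolist()
df1["matching_rows"] = np.where(df1["column2"].isin(list_df2), "Match", "No Match")
matched_df = df1[df1["matching_rows"] == "Match"]

# Direct route: filter with the boolean mask itself, no helper column needed.
direct_df = df1[df1["column2"].isin(df2["column1"])]

# Both routes should select exactly the same rows of df1.
same_rows = matched_df.index.equals(direct_df.index)
print(same_rows)
```

The helper column is handy if you want to keep the non-matching rows around and inspect them; if only the matches are needed, the direct mask should give the same selection with one less step.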