
Selecting rows from DF1 where column values match values from a column in DF2

This problem has been solved (I think). Excel was the problem after all, not Python. The code below should work for my needs and does not seem to be dropping rows.

The rows highlighted in yellow are the rows I want to select in DF1. The selection should be based on the values in column_2 of DF1 that match the values in column_1 of DF2.

Here is my preferred solution using the pandas package in Python, after a lot of trial and error and searching:

NEW_MATCHED_DF1 = DF1.loc[DF1['column_2'].isin(DF2['column_1'])]

The problem I am seeing is that when I compare my results with what I get when I do the same thing in Excel, Excel gives me almost double the results, which made me think my Python technique was dropping duplicates. Of course, it is possible that I am doing something wrong in Excel, or that Excel is incorrect for some other reason, but this is something I have verified in the past and I am much more familiar with Excel, so I suspected it was more likely that I was doing something wrong in Python. Update: Excel was the problem after all.

Ultimately, I would like to use Python to select every row in DF1 where column_2 of DF1 matches column_1 of DF2. Excel is absurdly slow, and I would like to move away from using it to manipulate large dataframes.

I appreciate any help or pointers. I really have not been able to figure out whether my code is in fact dropping duplicates, or whether there is another solution that I can be confident won't do this.
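One way to check whether the `isin` selection drops duplicate rows is to run it on a small frame where the answer can be counted by hand. The sketch below uses made-up toy data (the values and the `expected` cross-check are assumptions for illustration, not the original dataset). It relies on `isin` returning one boolean per row of DF1, so repeated values in DF1 are each kept and repeated values in DF2 have no effect on the row count.

```python
import pandas as pd

# Hypothetical toy data; column names follow the question's description.
DF1 = pd.DataFrame({
    "column_1": ["a", "b", "c", "d", "e"],
    "column_2": [10, 20, 20, 30, 40],  # 20 appears twice in DF1
})
DF2 = pd.DataFrame({"column_1": [20, 20, 40, 50]})  # 20 is duplicated in DF2 too

# One boolean per row of DF1: True where column_2 appears anywhere in DF2.column_1
mask = DF1["column_2"].isin(DF2["column_1"])
matched = DF1.loc[mask]

# Independent row-by-row count, without pandas filtering, as a cross-check
lookup = set(DF2["column_1"])
expected = sum(value in lookup for value in DF1["column_2"])

print(len(matched), expected)
```

If the two counts agree on real data as well, the selection is keeping every matching row of DF1, duplicates included.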

Try this using np.where:

import numpy as np
list_df2 = df2['column1'].unique().tolist()
df1['matching_rows'] = np.where(df1['column2'].isin(list_df2),'Match','No Match')

And then create a new dataframe with the matches:

matched_df = df1[df1['matching_rows']=='Match']
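As a rough comparison, the sketch below (toy data and variable names are assumptions for illustration) suggests that the labelled `np.where` route selects the same rows as filtering directly on the `isin` mask. It also shows a case where row counts can legitimately differ: an inner `merge` repeats a df1 row once for every matching row in df2, whereas `isin` keeps each df1 row at most once. This is not a claim about what happened in the original Excel workbook, only a general property of joins.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data using the answer's column names
df1 = pd.DataFrame({"column1": ["a", "b", "c", "d"], "column2": [1, 2, 2, 3]})
df2 = pd.DataFrame({"column1": [2, 2, 3, 9]})  # 2 is duplicated in df2

# Route 1: label each row with np.where, then filter on the label
labeled = df1.copy()
labeled["matching_rows"] = np.where(
    labeled["column2"].isin(df2["column1"]), "Match", "No Match"
)
via_label = labeled[labeled["matching_rows"] == "Match"].drop(columns="matching_rows")

# Route 2: filter directly on the boolean mask
via_mask = df1[df1["column2"].isin(df2["column1"])]

# Route 3: inner merge, which repeats df1 rows when df2 has duplicate keys
via_merge = df1.merge(
    df2, left_on="column2", right_on="column1", suffixes=("", "_df2")
)

print(len(via_label), len(via_mask), len(via_merge))
```

With this toy data the two `isin` routes return the same rows, while the merge returns more because the value 2 occurs twice in df2.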


