Selecting rows from DF1 where column values match values from a column from DF2

Update: this problem has been solved (I think). Excel was the problem, not Python. The code below should work for my needs and is not dropping rows after all.

Rows highlighted in yellow are the rows I want to select in DF1. The selection should be based on the values in column_2 of DF1 that match the values in column_1 of DF2.

Here is my preferred solution using the pandas package in Python, after a lot of trial and error and searching:

NEW_MATCHED_DF1 = DF1.loc[DF1['column_2'].isin(DF2['column_1'])]

The problem I am seeing is that when I do the same thing in Excel, I get almost double the results, which makes me think my Python technique is dropping duplicates. Of course, it is possible that I am doing something wrong in Excel, or that Excel is incorrect for some other reason. But I have verified the Excel approach in the past and I am much more familiar with Excel, so I suspected that I was doing something wrong in Python. Edit: Excel was the problem after all! :/

Ultimately, I would like to use Python to select any and all rows in DF1 where column_2 of DF1 matches column_1 of DF2. Excel is absurdly slow, and I would like to move away from using Excel for manipulating large dataframes.

I appreciate any help or pointers. I really haven't been able to figure out whether my code is in fact dropping duplicates, or whether there is another solution that I can be confident won't do this.
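One way to check the duplicate concern is to run the same selection on a tiny frame where the expected answer is obvious. The sketch below uses made-up data (the values are hypothetical; only the column names come from the question) and repeats a key in both frames:

```python
import pandas as pd

# Hypothetical toy data; only the column names follow the question.
DF1 = pd.DataFrame({
    "column_1": ["a", "b", "c", "d", "e"],
    "column_2": [10, 20, 20, 30, 40],   # 20 appears twice in DF1
})
DF2 = pd.DataFrame({"column_1": [20, 20, 40, 99]})  # 20 is repeated here too

# isin returns one True/False per DF1 row, so it can only keep or skip
# a row; it never removes or repeats rows.
mask = DF1["column_2"].isin(DF2["column_1"])
matched = DF1.loc[mask]

print(matched)
print(len(matched), "of", len(DF1), "rows matched")
```

Both DF1 rows with the value 20 are kept, so the boolean mask is not dropping duplicates in this case. Comparing `mask.sum()` against the Excel count on the real data is a quick way to see where the two tools diverge.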

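Try this using np.where:

import numpy as np

# Unique values of df2's key column to match against
list_df2 = df2['column_1'].unique().tolist()
# Label each row of df1 by whether its column_2 value appears in df2
df1['matching_rows'] = np.where(df1['column_2'].isin(list_df2), 'Match', 'No Match')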

And then create a new dataframe with the matches:

matched_df = df1[df1['matching_rows']=='Match']
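As a rough comparison, and assuming the same hypothetical toy data as a stand-in for the real frames, the labelled approach above and a direct `isin` mask select the same rows. An inner merge is a different operation: it repeats a df1 row once for every matching df2 row, which is one possible reason two tools can report different counts for what looks like the same lookup.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; only the column names follow the question.
df1 = pd.DataFrame({"column_2": [10, 20, 20, 30, 40]})
df2 = pd.DataFrame({"column_1": [20, 20, 40, 99]})

# Labelled approach from the answer, kept in a separate array here
# so that df1 itself is left unchanged.
labels = np.where(df1["column_2"].isin(df2["column_1"].unique()),
                  "Match", "No Match")
via_label = df1[labels == "Match"]

# Direct boolean mask, as in the question.
via_mask = df1[df1["column_2"].isin(df2["column_1"])]

same_rows = via_label.equals(via_mask)

# An inner merge repeats df1 rows when df2 holds duplicate keys.
merged = df1.merge(df2, left_on="column_2", right_on="column_1", how="inner")

print("same rows:", same_rows)
print("isin rows:", len(via_mask), "merge rows:", len(merged))
```

If duplicates in df2 should not multiply the result, the `isin` form is the safer choice; a merge is only appropriate when one output row per matching pair is actually wanted.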

