
How to use DataFrame.isin when the two dataframes have different numbers of entries (values match but the index does not)?

I have two data frames (df1 and df2). Both have a column "Class ID"; df1 has 66,000 entries while df2 has 60,000 entries. I want to check that all the Class ID values in df2 are also present in df1. The Class ID values are not unique (there are some other columns as well).

I am using this code:

print(df1['Class ID'].isin(df2['Class ID']).value_counts())

This is giving the result:

True  59,800
False 200

However, I extracted all the Class IDs that were marked "False" and compared them with vimdiff in bash. All the Class IDs marked "False" are present in df2. I read in the pandas documentation that isin requires both the index and the column labels to match. Since the two dataframes have different numbers of entries, the index does not match, which I think is why this result is being displayed. How can I tackle this problem? Is there any other efficient way?

Why don't you join them on the "Class ID" column? It is a far better way to achieve what you are trying to achieve. Check this out: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.join.html
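Here is a minimal sketch of that check on toy data, not your real frames. The values, the extra columns "x" and "y", and the variable names are all made up for illustration. It uses merge rather than join, since merge matches on a column directly, and it also shows that isin called on a single column (a Series) compares values only, so differing lengths and indexes should not matter there. The index and column label matching described in the docs applies to DataFrame.isin when it is given another DataFrame.

```python
import pandas as pd

# Toy frames standing in for df1 and df2 (values are invented for illustration).
df1 = pd.DataFrame({"Class ID": ["A1", "B2", "B2", "C3", "D4"], "x": range(5)})
df2 = pd.DataFrame({"Class ID": ["B2", "C3", "E5"], "y": range(3)})

# Series.isin compares values only; row index and length do not matter.
in_df1 = df2["Class ID"].isin(df1["Class ID"])
missing_isin = df2.loc[~in_df1, "Class ID"].unique().tolist()

# The same check as a left merge on de-duplicated IDs,
# so repeated Class IDs do not multiply rows.
ids1 = df1[["Class ID"]].drop_duplicates()
ids2 = df2[["Class ID"]].drop_duplicates()
merged = ids2.merge(ids1, on="Class ID", how="left", indicator=True)
missing_merge = merged.loc[merged["_merge"] == "left_only", "Class ID"].tolist()

print(missing_isin)   # Class IDs in df2 that are absent from df1
print(missing_merge)  # same list, obtained through the merge
```

If IDs that look identical still come back as missing, it may be worth checking for a dtype mismatch or stray whitespace between the two columns, though that is only a guess without seeing the data.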

