How to use DataFrame.isin when the two dataframes have different number of entries (value matching but index not matching)?

Question

I have two data frames (df1 and df2). Both have a "Class ID" column; df1 has 66,000 entries and df2 has 60,000. I want to check that all the Class ID values in df2 belong to df1. The Class ID values are not unique (the frames also contain other columns).

I am using this code:

print(df1['Class ID'].isin(df2['Class ID']).value_counts())

This is giving the result:

True  59,800
False 200

However, I extracted all the Class IDs flagged as "False" and compared them using vimdiff in bash. Every Class ID flagged as "False" is present in df2. I read in the pandas documentation that isin requires both the index and the column label to match. Since the two dataframes have different numbers of entries, the indexes do not match, which I assume is why I get this result. How can I tackle this problem? Is there another, more efficient way?

Answer 1

Why don't you join them on the "Class ID" column? It is a far better way to achieve what you are trying to achieve. Check this out: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.join.html
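A minimal sketch of the join idea, assuming toy data in place of the real frames (the values, the extra columns "x" and "y", and the custom index on df2 are all made up for illustration). It uses DataFrame.merge with indicator=True rather than DataFrame.join, since merge joins on a column directly, and it also shows that Series.isin compares values only, so a differing index or length should not by itself produce "False" entries.

```python
import pandas as pd

# Hypothetical toy frames; the real ones have 66,000 and 60,000 rows.
df1 = pd.DataFrame({"Class ID": ["A", "B", "B", "C", "D"], "x": range(5)})
df2 = pd.DataFrame(
    {"Class ID": ["B", "C", "C", "E"], "y": range(4)},
    index=[10, 11, 12, 13],  # deliberately different from df1's index
)

# Reduce each frame to its unique IDs so duplicates do not multiply rows.
ids1 = df1[["Class ID"]].drop_duplicates()
ids2 = df2[["Class ID"]].drop_duplicates()

# Left merge: every ID of df2 is kept; "_merge" says whether df1 has it too.
merged = ids2.merge(ids1, on="Class ID", how="left", indicator=True)
missing = merged.loc[merged["_merge"] == "left_only", "Class ID"].tolist()
print("IDs in df2 but not in df1:", missing)

# Series.isin checks membership by value; the indexes are not aligned.
mask = df2["Class ID"].isin(df1["Class ID"])
print(mask.value_counts())
```

If `missing` comes back empty, every Class ID in df2 also occurs in df1. Note that the check runs from df2 against df1, which is the direction the question asks about.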
