
Python - count number of elements that are equal between two columns of two dataframes

I have two dataframes, df1 and df2, that contain two columns, col1 and col2. I would like to count the elements in column col1 of df1 that are equal to an element in col2 of df2. How can I do that?

I assume you're using pandas.

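One way is to use pd.merge to join the two dataframes on the columns you want to compare, and take the length of the result. Because the columns have different names, pass left_on and right_on rather than on:

len(pd.merge(df1, df2, left_on="col1", right_on="col2"))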

pandas does an inner merge by default, so only rows with a match in both dataframes are kept.

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html
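As a rough sketch of the merge approach, assuming the column names col1 and col2 from the question (the sample values and the variable names n_matches, n_pairs and n_rows are made up for illustration). It also shows a caveat: when a value is repeated in both columns, the merge returns one row per matching pair, so you may want to drop duplicates from the right-hand frame first.

```python
import pandas as pd

df1 = pd.DataFrame({"col1": [1, 2, 3, 4, 5, 6]})
df2 = pd.DataFrame({"col2": [1, 3, 5, 7]})

# Inner merge keeps only rows whose col1 value matches some col2 value.
merged = pd.merge(df1, df2, left_on="col1", right_on="col2")
n_matches = len(merged)  # 3 (values 1, 3 and 5)

# Illustrative frames with the value 1 repeated on both sides.
dup1 = pd.DataFrame({"col1": [1, 1, 2]})
dup2 = pd.DataFrame({"col2": [1, 1, 3]})

# Each 1 on the left pairs with each 1 on the right: 2 * 2 = 4 rows.
n_pairs = len(pd.merge(dup1, dup2, left_on="col1", right_on="col2"))

# De-duplicating the right side gives one row per matching left row: 2.
n_rows = len(
    pd.merge(dup1, dup2.drop_duplicates("col2"), left_on="col1", right_on="col2")
)
```

If you only need the count, this builds a whole merged dataframe just to measure its length, so it is probably more work than necessary.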

You can use Series.isin. The expression df1.col1.isin(df2.col2).sum() counts the rows of df1.col1 whose value appears somewhere in df2.col2:

import pandas as pd

df1 = pd.DataFrame({'col1': [1, 2, 3, 4, 5, 6]})
df2 = pd.DataFrame({'col2': [1, 3, 5, 7]})

# isin returns a boolean Series; summing it counts the True values
nb_common_elements = df1.col1.isin(df2.col2).sum()

assert nb_common_elements == 3

Be careful if df1.col1 can contain duplicate values, because:

df1 = pd.DataFrame({'col1': [1, 1, 1, 2, 7]})
df1.col1.isin(df2.col2).sum()

This would return 4 and not 2, because each of the three 1s in df1.col1 is counted separately (1 is present in df2.col2), plus the single 7. If that's not the expected behaviour, you can drop duplicates from df1.col1 before testing the intersection size:

df1.col1.drop_duplicates().isin(df2.col2).sum()

Which in this example would return 2.

To better understand why this happens, you can have a look at what .isin returns:

df1['isin df2.col2'] = df1.col1.isin(df2.col2)

Which gives:

   col1  isin df2.col2
0     1           True
1     1           True
2     1           True
3     2          False
4     7           True

Now .sum() adds up the booleans in the isin df2.col2 column. Each True counts as 1 and each False as 0, which gives a total of 4.
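To make the two interpretations explicit, here is a minimal sketch that wraps both counts in one function. The helper count_matches and its distinct flag are hypothetical names, not part of pandas; it only combines the isin and drop_duplicates calls already shown above.

```python
import pandas as pd


def count_matches(left: pd.Series, right: pd.Series, distinct: bool = False) -> int:
    """Count elements of `left` that appear in `right`.

    Hypothetical helper: with distinct=True, repeated values in `left`
    are counted once; otherwise every matching row is counted.
    """
    if distinct:
        left = left.drop_duplicates()
    # isin gives a boolean Series; sum() counts the True entries.
    return int(left.isin(right).sum())


df1 = pd.DataFrame({"col1": [1, 1, 1, 2, 7]})
df2 = pd.DataFrame({"col2": [1, 3, 5, 7]})

rows = count_matches(df1["col1"], df2["col2"])                   # 4 matching rows
values = count_matches(df1["col1"], df2["col2"], distinct=True)  # 2 distinct values
```

Which count is right depends on whether you care about matching rows or matching distinct values.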
