How to compare two dataframes in pandas

Question

I have two dataframes:

The first one has n rows of names.
The second one has n rows of names.

For each name in the first dataframe:

see how many times it appears in the second dataframe.

The code looks something like this:

import pandas as pd
df5 = pd.read_excel(item1, usecols="B", skiprows=6)
df10 = pd.read_excel('SMR4xx_Change_situation.xlsm', sheet_name='LoPN', usecols='D', skiprows=4)

How do I count the number of times a name appears in the second dataframe and output it beside the name from the first dataframe?

Ex: The first name in the first dataframe is John. John appears in the second dataframe 4 times => output John 4

Either print it in the console, or write a separate Excel file with the first dataframe in the first column and the number of appearances in the second column.

Anything could help.

Answer 1

Well, you can create a dataframe for the records you are seeking. First, get the list of unique names in the first dataframe:

uniqueNames = df5['B'].unique()  # Assuming the names column is labelled 'B'

dfCount = pd.DataFrame(columns=['name', 'count'])

Now you can iterate through each of the unique names in the first dataframe and compare against the second dataframe like this:

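rows = []
for eachName in uniqueNames:
    # Count the rows of df10 whose column D equals this name (assuming you need to compare with column D)
    rows.append({'name': eachName, 'count': (df10['D'] == eachName).sum()})
dfCount = pd.DataFrame(rows, columns=['name', 'count'])  # DataFrame.append was removed in pandas 2.0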
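The loop above can also be written without iterating in Python. The following is a minimal sketch, assuming toy data in place of the real Excel files and assuming the columns are labelled 'B' and 'D'; the names `unique_names`, `counts` and `df_count` are only illustrative.

```python
import pandas as pd

# Toy stand-ins for df5 and df10; the real frames come from pd.read_excel.
df5 = pd.DataFrame({"B": ["John", "Mary", "John", "Zed"]})
df10 = pd.DataFrame({"D": ["John", "Mary", "John", "John", "Ann", "John"]})

unique_names = df5["B"].unique()

# value_counts() tallies every name in df10 in one pass; reindex() keeps only
# the names from df5 and fills names that never appear in df10 with 0.
counts = df10["D"].value_counts().reindex(unique_names, fill_value=0)

# Same shape as the table built by the loop: one row per unique name.
df_count = pd.DataFrame({"name": unique_names, "count": counts.to_numpy()})
print(df_count)
```

Names that exist only in the second dataframe (here "Ann") are dropped by the reindex, which matches what the loop does.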

Or, if you want the counts to be present in the first dataframe, something like this should work:

# Count each name in df10['D'], then look up every df5['B'] name in those counts
counts = df10['D'].value_counts()
df5['counts'] = df5['B'].map(counts).fillna(0).astype(int)  # names missing from df10 get 0
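To get the "John 4" style output asked for in the question, the counts can be attached to the first dataframe and printed row by row. This is a rough sketch on made-up data, again assuming the columns are labelled 'B' and 'D'; it is one possible way to do it, not the only one.

```python
import pandas as pd

# Toy stand-ins for df5 and df10; the real frames come from pd.read_excel.
df5 = pd.DataFrame({"B": ["John", "Mary", "Zed"]})
df10 = pd.DataFrame({"D": ["John", "Mary", "John", "John", "Ann", "John"]})

# Tally the names in the second dataframe once.
counts = df10["D"].value_counts()

# Look up each name of the first dataframe in the tally; names that never
# appear in df10 come back as NaN, so replace those with 0.
df5["counts"] = df5["B"].map(counts).fillna(0).astype(int)

# One "name count" line per row of the first dataframe.
lines = [f"{name} {count}" for name, count in zip(df5["B"], df5["counts"])]
print("\n".join(lines))
```

For the Excel variant, `df5.to_excel("counts.xlsx", index=False)` should write the same two columns to a file. The file name is hypothetical, and pandas needs an Excel writer engine such as openpyxl installed for this to work.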

solution1
0 ACCEPTED 2021-02-20 14:21:09
