
How to compare two dataframes in pandas

I have two dataframes:

  • The first one has n rows of names.

  • The second one has n rows of names.

For each name in the first dataframe, I want to:

  • count how many times it appears in the second dataframe.

The code looks something like this:

import pandas as pd

df5 = pd.read_excel(item1, usecols="B", skiprows=6)
df10 = pd.read_excel('SMR4xx_Change_situation.xlsm', sheet_name='LoPN', usecols='D', skiprows=4)

How do I count the number of times a name appears in the second dataframe and output that count beside the name in the first dataframe?

Example: the first name in the first dataframe is John. John appears in the second dataframe 4 times, so the output should be: John 4
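The per-name counting asked for here is essentially a frequency lookup. A minimal plain-Python sketch with collections.Counter reproduces the John 4 example; first_names and second_names are hypothetical sample lists standing in for the two Excel columns:

```python
from collections import Counter

# Hypothetical sample data standing in for the two Excel columns
first_names = ["John", "Mary", "Alex"]
second_names = ["John", "Mary", "John", "John", "Mary", "John"]

# Count every name in the second list in a single pass
occurrences = Counter(second_names)

# Look up each name from the first list; Counter returns 0 for missing names
for name in first_names:
    print(name, occurrences[name])
```

Counting the second list once and then looking names up avoids rescanning it for every name in the first list.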

Either print it in the console, or write the first dataframe to a separate Excel file with the number of appearances in the second column.

Any help is appreciated.

Well, you can create a dataframe for the records you are looking for. First, get the list of unique names in the first dataframe:

uniqueNames = df5['B'].unique()  # Assumes the names column is labelled 'B'; read_excel(usecols='B') labels it with the header cell, so adjust if needed

dfCount = pd.DataFrame(columns=['name', 'count'])

Now you can iterate through each of the unique names in the first dataframe and compare against the second dataframe like this:

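rows = []
for eachName in uniqueNames:
    # Count exact matches of this name in column D of the second dataframe (assumed label)
    rows.append({'name': eachName, 'count': (df10['D'] == eachName).sum()})
# DataFrame.append was removed in pandas 2.0, so build the frame from the list at once
dfCount = pd.DataFrame(rows, columns=['name', 'count'])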
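The loop scans df10 once per unique name. A vectorized sketch that should produce the same name/count table in a single pass uses Series.value_counts and reindex; the small df5 and df10 below are hypothetical sample data using the same column labels ('B' and 'D') the answer assumes:

```python
import pandas as pd

# Hypothetical stand-ins for df5 (names in column 'B') and df10 (names in column 'D')
df5 = pd.DataFrame({'B': ['John', 'Mary', 'John', 'Alex']})
df10 = pd.DataFrame({'D': ['John', 'Mary', 'John', 'John', 'Mary', 'John']})

uniqueNames = df5['B'].unique()

# value_counts tallies every name in df10['D'] in one pass;
# reindex keeps only the names from df5, in order of first appearance,
# and fills 0 for names that never occur in df10
counts = df10['D'].value_counts().reindex(uniqueNames, fill_value=0)

dfCount = pd.DataFrame({'name': counts.index, 'count': counts.to_numpy()})
print(dfCount)
```

Using counts.index and counts.to_numpy() sidesteps the fact that the name of the value_counts result differs between pandas 1.x and 2.x.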

Or, if you want the counts next to the names in the first dataframe, map each name to its count in the second one:

counts = df10['D'].value_counts()  # occurrences of each name in column D
df5['counts'] = df5['B'].map(counts).fillna(0).astype(int)  # names absent from df10 get 0
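Another option, sketched here under the same assumption that the names sit in columns labelled 'B' and 'D', is a left merge against per-name counts. It keeps every row of the first dataframe, including repeated names, and the result can be printed or written to a separate Excel file as the question asks. The sample data and the output file name are hypothetical:

```python
import pandas as pd

# Hypothetical stand-ins for the two dataframes from the question
df5 = pd.DataFrame({'B': ['John', 'Mary', 'John', 'Alex']})
df10 = pd.DataFrame({'D': ['John', 'Mary', 'John', 'John', 'Mary', 'John']})

# One row per distinct name in df10 together with its number of appearances
counts = df10.groupby('D').size().rename('counts').reset_index()

# A left merge keeps every row of df5 in its original order;
# names missing from df10 get NaN, which is replaced by 0
result = df5.merge(counts, how='left', left_on='B', right_on='D')
result = result.drop(columns='D')
result['counts'] = result['counts'].fillna(0).astype(int)

print(result)
# To write both columns to a separate Excel file (requires openpyxl):
# result.to_excel('name_counts.xlsx', index=False)
```

DataFrame.to_excel needs an Excel writer engine such as openpyxl, which is why the call is left commented out here.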
