I have two similar DataFrames and want to compare them using the values in the first column (emp ID).
First DataFrame:
emp ID  FirstName  Lastname
1       Prasanna   K
2       Siva       B

Second DataFrame:
emp ID  FirstName  Lastname
1       Prasana    K
2       Siva       B
3       Karunas    Y
I want to compare the two DataFrames on emp ID and identify the unique, non-unique, and new items.
Thanks.
-Prasanna.K
You can use something like the following:
>>> import pandas as pd
>>> import numpy as np
>>> dictA = {'emp ID': [0, 1], 'FirstName': ['Prasanna', 'Siva'], 'LastName': ['K', 'B']}
>>> dictB = {'emp ID': [0, 1, 2], 'FirstName': ['Prasanna', 'Siva', 'Karunas'], 'LastName': ['K', 'B', 'Y']}
>>> dfA = pd.DataFrame(dictA)
>>> dfB = pd.DataFrame(dictB)
>>> dfA
   emp ID FirstName LastName
0       0  Prasanna        K
1       1      Siva        B
>>> dfB
   emp ID FirstName LastName
0       0  Prasanna        K
1       1      Siva        B
2       2   Karunas        Y
>>> # Flag the rows of dfB whose emp ID also exists in dfA (False means the row is new in B)
>>> dfB['present'] = dfB['emp ID'].isin(dfA['emp ID'])
>>> dfB
   emp ID FirstName LastName  present
0       0  Prasanna        K     True
1       1      Siva        B     True
2       2   Karunas        Y    False
>>> # Flag the rows of dfA whose emp ID also exists in dfB (False means the row is missing from B)
>>> dfA['present'] = dfA['emp ID'].isin(dfB['emp ID'])
>>> dfA
   emp ID FirstName LastName  present
0       0  Prasanna        K     True
1       1      Siva        B     True
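The two `isin` calls above flag membership one direction at a time. A sketch of a combined view, using the question's own data (emp IDs 1 to 3), runs an outer `pd.merge` on the key with `indicator=True` so every emp ID gets a label in one pass. The label strings ('only in A', 'new in B', 'in both') are hypothetical names for the question's categories, not anything pandas defines.

```python
import pandas as pd

# Data from the question: IDs 1 and 2 are in both frames, ID 3 only in B
dfA = pd.DataFrame({'emp ID': [1, 2],
                    'FirstName': ['Prasanna', 'Siva'],
                    'LastName': ['K', 'B']})
dfB = pd.DataFrame({'emp ID': [1, 2, 3],
                    'FirstName': ['Prasana', 'Siva', 'Karunas'],
                    'LastName': ['K', 'B', 'Y']})

# Outer merge on the key column only; indicator=True adds a '_merge'
# column holding 'left_only', 'right_only' or 'both' for each key
status = pd.merge(dfA[['emp ID']], dfB[['emp ID']], on='emp ID',
                  how='outer', indicator=True)

# Hypothetical labels for the question's categories
labels = {'left_only': 'only in A', 'right_only': 'new in B', 'both': 'in both'}
status['status'] = status['_merge'].astype(str).map(labels)
print(status[['emp ID', 'status']])

# Full rows of dfB whose emp ID does not appear in dfA (the "new" items)
new_rows = dfB[~dfB['emp ID'].isin(dfA['emp ID'])]
print(new_rows)
```

The `~` in the last step inverts the `isin` mask, so it selects the rows that the `present` column above would mark `False`.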
A second example, with a larger dfA, also compares the name columns:
>>> import pandas as pd
>>> import numpy as np
>>> dictA = {'emp ID': [0, 1, 2, 3], 'FirstName': ['Prasanna', 'Siva', 'Bala', 'foo'], 'LastName': ['K', 'B', 'Y', 'Y_F']}
>>> dictB = {'emp ID': [0, 1, 2], 'FirstName': ['Prasanna', 'Siva', 'Karunas'], 'LastName': ['K', 'B', 'Y']}
>>> dfA = pd.DataFrame(dictA)
>>> dfB = pd.DataFrame(dictB)
>>> dfA
   emp ID FirstName LastName
0       0  Prasanna        K
1       1      Siva        B
2       2      Bala        Y
3       3       foo      Y_F
>>> dfB
   emp ID FirstName LastName
0       0  Prasanna        K
1       1      Siva        B
2       2   Karunas        Y
>>> # Flag the rows of dfB whose emp ID, FirstName and LastName each appear somewhere in dfA.
>>> # Note: each column is checked on its own, so this is not an exact whole-row match.
>>> dfB['same_all'] = (dfB['emp ID'].isin(dfA['emp ID'])
...                    & dfB['FirstName'].isin(dfA['FirstName'])
...                    & dfB['LastName'].isin(dfA['LastName']))
>>> dfB
   emp ID FirstName LastName  same_all
0       0  Prasanna        K      True
1       1      Siva        B      True
2       2   Karunas        Y     False
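Because `same_all` combines three independent `isin` tests, a row counts as "same" as long as each of its values appears somewhere in dfA, even if those values never occur together in one row. A sketch of an exact whole-row check, assuming the rows should match on all three columns at once, does a left merge on every column with `indicator=True`. The dfB below is a hypothetical variant of the one above: its second row combines values from two different dfA rows, which makes the difference visible.

```python
import pandas as pd

dfA = pd.DataFrame({'emp ID': [0, 1, 2, 3],
                    'FirstName': ['Prasanna', 'Siva', 'Bala', 'foo'],
                    'LastName': ['K', 'B', 'Y', 'Y_F']})
# Hypothetical dfB: row 1 pairs ID 1 / 'B' (from dfA row 1) with 'Prasanna' (from dfA row 0)
dfB = pd.DataFrame({'emp ID': [0, 1, 2],
                    'FirstName': ['Prasanna', 'Prasanna', 'Karunas'],
                    'LastName': ['K', 'B', 'Y']})

cols = ['emp ID', 'FirstName', 'LastName']

# Column-by-column isin: each value only has to appear *somewhere* in dfA
per_column = (dfB['emp ID'].isin(dfA['emp ID'])
              & dfB['FirstName'].isin(dfA['FirstName'])
              & dfB['LastName'].isin(dfA['LastName']))

# Whole-row check: '_merge' is 'both' only when the complete
# (emp ID, FirstName, LastName) tuple exists in dfA.
# drop_duplicates() keeps the left merge from adding rows, so the result
# lines up with dfB's default RangeIndex (assumed here).
merged = dfB.merge(dfA[cols].drop_duplicates(), on=cols,
                   how='left', indicator=True)
row_wise = merged['_merge'].eq('both')

print(pd.DataFrame({'per_column': per_column, 'row_wise': row_wise}))
```

Row 1 of this dfB is `True` under the column-by-column test but `False` under the whole-row test, because no dfA row is exactly (1, 'Prasanna', 'B').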
>>> # Or, to check each column separately for dataframe B:
>>> dfB['same_emp_ID'] = dfB['emp ID'].isin(dfA['emp ID'])
>>> dfB['same_FirstName'] = dfB['FirstName'].isin(dfA['FirstName'])
>>> dfB['same_LastName'] = dfB['LastName'].isin(dfA['LastName'])
>>> # The same check in the other direction: rows of dfA against dfB (again column by column, not whole rows)
>>> dfA['same_all'] = (dfA['emp ID'].isin(dfB['emp ID'])
...                    & dfA['FirstName'].isin(dfB['FirstName'])
...                    & dfA['LastName'].isin(dfB['LastName']))
>>> dfA
   emp ID FirstName LastName  same_all
0       0  Prasanna        K      True
1       1      Siva        B      True
2       2      Bala        Y     False
3       3       foo      Y_F     False
>>> # Or, to check each column separately for dataframe A:
>>> dfA['same_emp_ID'] = dfA['emp ID'].isin(dfB['emp ID'])
>>> dfA['same_FirstName'] = dfA['FirstName'].isin(dfB['FirstName'])
>>> dfA['same_LastName'] = dfA['LastName'].isin(dfB['LastName'])
>>> dfA
   emp ID FirstName LastName  same_all  same_emp_ID  same_FirstName  same_LastName
0       0  Prasanna        K      True         True            True           True
1       1      Siva        B      True         True            True           True
2       2      Bala        Y     False         True           False           True
3       3       foo      Y_F     False        False           False          False
>>> dfB
   emp ID FirstName LastName  same_all  same_emp_ID  same_FirstName  same_LastName
0       0  Prasanna        K      True         True            True           True
1       1      Siva        B      True         True            True           True
2       2   Karunas        Y     False         True           False           True
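The `isin` checks report whether a value occurs anywhere in the other frame; they do not compare the two versions of the same employee. For the question's "Prasanna" vs "Prasana" case, a sketch that aligns rows by emp ID with an inner merge and then compares the name columns field by field could look like this. The `_A`/`_B` suffixes and the `changed` column name are illustrative choices, not part of the original answer.

```python
import pandas as pd

# Data from the question
dfA = pd.DataFrame({'emp ID': [1, 2],
                    'FirstName': ['Prasanna', 'Siva'],
                    'LastName': ['K', 'B']})
dfB = pd.DataFrame({'emp ID': [1, 2, 3],
                    'FirstName': ['Prasana', 'Siva', 'Karunas'],
                    'LastName': ['K', 'B', 'Y']})

# Inner merge keeps only IDs present in both frames and places the two
# versions of every other column side by side (_A / _B suffixes)
both = dfA.merge(dfB, on='emp ID', how='inner', suffixes=('_A', '_B'))

value_cols = ['FirstName', 'LastName']
# One boolean column per field, True where the frames disagree.
# Note: NaN != NaN, so a value missing on both sides also counts as a change.
diffs = pd.DataFrame({c: both[f'{c}_A'] != both[f'{c}_B'] for c in value_cols})
both['changed'] = diffs.any(axis=1)

print(both[both['changed']])
```

With the question's data, only emp ID 1 is reported as changed, because its FirstName differs between the two frames.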
Taken from here