Want to compare two data frames using a column value
I have two similar data frames. I want to compare their rows, matching them on the values in column 1 (emp ID).
emp ID  FirstName  Lastname
1       Prasanna   K
2       Siva       B

emp ID  FirstName  Lastname
1       Prasana    K
2       Siva       B
3       Karunas    Y
I want to compare the two data frames by Emp ID and identify the unique, non-unique, and new items.
Thanks.
-Prasanna.K
You can use something like the following:
>>> import pandas as pd
>>>
>>> dictA = {'emp ID': [0, 1], 'FirstName': ['Prasanna', 'Siva'], 'LastName': ['K', 'B']}
>>> dictB = {'emp ID': [0, 1, 2], 'FirstName': ['Prasanna', 'Siva', 'Karunas'], 'LastName': ['K', 'B', 'Y']}
>>>
>>> dfA = pd.DataFrame(dictA)
>>> dfB = pd.DataFrame(dictB)
>>>
>>> dfA
   emp ID FirstName LastName
0       0  Prasanna        K
1       1      Siva        B
>>> dfB
   emp ID FirstName LastName
0       0  Prasanna        K
1       1      Siva        B
2       2   Karunas        Y
>>>
>>> # Flag each row of dfB by whether its emp ID also appears in dfA
>>> # (False marks an emp ID that exists only in dfB)
>>> dfB['present'] = dfB['emp ID'].isin(dfA['emp ID'])
>>> dfB
   emp ID FirstName LastName  present
0       0  Prasanna        K     True
1       1      Siva        B     True
2       2   Karunas        Y    False
>>>
>>> # Flag each row of dfA by whether its emp ID also appears in dfB
>>> # (False would mark an emp ID that exists only in dfA)
>>> dfA['present'] = dfA['emp ID'].isin(dfB['emp ID'])
>>> dfA
   emp ID FirstName LastName  present
0       0  Prasanna        K     True
1       1      Siva        B     True
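
The question asks which emp IDs are new and which are shared. One possible alternative to the two `isin` checks is an outer merge with `indicator=True`, which labels every emp ID in a single pass. This is only a sketch and not part of the original answer: it uses the sample data from the question, and the names `df_a`, `df_b`, `merged`, `new_ids`, `common_ids`, and `only_in_a_ids` are illustrative.

```python
import pandas as pd

# Sample data copied from the question (note "Prasana" in the second frame).
df_a = pd.DataFrame({
    "emp ID": [1, 2],
    "FirstName": ["Prasanna", "Siva"],
    "Lastname": ["K", "B"],
})
df_b = pd.DataFrame({
    "emp ID": [1, 2, 3],
    "FirstName": ["Prasana", "Siva", "Karunas"],
    "Lastname": ["K", "B", "Y"],
})

# An outer merge keeps every emp ID from both frames; indicator=True adds a
# "_merge" column whose values are "left_only", "right_only", or "both".
merged = df_a.merge(
    df_b, on="emp ID", how="outer", suffixes=("_A", "_B"), indicator=True
)

# emp IDs that exist only in df_b (the "new" items).
new_ids = merged.loc[merged["_merge"] == "right_only", "emp ID"].tolist()
# emp IDs that exist only in df_a.
only_in_a_ids = merged.loc[merged["_merge"] == "left_only", "emp ID"].tolist()
# emp IDs present in both frames.
common_ids = merged.loc[merged["_merge"] == "both", "emp ID"].tolist()

print(new_ids, only_in_a_ids, common_ids)
```

The merged frame also carries `FirstName_A` / `FirstName_B` style columns side by side, which can be handy for inspecting the shared emp IDs.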
>>> import pandas as pd
>>>
>>> dictA = {'emp ID': [0, 1, 2, 3], 'FirstName': ['Prasanna', 'Siva', 'Bala', 'foo'], 'LastName': ['K', 'B', 'Y', 'Y_F']}
>>> dictB = {'emp ID': [0, 1, 2], 'FirstName': ['Prasanna', 'Siva', 'Karunas'], 'LastName': ['K', 'B', 'Y']}
>>>
>>> dfA = pd.DataFrame(dictA)
>>> dfB = pd.DataFrame(dictB)
>>>
>>> dfA
   emp ID FirstName LastName
0       0  Prasanna        K
1       1      Siva        B
2       2      Bala        Y
3       3       foo      Y_F
>>> dfB
   emp ID FirstName LastName
0       0  Prasanna        K
1       1      Siva        B
2       2   Karunas        Y
>>>
>>> # Flag rows of dfB whose values all appear in the matching columns of dfA (all columns together)
>>> dfB['same_all'] = dfB['emp ID'].isin(dfA['emp ID']) & dfB['FirstName'].isin(dfA['FirstName']) & dfB['LastName'].isin(dfA['LastName'])
>>> dfB
   emp ID FirstName LastName  same_all
0       0  Prasanna        K      True
1       1      Siva        B      True
2       2   Karunas        Y     False
>>>
>>> # Or check each column of dfB separately
>>> dfB['same_emp_ID'] = dfB['emp ID'].isin(dfA['emp ID'])
>>> dfB['same_FirstName'] = dfB['FirstName'].isin(dfA['FirstName'])
>>> dfB['same_LastName'] = dfB['LastName'].isin(dfA['LastName'])
>>>
>>> # Flag rows of dfA whose values all appear in the matching columns of dfB (all columns together)
>>> dfA['same_all'] = dfA['emp ID'].isin(dfB['emp ID']) & dfA['FirstName'].isin(dfB['FirstName']) & dfA['LastName'].isin(dfB['LastName'])
>>> dfA
   emp ID FirstName LastName  same_all
0       0  Prasanna        K      True
1       1      Siva        B      True
2       2      Bala        Y     False
3       3       foo      Y_F     False
>>>
>>> # Or check each column of dfA separately
>>> dfA['same_emp_ID'] = dfA['emp ID'].isin(dfB['emp ID'])
>>> dfA['same_FirstName'] = dfA['FirstName'].isin(dfB['FirstName'])
>>> dfA['same_LastName'] = dfA['LastName'].isin(dfB['LastName'])
>>>
>>> dfA
   emp ID FirstName LastName  same_all  same_emp_ID  same_FirstName  same_LastName
0       0  Prasanna        K      True         True            True           True
1       1      Siva        B      True         True            True           True
2       2      Bala        Y     False         True           False           True
3       3       foo      Y_F     False        False           False          False
>>>
>>> dfB
   emp ID FirstName LastName  same_all  same_emp_ID  same_FirstName  same_LastName
0       0  Prasanna        K      True         True            True           True
1       1      Siva        B      True         True            True           True
2       2   Karunas        Y     False         True           False           True
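
One caveat about the `isin` checks above: each one tests whether a value occurs anywhere in the other frame's column, not whether it occurs in the row with the same emp ID. If a strict row-by-row comparison is wanted, a possible approach is to merge on emp ID and compare the paired columns directly. The following is a sketch under that assumption, not part of the original answer; `df_a`, `df_b`, `value_cols`, `both`, and `changed_ids` are illustrative names, and the data repeats the second example.

```python
import pandas as pd

df_a = pd.DataFrame({
    "emp ID": [0, 1, 2, 3],
    "FirstName": ["Prasanna", "Siva", "Bala", "foo"],
    "LastName": ["K", "B", "Y", "Y_F"],
})
df_b = pd.DataFrame({
    "emp ID": [0, 1, 2],
    "FirstName": ["Prasanna", "Siva", "Karunas"],
    "LastName": ["K", "B", "Y"],
})

value_cols = ["FirstName", "LastName"]

# The default inner merge keeps only emp IDs present in both frames and puts
# the two versions of each value column side by side (suffixes _A and _B).
both = df_a.merge(df_b, on="emp ID", suffixes=("_A", "_B"))

# Compare each pair of columns within the same emp ID.
for col in value_cols:
    both[f"same_{col}"] = both[f"{col}_A"] == both[f"{col}_B"]

# A row matches fully only when every value column matches.
both["same_all"] = both[[f"same_{col}" for col in value_cols]].all(axis=1)

# emp IDs that exist in both frames but differ in at least one column.
changed_ids = both.loc[~both["same_all"], "emp ID"].tolist()
print(changed_ids)
```

Here emp ID 3 does not appear in `both` at all, because it exists only in `df_a`; an `isin` check on emp ID, as in the answer above, covers that case.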