
How to find repeating matching values in two Data Frame columns in Python?

I'm trying to write a script that checks whether manager numbers match employee numbers. It should continue down the column until all numbers are checked and, when finished, print how many matched and how many didn't.

[In:]
import pandas as pd

# Read the CSV file into a DataFrame
employeeData = pd.read_csv("C:/Users/Desktop/EmployeeList.csv")

# Create a DataFrame (redundant: read_csv already returns one)
dataF = pd.DataFrame(employeeData)

# Empty list where instances of True/False will be stored
booleans = []

# 256 manager numbers + 1896 empty rows
managers = pd.Series(employeeData['Manager ID Number'])

# Edit: forgot to include this line
# (`merge` is not defined anywhere in this snippet)
condition = managers.equals(merge['Employee ID'])

# Check each row of employee data: 2153 rows of employee numbers
for index, row in employeeData.iterrows():
    # Check every single manager number for a match
    # (Series.iteritems() was removed in pandas 2.0; items() is the replacement)
    for index, row in managers.items():
        if condition:
            booleans.append(True)
            print("Something matched!")
        else:
            print("Didn't match!")
            booleans.append(False)

# The length of the booleans list is printed
print(len(booleans))


[Out:] Actual 
"Didn't match!" x 2153 times. (number of employees in list)

[Out:] Desired: 
"Something matched!"
"Didn't match!"
"Something matched!"
"Something matched!"
"Didn't match!"
"Something matched!".... to line 2153

My problem is that the index doesn't seem to move down. The script only reports that the first number didn't match, hundreds of times. I want to move the row position down so that all the employee numbers are checked against the manager list. Some managers have more employees than others, so I have to check every single one (256)! I'm embarrassed to say I've been stuck on this problem for quite a while. I'm new to Python, so any tips would be greatly appreciated.

IIUC (if I understand correctly), you need to use pandas merge():

df_emp_mng = pd.merge(df_Emp, df_Mang, left_on='EMP ID', right_on='Manager ID')
print(df_emp_mng)

print('Number of managers in Employee', len(df_emp_mng))
print('Number of managers not in Employee', len(df_Emp) - len(df_emp_mng))

Input - Employee Data

   EMP ID name  MID
0     123   E3    1
1     124   E1    1
2     125   E2    2
3       4   X4    5

Input - Manager Data

   Manager ID Manager name Dep
0           1           X1   C
1           2           X2   D
2           3           X3   E
3           4           X4   F
4           5           X5   F

Output

   EMP ID name  MID  Manager ID Manager name Dep
0       4   X4    5           4           X4   F

Number of managers in Employee 1

Number of managers not in Employee 3
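
If what you want is one True/False per employee row, as in the desired output in the question, Series.isin may be a closer fit than a merge. Below is a minimal sketch, assuming the two sample frames from this answer (rebuilt by hand here). The names is_manager, matched and unmatched are made up for illustration and are not from the original post.

```python
import pandas as pd

# Sample frames copied from the input tables in this answer
df_Emp = pd.DataFrame({
    "EMP ID": [123, 124, 125, 4],
    "name": ["E3", "E1", "E2", "X4"],
    "MID": [1, 1, 2, 5],
})
df_Mang = pd.DataFrame({
    "Manager ID": [1, 2, 3, 4, 5],
    "Manager name": ["X1", "X2", "X3", "X4", "X5"],
    "Dep": ["C", "D", "E", "F", "F"],
})

# Inner merge: keeps only employees whose ID also appears as a manager ID
df_emp_mng = pd.merge(df_Emp, df_Mang, left_on="EMP ID", right_on="Manager ID")

# One boolean per employee row: is this employee ID in the manager list?
# (hypothetical variable name, chosen for this sketch)
is_manager = df_Emp["EMP ID"].isin(df_Mang["Manager ID"])

# Count matches and non-matches without an explicit Python loop
matched = int(is_manager.sum())
unmatched = int((~is_manager).sum())

# Print one message per employee row, then the totals
for flag in is_manager:
    print("Something matched!" if flag else "Didn't match!")
print("matched:", matched, "unmatched:", unmatched)
```

The comparison is done once for the whole column, so there is no nested loop and no index to advance by hand. On the sample data the counts agree with the merge result in this answer: one match and three non-matches.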
