I have two datasets: dataset1 has a PatientID column with around 40 entries, and dataset2 has the same PatientID column with around 700 entries.
I want to check whether the PatientIDs in dataset1 are present in dataset2.
I tried the following in a Jupyter notebook, but it does not work as Python code:
PatientsNotTreated=unique(Datase1.PatientID)[!unique(Dataset1.PatientID) in unique(Dataset2.PatientID)]
PatientsNotTreated
I get this error:
PatientsNotTreated=unique(Datase1.PatientID)[!unique(Dataset1.PatientID) in unique(Dataset2.PatientID)]
^
SyntaxError: invalid syntax
I expect the output to be the PatientIDs that are not present in dataset2.
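The attempted line reads like R (`unique()` plus `!` for negation). Python has no `!` negation operator (it uses `not`), which is why the parser rejects the line. A minimal plain-Python sketch of the same "in dataset1 but not in dataset2" idea uses set difference. It assumes the IDs are hashable values such as integers or strings; the two lists below are hypothetical sample data, not the asker's datasets.

```python
# Hypothetical sample IDs standing in for the two PatientID columns
ids1 = [101, 102, 103, 104, 102]
ids2 = [102, 104, 200, 201]

# Set difference: IDs that appear in ids1 but not in ids2.
# Sets are unordered, so sort the result for a stable output.
patients_not_treated = sorted(set(ids1) - set(ids2))
print(patients_not_treated)  # [101, 103]
```

With pandas columns the same expression works, e.g. `set(Dataset1['PatientID']) - set(Dataset2['PatientID'])`, at the cost of losing the original row order.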
Use Series.isin to build a boolean mask for indexing with DataFrame.loc, then call Series.unique on the selected column:
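# PatientIDs in Dataset1 that are NOT in Dataset2 (~ negates the mask)
arr_out = Dataset1.loc[~Dataset1['PatientID'].isin(Dataset2['PatientID']), 'PatientID'].unique()
# PatientIDs in Dataset1 that ARE in Dataset2
arr_in = Dataset1.loc[Dataset1['PatientID'].isin(Dataset2['PatientID']), 'PatientID'].unique()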
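A small self-contained run of the `isin` approach, with hypothetical stand-in DataFrames (the real Dataset1 and Dataset2 are not shown in the question), makes the mask and its negation visible:

```python
import pandas as pd

# Hypothetical stand-ins for Dataset1 and Dataset2
Dataset1 = pd.DataFrame({'PatientID': [101, 102, 103, 104, 102]})
Dataset2 = pd.DataFrame({'PatientID': [102, 104, 200, 201]})

# Boolean mask: True where the row's PatientID occurs anywhere in Dataset2
mask = Dataset1['PatientID'].isin(Dataset2['PatientID'])

# ~mask flips the mask, selecting IDs that are missing from Dataset2;
# unique() returns a NumPy array in order of first appearance
arr_out = Dataset1.loc[~mask, 'PatientID'].unique()
arr_in = Dataset1.loc[mask, 'PatientID'].unique()

print(arr_out)  # [101 103]
print(arr_in)   # [102 104]
```

In pandas, `~` plays the role that `!` played in the attempted line.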
To filter the rows of Dataset1 down to the patients not present in Dataset2, use:
Dataset1_filtered=Dataset1.loc[~Dataset1['PatientID'].isin(Dataset2['PatientID'])]
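For comparison, a sketch of an alternative anti-join (not part of the original answer) uses `DataFrame.merge` with `how='left'` and `indicator=True`, keeping only rows flagged `left_only`. The DataFrames and the extra `Visit` column are hypothetical, added only to show that whole rows are kept.

```python
import pandas as pd

# Hypothetical stand-ins; Dataset1 has an extra column to show whole rows are kept
Dataset1 = pd.DataFrame({'PatientID': [101, 102, 103, 104],
                         'Visit': ['a', 'b', 'c', 'd']})
Dataset2 = pd.DataFrame({'PatientID': [102, 104, 104, 200]})

# isin-based filter from the answer: rows whose PatientID is absent from Dataset2
Dataset1_filtered = Dataset1.loc[~Dataset1['PatientID'].isin(Dataset2['PatientID'])]

# Alternative anti-join: left merge against the de-duplicated IDs,
# with _merge recording whether each row matched ('both') or not ('left_only')
merged = Dataset1.merge(Dataset2[['PatientID']].drop_duplicates(),
                        on='PatientID', how='left', indicator=True)
anti_join = merged.loc[merged['_merge'] == 'left_only'].drop(columns='_merge')

print(Dataset1_filtered['PatientID'].tolist())  # [101, 103]
print(anti_join['PatientID'].tolist())          # [101, 103]
```

The `drop_duplicates()` call keeps a Dataset1 row from being repeated when its ID appears several times in Dataset2. The `isin` version is simpler and keeps Dataset1's original index, while `merge` returns a fresh index.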