
Finding the correct person among multiple person names

I have a DataFrame. One of its columns holds a single name per row. The corresponding column holds a subset of names: a comma-separated list in which each name is usually followed by a number in square brackets.

df = pd.DataFrame()

Index  Values_1                               Values_2
1      Muhammad bin Bashr bin al-Farafsa      Isma'il bin Abi Khalid al-Ahmsi [11418], Hisham bin 'Urwa [11065], Yahya bin Sa'id bin Hiyan [11404]
1      Muhammad bin Bkar bin Bilal            Sa'id bin Basahyr al-Azdi [20710], Sa'id bin 'Abdul 'Aziz al-Tanuqi [20638]
1      Muhammad bin Bashar Bindar             Mua'dh bin Hisham bin Aby [20287], Yahya bin Sa'id bin Farroukh al-Qatan [20031]
2      Yahya bin Sa'id bin Farroukh al-Qatan  Y'aqub bin Ibrahim bin Kathir [30400], Sh'uba[198]
2      Yahya bin Sa'd ibn Abi Waqqas          Sa'd ibn Abi Waqqas [9]
3      Hamza bin al-Mughira bin Shu'ba        al-Mughira ibn Shu'ba [166]
3      Shu'ba                                 Yahya bin Sa'id al khudri
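To get a reproducible copy of the sample data, the table above can be rebuilt as a DataFrame. This is a sketch that assumes the numbers in the first column are the DataFrame index (named `Index`) and that each Values_2 cell is one plain string.

```python
import pandas as pd

# Sample data from the table above; the group number becomes the index
data = {
    "Index": [1, 1, 1, 2, 2, 3, 3],
    "Values_1": [
        "Muhammad bin Bashr bin al-Farafsa",
        "Muhammad bin Bkar bin Bilal",
        "Muhammad bin Bashar Bindar",
        "Yahya bin Sa'id bin Farroukh al-Qatan",
        "Yahya bin Sa'd ibn Abi Waqqas",
        "Hamza bin al-Mughira bin Shu'ba",
        "Shu'ba",
    ],
    "Values_2": [
        "Isma'il bin Abi Khalid al-Ahmsi [11418], Hisham bin 'Urwa [11065], "
        "Yahya bin Sa'id bin Hiyan [11404]",
        "Sa'id bin Basahyr al-Azdi [20710], Sa'id bin 'Abdul 'Aziz al-Tanuqi [20638]",
        "Mua'dh bin Hisham bin Aby [20287], Yahya bin Sa'id bin Farroukh al-Qatan [20031]",
        "Y'aqub bin Ibrahim bin Kathir [30400], Sh'uba[198]",
        "Sa'd ibn Abi Waqqas [9]",
        "al-Mughira ibn Shu'ba [166]",
        "Yahya bin Sa'id al khudri",
    ],
}
df = pd.DataFrame(data).set_index("Index")
print(df)
```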

   

I have to check whether a Values_1 name at index 2 is present in any of the Values_2 entries at index 1. First group the values by index. For example, check whether Yahya bin Sa'id bin Farroukh al-Qatan is present in any of the Values_2 entries at index 1.
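A minimal sketch of that single check (the index 2 names against the index 1 Values_2 strings), using plain substring matching as the answer below does. It assumes the group number is the DataFrame index and includes only groups 1 and 2 of the sample.

```python
import pandas as pd

# Groups 1 and 2 of the sample data; the group number is the index
df = pd.DataFrame(
    {
        "Values_1": [
            "Muhammad bin Bashr bin al-Farafsa",
            "Muhammad bin Bkar bin Bilal",
            "Muhammad bin Bashar Bindar",
            "Yahya bin Sa'id bin Farroukh al-Qatan",
            "Yahya bin Sa'd ibn Abi Waqqas",
        ],
        "Values_2": [
            "Isma'il bin Abi Khalid al-Ahmsi [11418], Hisham bin 'Urwa [11065], "
            "Yahya bin Sa'id bin Hiyan [11404]",
            "Sa'id bin Basahyr al-Azdi [20710], Sa'id bin 'Abdul 'Aziz al-Tanuqi [20638]",
            "Mua'dh bin Hisham bin Aby [20287], Yahya bin Sa'id bin Farroukh al-Qatan [20031]",
            "Y'aqub bin Ibrahim bin Kathir [30400], Sh'uba[198]",
            "Sa'd ibn Abi Waqqas [9]",
        ],
    },
    index=pd.Index([1, 1, 1, 2, 2], name="Index"),
)

# All Values_1 names that belong to group 2
names_2 = df.loc[2, "Values_1"].tolist()

# Group-1 rows whose Values_2 text contains at least one of those names
group_1 = df.loc[[1]]
mask = group_1["Values_2"].apply(lambda text: any(name in text for name in names_2))
print(group_1[mask]["Values_1"].tolist())  # ['Muhammad bin Bashar Bindar']
```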

Expected output:

Index  Values_1                               Values_2
1      Muhammad bin Bashar Bindar             Mua'dh bin Hisham bin Aby [20287], Yahya bin Sa'id bin Farroukh al-Qatan [20031]
2      Yahya bin Sa'id bin Farroukh al-Qatan  Y'aqub bin Ibrahim bin Kathir [30400], Sh'uba[198]
3      Shu'ba                                 Yahya bin Sa'id al_Khudri

Use:

import pandas as pd

#collect Values_1 of each group into a list and subtract 1 from the index,
#so that every group is aligned with the names of the next group
s = df.groupby(level=0)['Values_1'].agg(list)
s.index = s.index - 1
print (s)
Index
0    [Muhammad bin Bashr bin al-Farafsa, Muhammad b...
1    [Yahya bin Sa'id bin Farroukh al-Qatan, Yahya ...
2            [Hamza bin al-Mughira bin Shu'ba, Shu'ba]
Name: Values_1, dtype: object

#replace NaN (the last group has no next group) with an empty list
df['test'] = df.index.map(s).map(lambda x: x if isinstance(x, list) else [])

#test whether at least one name from the next group's list appears in Values_2
f = lambda x: any(y in x['Values_2'] for y in x['test'])
mask = df.apply(f, axis=1)

#filter by mask and remove helper column
df = df[mask].drop('test',axis=1)
print (df)
                         Values_1  \
Index                               
1      Muhammad bin Bashar Bindar   

                                                Values_2  
Index                                                     
1      Mua'dh bin Hisham bin Aby [20287], Yahya bin S...  
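The check `y in x['Values_2']` is a substring test, so a short name such as "Shu'ba" would also match inside a longer one such as "al-Mughira ibn Shu'ba [166]". Below is a sketch of a stricter variant that compares whole names. It assumes that commas only ever separate names and that the optional number after each name is in square brackets. `parse_names` and `rows_matching_next_group` are hypothetical helpers, not pandas API.

```python
import re

import pandas as pd


def parse_names(cell):
    """Hypothetical helper: split a Values_2 string into bare names.

    Assumes names are separated by commas and each may be followed by a
    number in square brackets, e.g. "Sh'uba[198]" -> "Sh'uba".
    """
    return {re.sub(r"\s*\[\d+\]\s*$", "", part).strip() for part in cell.split(",")}


def rows_matching_next_group(df):
    """Hypothetical helper: keep rows whose Values_2 names include a
    Values_1 name from the next group (group number = index)."""
    # Map group g -> set of Values_1 names of group g + 1
    next_names = {
        group - 1: set(names) for group, names in df.groupby(level=0)["Values_1"]
    }
    # Exact set intersection instead of substring search; the last group
    # has no next group, so it falls back to an empty set
    keep = [
        bool(parse_names(v2) & next_names.get(idx, set()))
        for idx, v2 in zip(df.index, df["Values_2"])
    ]
    return df.loc[keep]


# Small usage example with three rows of the sample data
demo = pd.DataFrame(
    {
        "Values_1": [
            "Muhammad bin Bashar Bindar",
            "Muhammad bin Bkar bin Bilal",
            "Yahya bin Sa'id bin Farroukh al-Qatan",
        ],
        "Values_2": [
            "Mua'dh bin Hisham bin Aby [20287], Yahya bin Sa'id bin Farroukh al-Qatan [20031]",
            "Sa'id bin Basahyr al-Azdi [20710], Sa'id bin 'Abdul 'Aziz al-Tanuqi [20638]",
            "Y'aqub bin Ibrahim bin Kathir [30400], Sh'uba[198]",
        ],
    },
    index=pd.Index([1, 1, 2], name="Index"),
)
print(rows_matching_next_group(demo)["Values_1"].tolist())  # ['Muhammad bin Bashar Bindar']
```

Like the substring version, this treats "Sh'uba" and "Shu'ba" as different names. Spelling variants would have to be normalised before either comparison can link them.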
