
Finding correct person among multiple person names

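I have a dataframe. One column (Values_1) holds a single name per row, and the corresponding column (Values_2) holds a set of names for that row.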

df = pd.DataFrame()

Index   Values_1                            Values_2                                          
1  Muhammad bin Bashr bin al-Farafsa   Isma'il bin Abi Khalid al-
                                       Ahmsi [11418], Hisham bin 
                                       'Urwa [11065], Yahya bin 
                                       Sa'id bin Hiyan [11404]

1  Muhammad bin Bkar bin Bilal         Sa'id bin Basahyr al-Azdi 
                                       [20710], Sa'id bin 'Abdul
                                       'Aziz al-Tanuqi [20638]

1  Muhammad bin Bashar Bindar          Mua'dh bin Hisham bin Aby 
                                       [20287], Yahya bin Sa'id bin 
                                       Farroukh al-Qatan [20031]

2  Yahya bin Sa'id bin Farroukh al-Qatan  Y'aqub bin Ibrahim bin Kathir 
                                          [30400], Sh'uba[198]

2  Yahya bin Sa'd ibn Abi Waqqas          Sa'd ibn Abi Waqqas [9]

3  Hamza bin al-Mughira bin Shu'ba        al-Mughira ibn Shu'ba 
                                          [166] 

3  Shu'ba                                 Yahya bin Sa'id al khudri
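To make the sample reproducible, the table above could be built roughly as follows. This is a sketch that assumes the wrapped Values_2 cells are single comma-separated strings and that Index is the (non-unique) index of the frame; the `rows` variable is only a helper name for illustration.

```python
import pandas as pd

# Rows transcribed from the table above; wrapped Values_2 cells are joined
# into one string per row (an assumption about the original layout).
rows = [
    (1, "Muhammad bin Bashr bin al-Farafsa",
     "Isma'il bin Abi Khalid al-Ahmsi [11418], Hisham bin 'Urwa [11065], "
     "Yahya bin Sa'id bin Hiyan [11404]"),
    (1, "Muhammad bin Bkar bin Bilal",
     "Sa'id bin Basahyr al-Azdi [20710], Sa'id bin 'Abdul 'Aziz al-Tanuqi [20638]"),
    (1, "Muhammad bin Bashar Bindar",
     "Mua'dh bin Hisham bin Aby [20287], "
     "Yahya bin Sa'id bin Farroukh al-Qatan [20031]"),
    (2, "Yahya bin Sa'id bin Farroukh al-Qatan",
     "Y'aqub bin Ibrahim bin Kathir [30400], Sh'uba[198]"),
    (2, "Yahya bin Sa'd ibn Abi Waqqas", "Sa'd ibn Abi Waqqas [9]"),
    (3, "Hamza bin al-Mughira bin Shu'ba", "al-Mughira ibn Shu'ba [166]"),
    (3, "Shu'ba", "Yahya bin Sa'id al khudri"),
]

# Index labels repeat, so each label acts as a group key.
df = pd.DataFrame(rows, columns=["Index", "Values_1", "Values_2"]).set_index("Index")
print(df.shape)
```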

   

I have to check whether a Values_1 at index 2 is present in any Values_2 at index 1, after first grouping the values by index. For example, check whether Yahya bin Sa'id bin Farroukh al-Qatan appears in any Values_2 at index 1.
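For a single pair of cells, the check described here can be read as a plain substring test. The sketch below assumes that reading and uses strings copied from the sample table; the variable names are illustrative only.

```python
# Values_1 at Index 2 and one Values_2 cell at Index 1, copied from the sample.
name = "Yahya bin Sa'id bin Farroukh al-Qatan"
refs = ("Mua'dh bin Hisham bin Aby [20287], "
        "Yahya bin Sa'id bin Farroukh al-Qatan [20031]")

# Plain substring test: the full name must appear verbatim inside the cell.
found = name in refs
print(found)  # True

# The test is exact, so spelling variants in the data do not match.
variant_found = "Shu'ba" in "Y'aqub bin Ibrahim bin Kathir [30400], Sh'uba[198]"
print(variant_found)  # False
```

Because the comparison is exact, a row whose Values_2 spells a name differently (Sh'uba versus Shu'ba in the sample) will not be detected. A substring test can also match a short name inside a longer one, for example Shu'ba inside al-Mughira ibn Shu'ba.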

Expected output:

Index   Values_1                               Values_2
1       Muhammad bin Bashar Bindar             Mua'dh bin Hisham bin Aby
                                               [20287], Yahya bin Sa'id
                                               bin Farroukh al-Qatan
                                               [20031]

2       Yahya bin Sa'id bin Farroukh al-Qatan  Y'aqub bin Ibrahim bin Kathir
                                               [30400], Sh'uba[198]

3       Shu'ba                                 Yahya bin Sa'id al_Khudri


Use:

# collect Values_1 per group as lists, then subtract 1 from the index
# so that each group lines up with the names of the next group
s = df.groupby(level=0)['Values_1'].agg(list)
s.index = s.index - 1
print (s)
Index
0    [Muhammad bin Bashr bin al-Farafsa, Muhammad b...
1    [Yahya bin Sa'id bin Farroukh al-Qatan, Yahya ...
2            [Hamza bin al-Mughira bin Shu'ba, Shu'ba]
Name: Values_1, dtype: object

# replace NaN (groups with no next group) with an empty list
df['test'] = df.index.map(s).map(lambda x: [] if isinstance(x, float) else x)

# test whether at least one name from the next group appears in Values_2
f = lambda x: any(y in x['Values_2'] for y in x['test'])
mask = df.apply(f, axis=1)

#filter by mask and remove helper column
df = df[mask].drop('test',axis=1)
print (df)
                         Values_1  \
Index                               
1      Muhammad bin Bashar Bindar   

                                                Values_2  
Index                                                     
1      Mua'dh bin Hisham bin Aby [20287], Yahya bin S...  
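The same steps can be wrapped in a small function that avoids the helper column. This is only a sketch under the same assumptions as the answer (integer index labels, exact substring matching); the function name `keep_rows_matching_next_group` and its parameters are hypothetical, and the data is a four-row subset of the sample.

```python
import pandas as pd

def keep_rows_matching_next_group(df, name_col="Values_1", refs_col="Values_2"):
    """Keep rows whose refs_col contains any name_col value from the next index group."""
    # Names per group, re-keyed so that group k maps to the names of group k + 1.
    names = df.groupby(level=0)[name_col].agg(list)
    next_names = {idx - 1: vals for idx, vals in names.items()}
    # Rows in the last group have no successor and get an empty candidate list.
    mask = [
        any(name in refs for name in next_names.get(idx, []))
        for idx, refs in zip(df.index, df[refs_col])
    ]
    return df.loc[mask]

# Subset of the sample rows from the question.
rows = [
    (1, "Muhammad bin Bkar bin Bilal",
     "Sa'id bin Basahyr al-Azdi [20710], Sa'id bin 'Abdul 'Aziz al-Tanuqi [20638]"),
    (1, "Muhammad bin Bashar Bindar",
     "Mua'dh bin Hisham bin Aby [20287], "
     "Yahya bin Sa'id bin Farroukh al-Qatan [20031]"),
    (2, "Yahya bin Sa'id bin Farroukh al-Qatan",
     "Y'aqub bin Ibrahim bin Kathir [30400], Sh'uba[198]"),
    (3, "Shu'ba", "Yahya bin Sa'id al khudri"),
]
df = pd.DataFrame(rows, columns=["Index", "Values_1", "Values_2"]).set_index("Index")

result = keep_rows_matching_next_group(df)
print(result["Values_1"].tolist())  # ['Muhammad bin Bashar Bindar']
```

Returning a filtered copy leaves the input frame unchanged, so the function can be called repeatedly on the same data.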
