
How to fill NaN values with values contained in another column?

I want to check whether the split column contains any value from the Class list. If it does, I want to update the Category column with the matching value from the list. The Desired Category column shows the result I am after.

Sample data

domain      split   Category    Desired Category
abc@XYT.com XYT.com Null         XYT
abb@XTY.com XTY.com Null         Null
abc@ssa.com ssa.com Null         ssa
bbb@bbc.com bbc.com Null         bbc
ccc@abk.com abk.com Null         abk
acc@ssb.com ssb.com Null         ssb
            
Class=['NaN','XYT','ssa','abk','abc','def','asds','ssb','bbc','XY','ab']    



for index, row in df.iterrows():
    for x in class:
        intersection=row.split.contains(x)
        if intersection:
           df.loc[index,'class'] = intersection

Just cannot get it right

Please help, Thanks

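Use str.extract. Build a regular expression that matches any of the words in the list and extract the word that matched (or NaN if none did).

Update: alternation with '|' is not greedy across alternatives. The regex engine tries them left to right and takes the first one that matches, even if a later alternative would give a longer match, so 'XY' listed before 'XYT' would win. You therefore have to reverse-sort the list yourself so that longer words come before their own prefixes.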
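To see the ordering effect in isolation, here is a small check with Python's re module. The sample string comes from the question's data; the variable names are only illustrative.

```python
import re

text = "XYT.com"

# 'XY' is listed first, so it wins even though 'XYT' would match more.
short_first = re.search(r"(XY|XYT)", text).group(1)

# With the longer word first, the full 'XYT' is captured.
long_first = re.search(r"(XYT|XY)", text).group(1)

print(short_first)  # XY
print(long_first)   # XYT
```

Sorting the list in reverse order puts 'XYT' ahead of 'XY', which is what the following code does.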

lst = ['NaN','XY','ab','XYT','ssa','abk','abc','def','asds','ssb','bbc']
lst = sorted(lst, reverse=True)
pat = fr"({'|'.join(lst)})"

df['Category'] = df['split'].str.extract(pat)
>>> df
        domain    split Category
0  abc@XYT.com  XYT.com      XYT
1  abb@XTY.com  XTY.com      NaN
2  abc@ssa.com  ssa.com      ssa
3  bbb@bbc.com  bbc.com      bbc
4  ccc@abk.com  abk.com      abk
5  acc@ssb.com  ssb.com      ssb

>>> lst
['ssb', 'ssa', 'def', 'bbc', 'asds', 'abk', 'abc', 'ab', 'XYT', 'XY', 'NaN']

>>> pat
'(ssb|ssa|def|bbc|asds|abk|abc|ab|XYT|XY|NaN)'
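A self-contained variant of the same idea, as a sketch rather than the answer's exact code: it rebuilds the sample frame from the question, sorts by length (longest first) instead of reverse alphabetical order, and passes each word through re.escape in case the list ever contains regex metacharacters. The names `classes`, `ordered` and `pattern` are assumptions made for this example.

```python
import re

import pandas as pd

# Sample data from the question; 'split' is the part after the '@'.
df = pd.DataFrame({
    "domain": ["abc@XYT.com", "abb@XTY.com", "abc@ssa.com",
               "bbb@bbc.com", "ccc@abk.com", "acc@ssb.com"],
})
df["split"] = df["domain"].str.split("@").str[1]

classes = ["NaN", "XYT", "ssa", "abk", "abc", "def",
           "asds", "ssb", "bbc", "XY", "ab"]

# Longest words first, so 'XYT' is tried before 'XY' and 'abk' before 'ab'.
ordered = sorted(classes, key=len, reverse=True)

# Escape each word so it is matched literally, then join into one group.
pattern = "(" + "|".join(re.escape(word) for word in ordered) + ")"

# expand=False returns a Series; rows without a match become NaN.
df["Category"] = df["split"].str.extract(pattern, expand=False)
print(df)
```

Sorting by length covers the prefix problem just as the reverse alphabetical sort does, and it also handles a short word that is contained in the middle or at the end of a longer one, provided both would match at the same starting position.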

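Assuming each value matches at most one entry in Class, try the following. Note that the assumption breaks once the list contains overlapping entries such as 'XY' and 'XYT': every entry found in the string is joined into the result, separated by spaces.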

df["Category"] = df["split"].apply(lambda x: " ".join(c for c in Class if c in x))

>>> df
        domain    split Category
0  abc@XYT.com  XYT.com   XYT
1  abb@XTY.com  XTY.com      
2  abc@ssa.com  ssa.com   ssa
3  bbb@bbc.com  bbc.com   bbc
4  ccc@abk.com  abk.com   abk
5  acc@ssb.com  ssb.com   ssb
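If overlapping entries are possible, one option is to keep only the longest entry found in each string and return NaN when nothing matches. This is a minimal sketch, not part of the original answer; `longest_match` is a hypothetical helper introduced here.

```python
import numpy as np
import pandas as pd

Class = ["NaN", "XYT", "ssa", "abk", "abc", "def",
         "asds", "ssb", "bbc", "XY", "ab"]


def longest_match(text, words=Class):
    """Return the longest word contained in text, or NaN if none is."""
    hits = [word for word in words if word in text]
    return max(hits, key=len) if hits else np.nan


df = pd.DataFrame({
    "split": ["XYT.com", "XTY.com", "ssa.com",
              "bbc.com", "abk.com", "ssb.com"],
})
df["Category"] = df["split"].apply(longest_match)
print(df)
```

Unlike the regex approach, which returns the leftmost match, this picks the longest entry found anywhere in the string, so the two can differ when several unrelated entries occur in one value.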
