Matching two string columns, then assigning a label in a new column

I have a dataframe that looks like the one below.

Name            F_Name       L_Name                 Title     
John Down        John         Down            sth vs Down John
Dave Brown       Dave         Brown           sth v Brown Dave
Mary Sith        Mary         Sith            Sith Mary vs sth 
Sam Walker       Sam         Walker           sth vs Sam Walker 
Chris Humpy     Chris         Humpy                 Humpy
John Hunter      John        Hunter              John Hunter
Nola Smith       Nola         Smith                 Nola
Chuck Bass      Chuck         Bass               Bass v sth
Rob Bank         Rob          Bank                Rob v sth
Chris Ham       Chris         Ham                Chris Ham
Angie Poppy     Angie        Poppy               Poppy Angie
Joe Exhaust      Joe         Exhaust             sth vs Joe
 :                :           : 
Tony Start       Tony         Start              sth v Start

I would like to match the Name column against the Title column. If the name appears before v or vs, the new Label column should be first; otherwise it should be second. If the Title contains only the name, without v or vs, Label should be null.

Here is what the output dataframe would look like.

Name            F_Name       L_Name                 Title                  Label
John Down        John         Down            sth vs Down John             second
Dave Brown       Dave         Brown           sth v Brown Dave             second
Mary Sith        Mary         Sith            Sith Mary vs sth             first
Sam Walker       Sam         Walker           sth vs Sam Walker            second
Chris Humpy     Chris         Humpy                 Humpy                  null
John Hunter      John        Hunter              John Hunter               null
Nola Smith       Nola         Smith                 Nola                   null
Chuck Bass      Chuck         Bass               Bass v sth                first
Rob Bank         Rob          Bank                Rob v sth                 first
Chris Ham       Chris         Ham                Chris Ham                 null
Angie Poppy     Angie        Poppy               Poppy Angie               null
Joe Exhaust      Joe         Exhaust             sth vs Joe                second
 :                :            :                     :                       :
Tony Start       Tony         Start              sth v Start               second

I am thinking of splitting the Title column on v or vs into two new columns and then matching them against the Name column, but I do not know how to add a condition that checks whether the name appears before the v or vs. Is there a better way to do this without splitting the Title column?

Thanks!!

The idea is to take the part of Title before v or vs, split it on whitespace, and convert the words to a set, then check whether that set shares any word with Name. For the null case, test Title for the separator with Series.str.contains. Finally, pass both masks to numpy.select:

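import numpy as np

# "v" or "vs" as a standalone word between whitespace
sep = r'\s+vs?\s+'

# words before the separator, as one set per row
df['Label'] = df['Title'].str.split(sep).str[0].str.split().apply(set)

# m1: no word of Name appears before the separator; m2: Title has no separator
m1 = df.apply(lambda x: x['Label'].isdisjoint(x['Name'].split()), axis=1)
m2 = ~df['Title'].str.contains(sep)

# m2 is tested first, so a Title without a separator is always None
df['Label'] = np.select([m2, m1], [None, 'second'], 'first')
print(df)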
           Name F_Name   L_Name              Title   Label
0     John Down   John     Down   sth vs Down John  second
1    Dave Brown   Dave    Brown   sth v Brown Dave  second
2     Mary Sith   Mary     Sith   Sith Mary vs sth   first
3    Sam Walker    Sam   Walker  sth vs Sam Walker  second
4   Chris Humpy  Chris    Humpy              Humpy    None
5   John Hunter   John   Hunter        John Hunter    None
6    Nola Smith   Nola    Smith               Nola    None
7    Chuck Bass  Chuck     Bass         Bass v sth   first
8      Rob Bank    Rob     Bank          Rob v sth   first
9     Chris Ham  Chris      Ham          Chris Ham    None
10  Angie Poppy  Angie    Poppy        Poppy Angie    None
11  Joe Exhaust    Joe  Exhaust         sth vs Joe  second
12   Tony Start   Tony    Start        sth v Start  second
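
The same rule can also be written as a plain function that works on one row at a time, which may be easier to read and test than the mask-based version. The sketch below is only an illustration: the names `SEPARATOR` and `label_title` are hypothetical and not part of the answer above, and it assumes v or vs always appears as a standalone word surrounded by whitespace.

```python
import re

# Hypothetical helper names; assumes "v"/"vs" is a standalone word in the title.
SEPARATOR = re.compile(r"\s+vs?\s+")


def label_title(name, title):
    """Return 'first', 'second', or None for one (Name, Title) pair."""
    match = SEPARATOR.search(title)
    if match is None:
        return None  # title has no "v" or "vs" separator
    # Words that appear before the separator
    before = set(title[:match.start()].split())
    # 'second' when no word of the name occurs before the separator
    if before.isdisjoint(name.split()):
        return "second"
    return "first"


# A few rows taken from the sample data in the question
rows = [
    ("John Down", "sth vs Down John"),
    ("Mary Sith", "Sith Mary vs sth"),
    ("Chris Humpy", "Humpy"),
    ("Chuck Bass", "Bass v sth"),
    ("Joe Exhaust", "sth vs Joe"),
]
labels = [label_title(name, title) for name, title in rows]
print(labels)
```

On the dataframe itself, such a function could be applied with `df.apply(lambda r: label_title(r['Name'], r['Title']), axis=1)`. That is likely slower than the vectorized masks on large frames, but it avoids storing an intermediate column of sets.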
