I have a dataframe that looks like this:
Name         F_Name  L_Name   Title
John Down    John    Down     sth vs Down John
Dave Brown   Dave    Brown    sth v Brown Dave
Mary Sith    Mary    Sith     Sith Mary vs sth
Sam Walker   Sam     Walker   sth vs Sam Walker
Chris Humpy  Chris   Humpy    Humpy
John Hunter  John    Hunter   John Hunter
Nola Smith   Nola    Smith    Nola
Chuck Bass   Chuck   Bass     Bass v sth
Rob Bank     Rob     Bank     Rob v sth
Chris Ham    Chris   Ham      Chris Ham
Angie Poppy  Angie   Poppy    Poppy Angie
Joe Exhaust  Joe     Exhaust  sth vs Joe
:            :       :        :
Tony Start   Tony    Start    sth v Start
I would like to match the Name column against the Title column. If the name appears before v or vs, the new column Label should be first; otherwise it should be second. If the Title column contains only the name, without v or vs, Label should be null.
Here is what the output dataframe would look like.
Name         F_Name  L_Name   Title              Label
John Down    John    Down     sth vs Down John   second
Dave Brown   Dave    Brown    sth v Brown Dave   second
Mary Sith    Mary    Sith     Sith Mary vs sth   first
Sam Walker   Sam     Walker   sth vs Sam Walker  second
Chris Humpy  Chris   Humpy    Humpy              null
John Hunter  John    Hunter   John Hunter        null
Nola Smith   Nola    Smith    Nola               null
Chuck Bass   Chuck   Bass     Bass v sth         first
Rob Bank     Rob     Bank     Rob v sth          first
Chris Ham    Chris   Ham      Chris Ham          null
Angie Poppy  Angie   Poppy    Poppy Angie        null
Joe Exhaust  Joe     Exhaust  sth vs Joe         second
:            :       :        :                  :
Tony Start   Tony    Start    sth v Start        second
I am thinking of splitting the Title column on v or vs into two new columns and then matching them against the Name column, but I do not know how to add the condition that checks whether the name appears before the v or vs. Is there a better way to do this without splitting the Title column?
Thanks!!
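The idea is to take the part of the title before v or vs, split it on whitespace, convert it to a set, and test whether it shares any word with Name. A second mask, built with Series.str.contains, detects titles that have no v or vs at all. Both masks are then passed to numpy.select: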
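import numpy as np

pat = r'\s+vs?\s+'  # a standalone "v" or "vs" surrounded by whitespace
# set of words that appear before the separator
df['Label'] = df['Title'].str.split(pat).str[0].str.split().apply(set)
m1 = df.apply(lambda x: x['Label'].isdisjoint(set(x['Name'].split())), axis=1)
m2 = ~df['Title'].str.contains(pat)  # True when Title has no separator
# test m2 first so titles without a separator always get None
df['Label'] = np.select([m2, m1], [None, 'second'], 'first')
print(df)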
    Name         F_Name  L_Name   Title              Label
0   John Down    John    Down     sth vs Down John   second
1   Dave Brown   Dave    Brown    sth v Brown Dave   second
2   Mary Sith    Mary    Sith     Sith Mary vs sth   first
3   Sam Walker   Sam     Walker   sth vs Sam Walker  second
4   Chris Humpy  Chris   Humpy    Humpy              None
5   John Hunter  John    Hunter   John Hunter        None
6   Nola Smith   Nola    Smith    Nola               None
7   Chuck Bass   Chuck   Bass     Bass v sth         first
8   Rob Bank     Rob     Bank     Rob v sth          first
9   Chris Ham    Chris   Ham      Chris Ham          None
10  Angie Poppy  Angie   Poppy    Poppy Angie        None
11  Joe Exhaust  Joe     Exhaust  sth vs Joe         second
12  Tony Start   Tony    Start    sth v Start        second
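If the goal is to avoid splitting the Title column altogether, a row-wise variant is also possible. The following is only a minimal sketch, not part of the approach above: the helper name label_title and the pattern SEP are hypothetical, and it assumes the separator is always a standalone v or vs surrounded by whitespace. It locates the separator with re.search and checks whether any word of Name occurs in the text before it.

```python
import re

import pandas as pd

# Assumed separator: a standalone "v" or "vs" surrounded by whitespace.
SEP = re.compile(r"\s+vs?\s+")


def label_title(name, title):
    """Return 'first', 'second', or None for one (Name, Title) pair."""
    match = SEP.search(title)
    if match is None:
        return None  # the title has no "v"/"vs" separator
    # Words that appear before the separator.
    before = set(title[:match.start()].split())
    # 'first' if any word of the name is among them, otherwise 'second'.
    return "first" if before & set(name.split()) else "second"


# A few rows taken from the question's sample data.
df = pd.DataFrame({
    "Name": ["John Down", "Mary Sith", "Chris Humpy", "Chuck Bass"],
    "Title": ["sth vs Down John", "Sith Mary vs sth", "Humpy", "Bass v sth"],
})
df["Label"] = [label_title(n, t) for n, t in zip(df["Name"], df["Title"])]
print(df)
```

A plain list comprehension is used here instead of df.apply(axis=1) because it only needs two columns; on large frames the vectorised masks with numpy.select are likely to be faster.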