
Use Pandas to check if strings in multiple columns appear in a different column

I need a pandas query that checks whether the first name or last name is contained in the full name and gives me a count of the matching rows.

I have this table:

FULL_NAME    FIRST_NAME   LAST_NAME
Joe Bloggs   Joe          Bloggs
Greg                      Greg
Larson       Larson
Emily Sun
Clara Zen
Justin Tim   Justin       Tim

The expected output is: 4

Given that only full matches should be counted, we can write it as:

df['FULL_NAME'].eq((df['FIRST_NAME'] + ' ' + df['LAST_NAME']).str.strip()).sum()

Output:

4

Note that I've added .str.strip() to my original answer to cover rows where only the first or only the last name is present: in those rows + ' ' + leaves a leading or trailing space that has to be removed. This assumes the blank cells are empty strings; if they are NaN, the concatenation is NaN as well and the row does not match.
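To make the one-liner above easy to try, here is a minimal sketch that rebuilds the sample table and runs the comparison. It assumes the blank cells were loaded as NaN (the question does not say how they are stored), so it adds a fillna('') step that is not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample table from the question; blank cells are assumed to be NaN here.
df = pd.DataFrame({
    "FULL_NAME":  ["Joe Bloggs", "Greg", "Larson", "Emily Sun", "Clara Zen", "Justin Tim"],
    "FIRST_NAME": ["Joe", np.nan, "Larson", np.nan, np.nan, "Justin"],
    "LAST_NAME":  ["Bloggs", "Greg", np.nan, np.nan, np.nan, "Tim"],
})

# Turn missing parts into empty strings so the concatenation does not become NaN.
first = df["FIRST_NAME"].fillna("")
last = df["LAST_NAME"].fillna("")

# Join the parts and strip the stray space left over when one part is empty.
combined = (first + " " + last).str.strip()

# Row-wise exact comparison against the full name, then count the True values.
matches = df["FULL_NAME"].eq(combined)
count = int(matches.sum())
print(count)  # 4
```

If the blanks are already empty strings, the fillna('') calls do nothing and can be dropped.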

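I'm going to assume that the blank cells are missing values that you can filter out or handle separately first. As written, this code needs strings in all three columns: a NaN cell makes the in test raise a TypeError, and empty strings generate false positives (they are contained in every string).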

out = df.apply(lambda x: x['FIRST_NAME'] in x['FULL_NAME'] and x['LAST_NAME'] in x['FULL_NAME'], axis=1)
out.sum()
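One possible way to run the substring check on data with blanks is sketched below. The helper parts_in_full_name is a hypothetical name, and two choices in it are assumptions rather than part of the answer: missing or empty name parts are skipped, and a row with no name parts at all counts as a non-match (which is what makes the sample table come out at 4).

```python
import numpy as np
import pandas as pd

# Sample table from the question; blank cells are assumed to be NaN here.
df = pd.DataFrame({
    "FULL_NAME":  ["Joe Bloggs", "Greg", "Larson", "Emily Sun", "Clara Zen", "Justin Tim"],
    "FIRST_NAME": ["Joe", np.nan, "Larson", np.nan, np.nan, "Justin"],
    "LAST_NAME":  ["Bloggs", "Greg", np.nan, np.nan, np.nan, "Tim"],
})

def parts_in_full_name(row):
    """Hypothetical helper: True if every available name part occurs in FULL_NAME."""
    # Keep only parts that are real, non-blank strings (drops NaN and '').
    parts = [p for p in (row["FIRST_NAME"], row["LAST_NAME"])
             if isinstance(p, str) and p.strip()]
    # Assumption: a row without any name part is not a match.
    if not parts:
        return False
    # Plain substring test, so partial names can still match.
    return all(p in row["FULL_NAME"] for p in parts)

out = df.apply(parts_in_full_name, axis=1)
substring_count = int(out.sum())
print(substring_count)  # 4
```

Because this is still a substring test, it keeps the behaviour of the original lambda: name order does not matter, and a partial name also counts as contained.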

See this question for more information on columns being substrings of one another.

Perl's comment also looks like a nice answer, and may be faster (many things are faster than an apply). I should also note that my code may generate false positives depending on the structure of the data: for example, a last name of 'Ti' would match 'Justin Tim'. The benefit of this code shows when you are concerned that some first and last names may have been switched: it still detects a match even when the full name reads 'Tim Justin'.

Another tool that might be helpful is pandas' string splitting capabilities. This would allow you to split the full name at some specified character and perform operations on the components of the string. You can even expand the resulting list into multiple new columns and compare against those.
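As a rough illustration of the splitting idea, the sketch below uses Series.str.split to break each full name into words and then tests the name parts against whole words instead of substrings. The three-row table is made up for the example, and splitting on whitespace is an assumption about how the full names are written.

```python
import pandas as pd

# Made-up rows: a normal match, a partial last name, and a swapped name order.
df = pd.DataFrame({
    "FULL_NAME":  ["Justin Tim", "Justin Tim", "Tim Justin"],
    "FIRST_NAME": ["Justin", "Justin", "Justin"],
    "LAST_NAME":  ["Tim", "Ti", "Tim"],
})

# Split each full name on whitespace into a list of words.
words = df["FULL_NAME"].str.split()

# expand=True spreads the words over separate columns (0, 1, ...) instead.
word_columns = df["FULL_NAME"].str.split(expand=True)

# Whole-word test: each name part must equal one of the words in the full name.
whole_word = [
    first in w and last in w
    for w, first, last in zip(words, df["FIRST_NAME"], df["LAST_NAME"])
]
print(whole_word)  # [True, False, True]
```

Comparing against whole words rejects the partial last name 'Ti' while still accepting the swapped order 'Tim Justin'.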
