I need a pandas query that checks whether the first name, middle name, or last name is contained in the full name and gives me a count.
I have this table:
FULL_NAME FIRST_NAME LAST_NAME
Joe Bloggs Joe Bloggs
Greg Greg
Larson Larson
Emily Sun
Clara Zen
Justin Tim Justin Tim
The expected output is: 4
Now that it is clear that only full matches should be counted, we can write it as:
df['FULL_NAME'].eq((df['FIRST_NAME'] + ' ' + df['LAST_NAME']).str.strip()).sum()
Output:
4
Note that I've added .str.strip() to my original answer to cover the cases when only the first or only the last name is specified in the full name (in those cases the + ' ' + concatenation leaves a leading or trailing space that we need to remove).
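For reference, here is a minimal, self-contained sketch of that one-liner run against the sample table. It assumes the blank cells are empty strings and that 'Greg' sits in FIRST_NAME and 'Larson' in LAST_NAME (the flattened table does not make this explicit). The fillna('') step is an addition for the case where blanks are NaN, since string concatenation with NaN yields NaN rather than a padded string.

```python
import pandas as pd

# Sample data from the question. Blank cells are assumed to be empty strings,
# and the column placement of 'Greg' and 'Larson' is an assumption.
df = pd.DataFrame({
    "FULL_NAME":  ["Joe Bloggs", "Greg", "Larson", "Emily Sun", "Clara Zen", "Justin Tim"],
    "FIRST_NAME": ["Joe", "Greg", "", "", "", "Justin"],
    "LAST_NAME":  ["Bloggs", "", "Larson", "", "", "Tim"],
})

# If the blanks are NaN instead, fill them so the concatenation below does not produce NaN
names = df[["FIRST_NAME", "LAST_NAME"]].fillna("")

# Build "first last", then strip the stray space left when one part is empty
combined = (names["FIRST_NAME"] + " " + names["LAST_NAME"]).str.strip()

# Count rows where the full name equals the rebuilt name exactly
count = int(df["FULL_NAME"].eq(combined).sum())
print(count)  # 4
```

Rows where both name parts are blank rebuild to an empty string, so they never equal a non-empty FULL_NAME and are not counted.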
I'm going to assume that the blank cells are 'NaN', or can easily be converted to that or something like it. Empty strings will generate false positives with this code (an empty string is contained in every string).
out = df.apply(lambda x: x['FIRST_NAME'] in x['FULL_NAME'] and x['LAST_NAME'] in x['FULL_NAME'], axis=1)
out.sum()
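One caveat with the lambda above: if a blank really is NaN, the `in` test raises a TypeError, because a float cannot be the left operand of `in` on a string. A possible workaround is sketched below. It is not the original answer's method: it assumes that a blank name part should simply be skipped and that a row needs at least one non-blank part to count. The helper name parts_in_full is hypothetical.

```python
import pandas as pd

# Sample data from the question, with blanks stored as missing values (an assumption)
df = pd.DataFrame({
    "FULL_NAME":  ["Joe Bloggs", "Greg", "Larson", "Emily Sun", "Clara Zen", "Justin Tim"],
    "FIRST_NAME": ["Joe", "Greg", None, None, None, "Justin"],
    "LAST_NAME":  ["Bloggs", None, "Larson", None, None, "Tim"],
})

def parts_in_full(row):
    """Hypothetical helper: True if every non-blank name part is a substring of FULL_NAME."""
    # Keep only real, non-empty strings so that NaN/None and '' are skipped
    parts = [p for p in (row["FIRST_NAME"], row["LAST_NAME"]) if isinstance(p, str) and p]
    # Require at least one usable part, otherwise all([]) would count the row
    return bool(parts) and all(p in row["FULL_NAME"] for p in parts)

out = df.apply(parts_in_full, axis=1)
match_count = int(out.sum())
print(match_count)  # 4
```

This is still a substring test, so it keeps the order-insensitive behaviour of the original lambda along with its exposure to partial-name matches.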
See this question for more information on columns being substrings of one another.
Perl's comment also looks like a nice answer, and may be faster (many things are faster than an apply). I should also note that my code may generate false positives depending on the structure of the data (e.g., a last name of 'Ti' would match 'Justin Tim'). The benefit of this code shows if you are concerned that some first and last names may have been switched: it would still detect a match, even if we're looking to match 'Tim Justin'.
Another tool that might be helpful is pandas' string-splitting capabilities. These let you split the full name at a specified character and perform operations on the individual components. You can even expand the resulting list into multiple new columns and run comparisons against those.
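As a rough illustration of the splitting approach, the sketch below splits FULL_NAME on whitespace and checks whether both name parts appear as whole tokens. The three rows are hypothetical, chosen to show a normal match, a swapped first/last name, and the partial last name 'Ti' that a plain substring test would accept.

```python
import pandas as pd

# Hypothetical rows chosen to exercise the edge cases discussed above
df = pd.DataFrame({
    "FULL_NAME":  ["Joe Bloggs", "Justin Tim", "Justin Tim"],
    "FIRST_NAME": ["Joe", "Tim", "Justin"],
    "LAST_NAME":  ["Bloggs", "Justin", "Ti"],
})

# Split on whitespace; each cell becomes a list of name tokens
tokens = df["FULL_NAME"].str.split()

# A row matches when both name parts appear as whole tokens, in any order
whole_word = [
    first in parts and last in parts
    for parts, first, last in zip(tokens, df["FIRST_NAME"], df["LAST_NAME"])
]
print(whole_word)  # [True, True, False]

# expand=True spreads the tokens into separate columns for column-wise comparisons
split_cols = df["FULL_NAME"].str.split(expand=True)
print(split_cols)
```

Comparing whole tokens keeps the tolerance for swapped names while rejecting 'Ti' as a match for 'Tim'. Blank name parts would need the same kind of handling as in the earlier examples.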