Match list of strings in pandas column with RegEx

The problem: find every name that contains a substring from a list of strings, and return the whole name.

I have a pandas series, such as:

231                richard occult (new earth)
6886                     bedivere (new earth)
705              arthur pendragon (new earth)
567     franklin delano roosevelt (new earth)
1468                     lancelot (new earth)
                        ...                  
6891                  nadine west (new earth)
6892               warren harding (new earth)
6893             william harrison (new earth)
6894             william mckinley (new earth)
6895                       mookie (new earth)
6896                     Superboy (new earth)

I have a list of specific substrings that I want to match against each name:

boy_names = ['Mr.', 'Boy', 'Man', 'Lord', 'King',
             'Brother', 'Sir', 'Prince', 'Father', 'Lad',
             'Baron', 'He-', ' He', 'Son', 'Duke', 'Dad', 'Senior',
             'Junior', 'Master']

Desired output: Superboy

I found an answer which returns the matches, but not the entire string.

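import re

def match(frame):
    # Build the alternation pattern once instead of on every iteration.
    pattern = re.compile('|'.join(boys))
    result = []
    for item in frame.name:
        found = pattern.search(item)
        if found is not None:
            # This appends the match object (the matched substring only),
            # not the full name.
            result.append(found)

    return result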

where boys is the list of substrings (boy_names above).
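One possible way to get the whole string back is to build a boolean mask with `Series.str.contains` and index the series with it. The sketch below is only an illustration under a few assumptions: the sample series and the shortened `boys` list are made up for the example, the substrings are passed through `re.escape` so that the dot in 'Mr.' is treated literally, and `case=False` is used because 'Boy' would not otherwise match the lowercase "boy" in "Superboy".

```python
import re
import pandas as pd

# Hypothetical sample data modeled on the series in the question.
names = pd.Series(
    ["richard occult (new earth)", "bedivere (new earth)",
     "mookie (new earth)", "Superboy (new earth)"],
    index=[231, 6886, 6895, 6896],
)

# Shortened substring list, for illustration only.
boys = ["Mr.", "Boy", "Man", "Lord", "King"]

# Escape each substring so regex metacharacters (the '.' in 'Mr.') stay literal.
pattern = "|".join(re.escape(s) for s in boys)

# Assumption: matching should ignore case, so 'Boy' can match 'Superboy'.
mask = names.str.contains(pattern, case=False, regex=True)

# Index with the mask to keep the full strings rather than the match objects.
matched = names[mask].tolist()
print(matched)  # ['Superboy (new earth)']
```

Note that ignoring case makes short substrings such as 'Son' or 'Man' match inside longer words (for example "harrison"), so the list may need word boundaries or tighter entries depending on what should count as a match.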
