
Match list of substrings and strings and return substring if it matches

I've seen many questions on this topic, but most ask the opposite of what I need. I have a list of strings (a column of a data frame) and a list of substrings. I want to compare each string against the list of substrings: if it contains one of them, return that substring; otherwise return 'no match'.

    subs = ['cat', 'dog', 'mouse']

This is the df I want:

     Name       Number     SubMatch
     dogfood      1           dog
     catfood      3           cat
     dogfood      2           dog
     mousehouse   1           mouse
     birdseed     1           no match

My current output looks like this, though:

     Name       Number     SubMatch
     dogfood      1           dog
     catfood      3           dog
     dogfood      2           dog
     mousehouse   1           dog
     birdseed     1           dog

I suspect my code is just returning the first item in the series. How do I change it so it returns the correct item? Here is the function:

    def matchy(col, subs):
        for name in col:
            for s in subs:
                if any(s in name for s in subs):
                    return s
                else:
                    return 'No Match'

The idiomatic pandas way to solve this is to avoid loops altogether. You can do this quite simply with str.extract:

    import re
    p = '({})'.format('|'.join(map(re.escape, subs)))  # escape any regex metacharacters in subs
    df['SubMatch'] = df.Name.str.extract(p, expand=False).fillna('no match')

    df

             Name  Number  SubMatch
    0     dogfood       1       dog
    1     catfood       3       cat
    2     dogfood       2       dog
    3  mousehouse       1     mouse
    4    birdseed       1  no match
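One behavioral difference may be worth knowing: the regex alternation returns the leftmost substring found in each string, while the loop-based answers return the first entry of subs that occurs anywhere in the string. A minimal sketch on made-up data (the 'dogcat' row is an assumption added only to expose the difference; pandas is assumed to be installed):

```python
import re

import pandas as pd

subs = ['cat', 'dog', 'mouse']
names = pd.Series(['dogfood', 'catfood', 'dogcat', 'birdseed'])

# Regex alternation: the match that starts earliest in the string wins
pattern = '({})'.format('|'.join(map(re.escape, subs)))
by_position = names.str.extract(pattern, expand=False).fillna('no match')

# Loop/next() style: the first entry of subs contained in the string wins
by_list_order = names.map(lambda n: next((s for s in subs if s in n), 'no match'))

print(by_position.tolist())    # ['dog', 'cat', 'dog', 'no match']
print(by_list_order.tolist())  # ['dog', 'cat', 'cat', 'no match']
```

If a string can contain more than one substring, pick whichever rule matches what "the" match should mean for your data.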

How about this:

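    def matchy(col, subs):
        matches = []
        for name in col:
            try:
                # first substring from subs that occurs in this name
                matches.append(next(x for x in subs if x in name))
            except StopIteration:  # no substring matched this name
                matches.append('No Match')
        return matches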

The problem with your code is that you check for a match with any, but then return s, which at that point is still the first item of the iteration (dog in your output) rather than the substring that actually matched.
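A small sketch of the difference: any() only answers "is there a match?", whereas next() over the same generator hands back the matching substring itself, and raises StopIteration when nothing matches (which is what the try/except above catches). The describe helper is a hypothetical name used only for this illustration:

```python
subs = ['cat', 'dog', 'mouse']


def describe(name):
    matched = any(s in name for s in subs)  # yes/no only
    try:
        which = next(s for s in subs if s in name)  # the substring that matched
    except StopIteration:  # the generator yielded nothing
        which = None
    return matched, which


print(describe('mousehouse'))  # (True, 'mouse')
print(describe('birdseed'))    # (False, None)
```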


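EDIT: kudos to @Coldspeed

    def matchy(col, subs):
        # next() with a default replaces the try/except; the generator expression
        # must be parenthesized because it is not the only argument
        return [next((x for x in subs if x in name), 'No match') for name in col]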
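If you would rather fill the column directly, one option is to write the check for a single name and let pandas apply it row by row. A minimal sketch; first_match is a hypothetical helper name, and the data is the example from the question:

```python
import pandas as pd

subs = ['cat', 'dog', 'mouse']


def first_match(name, subs, default='no match'):
    # Hypothetical per-row helper: first entry of subs contained in name
    return next((s for s in subs if s in name), default)


df = pd.DataFrame({
    'Name': ['dogfood', 'catfood', 'dogfood', 'mousehouse', 'birdseed'],
    'Number': [1, 3, 2, 1, 1],
})
# Series.apply passes each Name value as the first argument; args supplies the rest
df['SubMatch'] = df['Name'].apply(first_match, args=(subs,))
print(df)
```

Writing the function for one value at a time avoids the "return inside the loop over col" problem entirely, since pandas does the iteration.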

I think you are overcomplicating things with a nested loop plus the any test inside it. Would this work better?

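    def matchy(col, subs):
        results = []
        for name in col:
            for s in subs:
                if s in name:
                    results.append(s)
                    break
            else:  # runs only if the inner loop finished without a break
                results.append('No Match')
        return results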

Unless there is code missing that accounts for it, it would appear that your code returns the result for the very first comparison, and actually does not look at any of the other items in the col list. If you would rather stick with nested loops, I would suggest modifying your code like so:
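To make that concrete, here is the question's loop with a comparison log added (matchy_traced and the log are hypothetical additions for illustration). It stops after a single (name, substring) comparison, and because the generator inside any() has its own s, the outer s is still 'cat' when it is returned for 'dogfood':

```python
def matchy_traced(col, subs):
    # Same control flow as the question's matchy, plus a record of each comparison
    comparisons = []
    for name in col:
        for s in subs:
            comparisons.append((name, s))
            if any(s in name for s in subs):  # this s is local to the generator
                return s, comparisons          # outer s: still the first item of subs
            else:
                return 'No Match', comparisons


result, comparisons = matchy_traced(['dogfood', 'catfood', 'birdseed'], ['cat', 'dog', 'mouse'])
print(result)       # cat
print(comparisons)  # [('dogfood', 'cat')]
```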

    def matchy(col, subs):
        subMatch = []
        for name in col:
            subMatch.append('No Match')
            for s in subs:
                if s in name:
                    subMatch[-1] = s
                    break
        return subMatch

This assumes that col is a list of strings containing the column values (dogfood, mousehouse, etc.) and that subs is a list of the substrings you wish to search for. subMatch is the list of strings returned by matchy; it holds the search result for each item in col.

For each value in col, we append the 'No Match' string to subMatch, assuming up front that there is no match. We then iterate through subs, checking whether the substring s is contained in name. If it is, subMatch[-1] = s replaces the 'No Match' we just appended with the matching substring, and we break to move on to the next item in col, since there is no need to search further.

Note that subMatch[-1] = s can be replaced with other approaches, such as subMatch.pop() followed by subMatch.append(s), though at that point it is largely a matter of personal preference. Once every element of col has been checked, subMatch is returned, and you can process it however you like.
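A quick sketch showing that the two placeholder-replacement styles mentioned above give the same result. match_all and its use_pop flag are hypothetical names introduced only for this comparison:

```python
def match_all(col, subs, use_pop=False):
    sub_match = []
    for name in col:
        sub_match.append('No Match')  # assume no match until one is found
        for s in subs:
            if s in name:
                if use_pop:
                    sub_match.pop()      # drop the placeholder...
                    sub_match.append(s)  # ...and push the match
                else:
                    sub_match[-1] = s    # overwrite the placeholder in place
                break                    # first match in subs order wins
    return sub_match


names = ['dogfood', 'catfood', 'dogfood', 'mousehouse', 'birdseed']
subs = ['cat', 'dog', 'mouse']
print(match_all(names, subs))                # ['dog', 'cat', 'dog', 'mouse', 'No Match']
print(match_all(names, subs, use_pop=True))  # ['dog', 'cat', 'dog', 'mouse', 'No Match']
```

The returned list has one entry per name, in order, so it can be assigned straight to a data frame column of the same length.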

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.
