
Exact-match intersection of two lists using re.findall in Python

I would like to get the intersection of two lists of words using a regex. Its C implementation, which makes it run faster, is of huge importance in this particular case. Even though I have code that almost works, it also matches embedded words, like "buy" inside "buyers", for example.

Some code probably explains it better. This is what I have so far:

import re
re.findall(r"(?=(" + '|'.join(['buy', 'sell', 'gilt']) + r"))", ' '.join(['aabuya', 'gilt', 'buyer']))
>> ['buy', 'gilt', 'buy']
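To see where the unwanted matches come from, here is a small sketch that runs the same pattern through re.finditer and records the start index of each captured word. The variable names text, pattern and hits are illustrative only.

```python
import re

# The question's pattern: a zero-width lookahead wrapping a capture group,
# which re.finditer tries at every position of the joined string.
text = " ".join(["aabuya", "gilt", "buyer"])
pattern = re.compile(r"(?=(" + "|".join(["buy", "sell", "gilt"]) + r"))")

# Record where each captured word starts: the two 'buy' hits lie within
# the longer tokens 'aabuya' (index 2) and 'buyer' (index 12).
hits = [(m.start(1), m.group(1)) for m in pattern.finditer(text)]
print(hits)  # [(2, 'buy'), (7, 'gilt'), (12, 'buy')]
```

Nothing in the pattern ties a match to the start or end of a token, which is why substrings of longer words are reported.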

While this is what I would like:

re.exactfindall(['buy', 'sell', 'gilt'], ['aabuya', 'gilt', 'buyer'])  # hypothetical: re has no exactfindall
>> ['gilt']

Thanks.

To do this with regular expressions, the easiest way is probably to include word boundaries (\b) in the pattern, outside the capturing group, which gives you:

import re
re.findall(r"\b(?=(" + '|'.join(['buy', 'sell', 'gilt']) + r")\b)",
           ' '.join(['aabuya', 'gilt', 'buyer']))

which outputs ['gilt'] as requested.
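The \b version still treats punctuation such as '-' as a word boundary, so a token like 'buy-in' would yield 'buy'. Since the text is built by joining the list with spaces, a minimal sketch, assuming no token contains internal whitespace, can instead require whitespace (or the string edges) on both sides of the match and escape the search words with re.escape. The function name exact_findall is hypothetical, echoing the wished-for re.exactfindall; the re module has no such function.

```python
import re

def exact_findall(words, tokens):
    """Return every token that equals one of `words` exactly.

    Hypothetical helper for illustration; the re module has no such function.
    Assumes no token contains whitespace, since the tokens are joined with spaces.
    """
    if not words:
        # Guard: an empty alternation could produce spurious empty-string matches.
        return []
    # re.escape makes characters such as '.' or '+' match literally.
    alternation = "|".join(map(re.escape, words))
    # (?<!\S) and (?!\S): the match must be bordered by whitespace or the
    # string edges, so it has to be a whole space-separated token.
    pattern = re.compile(r"(?<!\S)(?:" + alternation + r")(?!\S)")
    return pattern.findall(" ".join(tokens))

print(exact_findall(["buy", "sell", "gilt"], ["aabuya", "gilt", "buyer"]))  # ['gilt']
print(exact_findall(["buy"], ["buy-in"]))  # []
```

Beyond the hyphen case, this also handles a word like 'c++', which the \b version cannot match: unescaped it is an invalid pattern, and escaped there is still no word boundary after the trailing '+'.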

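Without regex, a plain list comprehension also returns only exact matches:

listgiven = ['aabuya', 'gilt', 'buyer']
listtomatch = ['buy', 'sell', 'gilt']
# keep each given word that is exactly equal to one of the target words
exactmatch = [x for x in listgiven if x in listtomatch]
print(exactmatch)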
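Because speed is the stated concern, here is a sketch comparing the list comprehension, with the target words moved into a set so each membership test is an average O(1) hash lookup instead of a list scan, against the word-boundary regex from the first answer. The helper names with_set and with_regex are hypothetical, and no timings are claimed: relative speed depends on list sizes and contents, so measure on your own data.

```python
import re
import timeit

listgiven = ['aabuya', 'gilt', 'buyer']
listtomatch = ['buy', 'sell', 'gilt']

def with_set(given, tomatch):
    # Build the lookup set once: `in` on a set is an average O(1) hash lookup,
    # whereas `in` on a list compares against each element in turn.
    lookup = set(tomatch)
    return [x for x in given if x in lookup]

def with_regex(given, tomatch):
    # The word-boundary pattern from the first answer, for comparison.
    pattern = r"\b(?=(" + "|".join(tomatch) + r")\b)"
    return re.findall(pattern, " ".join(given))

print(with_set(listgiven, listtomatch))    # ['gilt']
print(with_regex(listgiven, listtomatch))  # ['gilt']

# Time each approach on the same inputs; results vary by machine and data.
for fn in (with_set, with_regex):
    seconds = timeit.timeit(lambda: fn(listgiven, listtomatch), number=10_000)
    print(fn.__name__, seconds)
```

The set version keeps the order and duplicates of listgiven. If neither matters, set(listgiven) & set(listtomatch) returns {'gilt'} directly.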
