I would like to get the intersection of two lists of words using regex. Its C implementation makes it run faster, which is hugely important in this particular case. I have code that almost works, but it also matches embedded words, for example "buy" inside "buyers".
Some code probably explains it better. This is what I have so far:
re.findall(r"(?=(" + '|'.join(['buy', 'sell', 'gilt']) + r"))", ' '.join(['aabuya', 'gilt', 'buyer']))
>> ['buy', 'gilt', 'buy']
While this is what I would like:
re.exactfindall(['buy', 'sell', 'gilt'], ['aabuya', 'gilt', 'buyer'])
>> ['gilt']
Thanks.
To do this using regexps, the easiest way is probably to include word boundaries (\b) in the matching expression, outside the capture group, giving you:
re.findall(r"\b(?=(" + '|'.join(['buy', 'sell', 'gilt']) + r")\b)",
' '.join(['aabuya', 'gilt', 'buyer']))
which outputs ['gilt'], as requested.
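For reuse, the same idea could be wrapped in a small helper. This is only a sketch: the name exact_findall and its signature are assumptions modelled on the re.exactfindall call wished for in the question, not an existing function in the re module. It also passes each word through re.escape, in case a word contains regex metacharacters, and uses a plain non-capturing group instead of a lookahead, which gives the same result here.

```python
import re

def exact_findall(words, candidates):
    """Return the words that occur as whole words among the candidates.

    Hypothetical helper: the name and signature are assumptions,
    not part of the standard re module.
    """
    # Escape each word so characters like '.' or '+' are matched literally.
    alternation = "|".join(re.escape(w) for w in words)
    # \b on both sides rejects embedded matches such as 'buy' in 'buyer'.
    pattern = r"\b(?:" + alternation + r")\b"
    return re.findall(pattern, " ".join(candidates))

result = exact_findall(['buy', 'sell', 'gilt'], ['aabuya', 'gilt', 'buyer'])
print(result)
```

Note that \b treats any non-word character as a boundary, so a candidate like 'buy-in' would still match 'buy'. If the candidates can contain hyphens or similar characters, a different delimiter check may be needed.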
listgiven=['aabuya', 'gilt', 'buyer']
listtomatch=['buy', 'sell', 'gilt']
exactmatch = [x for x in listgiven if x in listtomatch]
print(exactmatch)
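Since speed matters here, a variant of the list comprehension above may be worth trying: converting the words to match into a set first makes each membership test a hash lookup instead of a scan through the list. A minimal sketch, with variable names chosen for illustration only:

```python
list_given = ['aabuya', 'gilt', 'buyer']
list_to_match = ['buy', 'sell', 'gilt']

# Build the set once; each "x in wanted" check is then a hash lookup.
wanted = set(list_to_match)

# Keeps the order (and any duplicates) of list_given.
exact_match = [x for x in list_given if x in wanted]
print(exact_match)
```

If order and duplicates do not matter, set(list_given) & set(list_to_match) gives the intersection directly.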