
Python find all fuzzy matching sequences in a string

I have a large string, and I want to find all sequences in it that match a given input.

For example, I want to find all possible matches of "defensive rebound" in:

Player xy had 10 defensive rebounds only in the 3rd quarter of a match that was a defensive battle between 2 teams that have a defensive rebound rate of over 80% and moreover the average number of rebounds in the defence by player was a staggering 3.5

I want to find all the words in bold and then extract them.

I managed to write a script that does the extraction, but it only works for exact matches.

I was thinking of using difflib.SequenceMatcher, but I got stuck.
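Since the question mentions difflib.SequenceMatcher, here is a minimal sketch of how it could be applied, assuming "fuzzy" means character-level similarity between the query and a run of consecutive words. The function name fuzzy_find, the 0.8 threshold, and trying windows one word shorter and longer than the query are illustrative assumptions, not something from the original question.

```python
import re
from difflib import SequenceMatcher


def fuzzy_find(query, text, threshold=0.8):
    """Hypothetical helper: return (start, end, matched_text, score) for
    runs of words in `text` whose SequenceMatcher ratio to `query` is at
    least `threshold`, keeping only the best non-overlapping runs."""
    words = list(re.finditer(r"\w+", text))  # word tokens with positions
    n = len(query.split())
    candidates = []
    # Assumption: also try windows one word shorter and one word longer.
    for size in range(max(1, n - 1), n + 2):
        for i in range(len(words) - size + 1):
            start = words[i].start()
            end = words[i + size - 1].end()
            chunk = text[start:end]
            score = SequenceMatcher(None, query.lower(), chunk.lower()).ratio()
            if score >= threshold:
                candidates.append((score, start, end, chunk))
    # Greedily keep the highest-scoring windows that do not overlap.
    results = []
    for score, start, end, chunk in sorted(candidates, reverse=True):
        if all(end <= ps or start >= pe for ps, pe, _, _ in results):
            results.append((start, end, chunk, round(score, 2)))
    return sorted(results)  # in order of position in the text


text = ("Player xy had 10 defensive rebounds only in the 3rd quarter of a match "
        "that was a defensive battle between 2 teams that have a defensive rebound "
        "rate of over 80% and moreover the average number of rebounds in the defence "
        "by player was a staggering 3.5")

for start, end, found, score in fuzzy_find("defensive rebound", text):
    print(start, end, found, score)
```

Character-level similarity catches inflected forms such as "defensive rebounds", but not reworded phrases such as "rebounds in the defence"; those are easier to target with regular expressions.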

You can use regular expressions in Python; you just need a good pattern to extract them.

For example:

import re

# s is the input string from the question
# Find [defensive...][space][rebound...][space][any word]
re.findall(r'defensive\w* rebound\w* \w+', s)

# Find [rebound...][space][any word][space][any word][space][any word]
re.findall(r'rebound\w* \w+ \w+ \w+', s)

re.findall returns a list of all non-overlapping matches, in the order they occur in the string.
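findall gives back only the matched strings. If you also need to know where each match sits in the text (for example, to cut it out or highlight it), re.finditer yields match objects that expose start() and end(). A minimal sketch; extract_with_positions is a hypothetical helper name.

```python
import re

s = ("Player xy had 10 defensive rebounds only in the 3rd quarter of a match "
     "that was a defensive battle between 2 teams that have a defensive rebound "
     "rate of over 80% and moreover the average number of rebounds in the defence "
     "by player was a staggering 3.5")


def extract_with_positions(pattern, text):
    """Hypothetical helper: (start, end, matched_text) for every
    non-overlapping match of `pattern` in `text`."""
    return [(m.start(), m.end(), m.group()) for m in re.finditer(pattern, text)]


for start, end, found in extract_with_positions(r"defensive\w* rebound\w* \w+", s):
    print(start, end, found)
```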

If all your matches have the same form as the bold words, you can extract them with:

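# "rebound..." up to the nearest following "defence"
re.findall(r'rebound[ \w]*?defence', s)
# "defensive... rebound...", optionally followed by " rate"
re.findall(r'defensive\w* rebound\w*(?: rate)?', s)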
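To collect every variant in a single pass, the two patterns can be joined with | into one expression. This is a minimal sketch, assuming the variants you care about are exactly "defensive... rebound..." (optionally followed by " rate") and "rebound... defence"; the re.IGNORECASE flag is an added assumption so capitalised occurrences are found too. The optional group (?: rate)? is used because a character class such as [ rate]* would match any run of the characters space, r, a, t and e rather than the word "rate".

```python
import re

s = ("Player xy had 10 defensive rebounds only in the 3rd quarter of a match "
     "that was a defensive battle between 2 teams that have a defensive rebound "
     "rate of over 80% and moreover the average number of rebounds in the defence "
     "by player was a staggering 3.5")

# Either "defensive... rebound..." (optionally + " rate"),
# or "rebound..." up to the nearest following "defence".
PATTERN = re.compile(
    r"defensive\w* rebound\w*(?: rate)?"
    r"|rebound[ \w]*?defence",
    re.IGNORECASE,
)

matches = PATTERN.findall(s)
print(matches)
# ['defensive rebounds', 'defensive rebound rate', 'rebounds in the defence']
```

Because findall takes the leftmost match and then resumes scanning after it, the "rebound" inside "defensive rebound rate" is consumed by the first alternative and is not reported a second time.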


 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM