Python: find all fuzzy-matching sequences in a string

Question

I have a large string and I want to find every sequence in it that matches a given input sequence.

For example, I want to find all the possible matches of "defensive rebound" in:

Player xy had 10 defensive rebounds only in the 3rd quarter of a match that was a defensive battle between 2 teams that have a defensive rebound rate of over 80% and moreover the average number of rebounds in the defence by player was a staggering 3.5

I want to find all the bold words and then extract them.

I managed to build a script that does the extraction, but it only works for exact matches.

I was thinking of using difflib.SequenceMatcher, but I got stuck.

Answer 1

You can use a regex in Python, but you need a good pattern to extract them.

For example:

import re

# s is the text to search, e.g. the sentence from the question

# Find [defensive...][space][rebound(s)][space][next word]
re.findall(r'defensive\w* rebound\w* \w+', s)

# Find [rebound(s)][space][word][space][word][space][word]
re.findall(r'rebound\w* \w+ \w+ \w+', s)

findall returns a list of matches.
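As a quick check, the two patterns above can be run against the sentence from the question. This is only a sketch: the variable names `first` and `second` are made up for illustration, and the sentence is copied from the question as plain text.

```python
import re

# Sample sentence from the question, stored as plain text.
s = (
    "Player xy had 10 defensive rebounds only in the 3rd quarter of a match "
    "that was a defensive battle between 2 teams that have a defensive rebound "
    "rate of over 80% and moreover the average number of rebounds in the "
    "defence by player was a staggering 3.5"
)

# "defensive..." + "rebound..." + the next word.
first = re.findall(r'defensive\w* rebound\w* \w+', s)
print(first)   # ['defensive rebounds only', 'defensive rebound rate']

# "rebound..." + the next three words.
second = re.findall(r'rebound\w* \w+ \w+ \w+', s)
print(second)  # ['rebounds only in the', 'rebound rate of over', 'rebounds in the defence']
```

Both calls return plain lists of strings, so the results still carry the extra trailing words and need trimming if only the phrase itself is wanted.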

If all your matches have the same form as the bold words, you can extract them with:

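# "rebound(s)" followed by any run of words and spaces, ending in "defence"
re.findall(r'rebound[ \w]*defence', s)
# "defensive rebound(s)", optionally followed by " rate"
re.findall(r'defensive\w* rebound\w*(?: rate)?', s)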
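The regex patterns have to be written by hand for each variant. Since the question mentions difflib.SequenceMatcher, here is a minimal sketch of that route, not part of the original answer: slide a window of as many words as the query has over the text and keep the windows whose similarity ratio is high enough. The helper name `fuzzy_find`, the word-window approach, and the 0.8 threshold are all assumptions chosen for illustration.

```python
import re
from difflib import SequenceMatcher


def fuzzy_find(query, text, threshold=0.8):
    """Return (matched_text, ratio) for each word window that resembles query.

    Hypothetical helper: the window size is the number of words in the query,
    and the threshold is an arbitrary cut-off that may need tuning.
    """
    words = list(re.finditer(r"\w+", text))
    n = len(query.split())
    hits = []
    for i in range(len(words) - n + 1):
        # Slice the original text from the first word's start to the last word's end.
        start, end = words[i].start(), words[i + n - 1].end()
        candidate = text[start:end]
        ratio = SequenceMatcher(None, query.lower(), candidate.lower()).ratio()
        if ratio >= threshold:
            hits.append((candidate, ratio))
    return hits


s = (
    "Player xy had 10 defensive rebounds only in the 3rd quarter of a match "
    "that was a defensive battle between 2 teams that have a defensive rebound "
    "rate of over 80% and moreover the average number of rebounds in the "
    "defence by player was a staggering 3.5"
)

hits = fuzzy_find("defensive rebound", s)
for candidate, ratio in hits:
    print(f"{candidate!r}: {ratio:.2f}")
```

With these settings the sketch picks up "defensive rebounds" and "defensive rebound" but skips "defensive battle". Reordered phrasings such as "rebounds in the defence" score well below the threshold, so they would still need a regex like the one above or a different similarity measure.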

Answer 1
0 2015-11-09 08:52:37