Find out how many times a regex matches in a string in Python

Question

Is there a way that I can find out how many matches of a regex are in a string in Python? For example, if I have the string "It actually happened when it acted out of turn."

I want to know how many times "t a" appears in the string. In that string, "t a" appears twice. I want my function to tell me it appeared twice. Is this possible?

Answer 1

import re
len(re.findall(pattern, string_to_search))
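As a quick check, here is the same call applied to the sentence from the question. This is only a sketch: the variable names are illustrative, and the pattern is assumed to be the literal text "t a" (with the space).

```python
import re

string_to_search = "It actually happened when it acted out of turn."
pattern = r"t a"  # assumed pattern: the literal characters 't', space, 'a'

# findall returns a list of all non-overlapping matches; its length is the count.
match_count = len(re.findall(pattern, string_to_search))
print(match_count)
```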

Answer 2

The existing solutions based on findall are fine for non-overlapping matches (and no doubt optimal except maybe for a HUGE number of matches), although alternatives such as sum(1 for m in re.finditer(thepattern, thestring)) (which avoids ever materializing the list when all you care about is the count) are also quite possible. Somewhat idiosyncratic would be using subn and ignoring the resulting string:

def countnonoverlappingrematches(pattern, thestring):
  return re.subn(pattern, '', thestring)[1]

The only real advantage of this latter idea would come if you only cared to count (say) up to 100 matches; then re.subn(pattern, '', thestring, 100)[1] might be practical (returning 100 whether there are 100 matches, or 1000, or even more).
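A small sketch of both ideas, assuming the sentence from the question as input; the variable names are made up for illustration, and the cap of 3 simply stands in for the 100 mentioned above.

```python
import re

text = "It actually happened when it acted out of turn."

# Lazy count: finditer yields one match object at a time, so no list is built.
lazy_count = sum(1 for _ in re.finditer(r"t a", text))

# Uncapped subn: the second item of the returned tuple is the number of substitutions.
all_t = re.subn(r"t", "", text)[1]

# Capped subn: substitution stops after `count` replacements,
# so the result never exceeds the cap.
capped_t = re.subn(r"t", "", text, count=3)[1]

print(lazy_count, all_t, capped_t)
```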

Counting overlapping matches requires you to write more code, because the built-in functions in question are all focused on NON-overlapping matches. There's also a problem of definition: e.g., with pattern being 'a+' and thestring being 'aa', would you consider this to be just one match, or three (the first a, the second one, both of them), or...?

Assuming, for example, that you want possibly-overlapping matches starting at distinct spots in the string (which would then give TWO matches for the example in the previous paragraph):

def countoverlappingdistinct(pattern, thestring):
  total = 0
  start = 0
  there = re.compile(pattern)
  while True:
    mo = there.search(thestring, start)
    if mo is None: return total
    total += 1
    start = 1 + mo.start()

Note that you do have to compile the pattern into a RE object in this case: the function re.search does not accept a start argument (starting position for the search) the way the method search does, so you'd have to be slicing thestring as you go. That is definitely more effort than just having the next search start at the next possible distinct starting point, which is what I'm doing in this function.
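For comparison, the slicing alternative mentioned above might look roughly like the sketch below. The function name count_overlapping_by_slicing is hypothetical and not part of the answer. Note that every slice copies the rest of the string, and that anchors such as ^ or lookbehinds see only the slice, so results can differ from the compiled-pattern version for such patterns.

```python
import re

def count_overlapping_by_slicing(pattern, text):
    """Count matches that start at distinct positions, using re.search on slices."""
    total = 0
    offset = 0
    # Stop once the offset runs past the end, so empty matches cannot loop forever.
    while offset <= len(text):
        mo = re.search(pattern, text[offset:])
        if mo is None:
            break
        total += 1
        # mo.start() is relative to the slice; resume one character past the match start.
        offset += mo.start() + 1
    return total

print(count_overlapping_by_slicing("a+", "aa"))
print(count_overlapping_by_slicing("aba", "ababa"))
```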

Answer 3

I know this is a question about regex. I just thought I'd mention the count method for future reference, in case someone wants a non-regex solution.

>>> s = "It actually happened when it acted out of turn."
>>> s.count('t a')
2

This returns the number of non-overlapping occurrences of the substring.
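A short sketch of how str.count compares with the regex approach; the sample strings are invented for illustration. The main difference is that count treats its argument as literal text, so a regex needs re.escape to behave the same way.

```python
import re

# Non-overlapping: "aaaa" contains "aa" at positions 0 and 2 only.
plain_count = "aaaa".count("aa")

sample = "a.b a.b axb"
literal_count = sample.count("a.b")                        # the dot is a literal dot
regex_count = len(re.findall(r"a.b", sample))              # the dot matches any character
escaped_count = len(re.findall(re.escape("a.b"), sample))  # escaped: literal dot again

print(plain_count, literal_count, regex_count, escaped_count)
```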

Answer 4

You can find overlapping matches by using a lookahead assertion (a zero-width, noncapturing subpattern):

def count_overlapping(pattern, string):
    return len(re.findall("(?=%s)" % pattern, string))
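To see what the lookahead buys you, here is a sketch that contrasts it with a plain findall. The function name count_overlapping_lazy is hypothetical; it uses finditer instead of findall only to avoid building a list, and otherwise relies on the same "(?=...)" wrapping.

```python
import re

def count_overlapping_lazy(pattern, string):
    # (?=...) matches an empty string wherever `pattern` could start,
    # so each starting position is counted once, even if matches overlap.
    return sum(1 for _ in re.finditer(f"(?={pattern})", string))

non_overlapping = len(re.findall("aa", "aaaa"))    # matches at positions 0 and 2
overlapping = count_overlapping_lazy("aa", "aaaa") # positions 0, 1 and 2

print(non_overlapping, overlapping)
```

Because the lookahead consumes no characters, this counts distinct starting positions, which is the same definition of "overlapping" used in Answer 2.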

Answer 5

Have you tried this? (Here pattern is a compiled regular expression object.)

len(pattern.findall(source))

Answer 6

import re
print(len(re.findall(r'ab', u'ababababa')))

Answer 7

To avoid creating a list of matches, one may also use re.sub with a callable as the replacement. It will be called on each match, incrementing an internal counter.

import re

class Counter(object):
    def __init__(self):
        self.matched = 0
    def __call__(self, matchobj):
        # Called once per match; the returned string is the replacement.
        self.matched += 1
        return ''

counter = Counter()
re.sub(some_pattern, counter, text)

print(counter.matched)

Answer 8

This works fine:

import re

def ptr_str(pattern, string1):
    print(f'pattern = {pattern} times = {len(re.findall(pattern, string1))}')

pattern = 'AGATC'
string1 = 'AAGGTAAGTTTAGAATATAAAAGGTGAGTTAAATAGATCATAGGTTATATTGT'
ptr_str(pattern, string1)

8 answers

Answer 1
57 2009-09-03 16:22:47

Answer 2
30 Accepted 2009-09-03 17:43:59

Answer 3
18 2009-09-03 16:29:57

Answer 4
11 2009-09-03 18:32:48

Answer 5
10 2009-09-03 16:22:10

Answer 6
3

Answer 7
2 2011-04-12 12:10:33

Answer 8
0 2020-06-30 10:13:28
