
Regular expressions - counting how many times a word appears in a text

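What I'm trying to set up is a function that, given some text, gives the number of times the words ['color', 'colour', 'Color', 'Colour'] appear in it, so that I get the following results: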

assert colorcount("Color Purple") == 1

assert colorcount("Your colour is better than my colour") == 2

assert colorcount("color Color colour Colour") == 4

What I have is:

import re

def colorcount(text):
    all_matches = re.findall('color', 'Colour', 'Color'. 'Colour', text)
    return len(all_matches)

print(colorcount(text)

It doesn't work, so how can I write the code so that it behaves the way I want?

If you want to use a regular expression, you can do it like this:

import re

def colorcount(text):
  r = re.compile(r'\bcolour\b | \bcolor\b', flags = re.I | re.X)
  count = len(r.findall(text))
  print(count)
  return count

# These asserts work as expected without raising an AssertionError.
assert colorcount("Color Purple") == 1
assert colorcount("Your colour is better than my colour") == 2
assert colorcount("color Color colour Colour") == 4

Which outputs:

1
2
4
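
To see what the `\b` word-boundary anchors contribute, the sketch below compares the pattern with and without them. The helper names `count_anywhere` and `count_whole_words` and the sample sentence are made up for illustration and are not part of the answer above.

```python
import re

# Hypothetical helpers for illustration only.
def count_anywhere(text):
    # No word boundaries: also matches inside longer words such as "colorful".
    return len(re.findall(r'colou?r', text, flags=re.I))

def count_whole_words(text):
    # \b on both sides: only standalone "color"/"colour" are counted.
    return len(re.findall(r'\bcolou?r\b', text, flags=re.I))

sample = "A colorful Colour chart with one color"
print(count_anywhere(sample))     # 3
print(count_whole_words(sample))  # 2
```

Whether "colorful" should count is a judgment call that the question does not settle; the three asserts pass either way.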

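You can simply convert the text to a single case (for example, all lowercase) and then loop over the keywords, using the string's count() method to count the occurrences of each one: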

def colorcount(text):
    KEY_WORDS = [ 'color', 'colour' ]
    counter = 0
    sanitized_text = text.lower()
    for kw in KEY_WORDS:
        counter += sanitized_text.count(kw.lower())
    return counter

text = 'color ds das Colour dsafasft e re Color'

# 3
print(colorcount(text))

# All following asserts pass
assert colorcount("Color Purple") == 1
assert colorcount("Your colour is better than my colour") == 2
assert colorcount("color Color colour Colour") == 4
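
Note that `str.count()` counts substrings, so words such as "colorful" or "discoloured" are included in the total. If only standalone words should count, one possible variant splits the text into words and strips surrounding punctuation before comparing. This is a sketch under that assumption; the name `colorcount_words` is hypothetical.

```python
import string

def colorcount_words(text):
    # Hypothetical variant: count only whole words, ignoring case.
    key_words = {'color', 'colour'}
    counter = 0
    for word in text.lower().split():
        # Remove punctuation attached to the word, e.g. "colour," -> "colour".
        if word.strip(string.punctuation) in key_words:
            counter += 1
    return counter

# "Colorful" is skipped; "colour," and "color." are counted.
print(colorcount_words("Colorful colour, plain color."))  # 2
```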

Try this:

import re

def colorcount(text):
    return len(re.findall(r'[cC]olou?r', text))  # no '|' inside the character class

assert colorcount("Color Purple") == 1
assert colorcount("Your colour is better than my colour") == 2
assert colorcount("color Color colour Colour") == 4

Use the following regular expression with the re.I flag (case-insensitive) and re.findall, then return the length of the resulting list:

\bcolou?r\b

import re

def colorcount(text):
  return len(re.findall(r'\bcolou?r\b', text, flags=re.I))

print(colorcount('color Color colour Colour'))

Prints:

4
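
The same approach can be extended to an arbitrary list of words by building the alternation programmatically. The sketch below is one way to do that; `count_words` and its `words` parameter are hypothetical names, not something taken from the answers above. `re.escape` guards against words that contain regex metacharacters.

```python
import re

def count_words(text, words):
    # Assumes `words` is a non-empty list of plain strings.
    # Builds a pattern such as r'\b(?:color|colour)\b' from the list.
    alternation = '|'.join(re.escape(w) for w in words)
    pattern = re.compile(r'\b(?:' + alternation + r')\b', flags=re.I)
    return len(pattern.findall(text))

print(count_words("color Color colour Colour", ["color", "colour"]))  # 4
```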

