How to count occurrences of a word followed by a special character using Python regular expressions

Question

I want to count the number of occurrences of the word 'people' in a text using Python. For that I use Counter and Python's regular expressions:

    import re
    from collections import Counter

    for j in range(len(paragraphs)):
        text = paragraphs[j].text
        count[j] = Counter(re.findall(r'\bpeople\b', text))

Yet my code does not take into account occurrences such as people. people! people? How can I modify it to also count the cases where the word is followed by a specific character?

Thank you for your help,

Answer 1

You can use an optional character set in your regex:

r'\bpeople\b[.,!?]?'

The ? specifies that the set can occur 0 or 1 times, and the [] specifies which characters are allowed. The \b sits directly after the word rather than after the set, because a word boundary after a punctuation mark only matches when a word character follows it, so a trailing \b would drop the punctuation from the match in ordinary text. There is no need to escape the . (or, for example, ()*+?) inside [] although they have special meaning in a regex. If you wanted to use a - inside [] you would need to escape it (or put it first or last in the set), as it is used to denote ranges in sets: [1-5] == 12345.
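As a rough illustration of the optional character set, the sketch below counts each variant of the word in a made-up sentence. The sample text and variable names are assumptions for the example, not part of the question. It places \b directly after the word, since a \b after the punctuation would only succeed when a word character follows.

```python
import re
from collections import Counter

# Hypothetical sample text, not taken from the question.
text = "Some people agree. Other people? Not those people! Peoples differ, people."

# \b after "people" rejects "peoples"; the optional set then picks up
# at most one trailing punctuation mark.
pattern = re.compile(r'\bpeople\b[.,!?]?')

matches = pattern.findall(text)   # every matched variant, in order
counts = Counter(matches)         # occurrences per variant
total = len(matches)              # occurrences of the word overall

print(counts)
print(total)
```

Here `total` gives the overall count, while `counts` shows how often each punctuation variant appeared.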

See: https://docs.python.org/3/library/re.html#regular-expression-syntax

[] Used to indicate a set of characters. In a set:

Characters can be listed individually, e.g. [amk] will match 'a', 'm', or 'k'. Ranges of characters can be indicated by giving two characters and separating them by a '-', for example [a-z] will match any lowercase ASCII letter, [0-5][0-9] will match all the two-digits numbers from 00 to 59, and [0-9A-Fa-f] will match any hexadecimal digit. [...]
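The set rules quoted above can be tried out with a few small patterns. This is only a sketch; the input strings are invented for the example.

```python
import re

# [a-z]: any single lowercase ASCII letter
lower = re.findall(r'[a-z]', 'aB3z')

# [0-5][0-9]: two-digit numbers from 00 to 59
minutes = [s for s in ('07', '59', '60') if re.fullmatch(r'[0-5][0-9]', s)]

# [0-9A-Fa-f]: a single hexadecimal digit
hex_digits = re.findall(r'[0-9A-Fa-f]', 'fG9x')

# An escaped hyphen inside a set is a literal '-', not a range
literal_hyphen = re.findall(r'[\-.]', 'a-b.c')

print(lower, minutes, hex_digits, literal_hyphen)
```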

Answer 2

people[?.!]

This will only match people? people. and/or people!

So if you add a few more Counter(re.findall( calls, you will be able to do something like this:

# This will only match "people" followed by whitespace
count[j] = Counter(re.findall(r'people\s', text))

# This will only match people?
count[j] = Counter(re.findall(r'people\?', text))

# This will only match people.
count[j] = Counter(re.findall(r'people\.', text))

# This will only match people!
count[j] = Counter(re.findall(r'people\!', text))

You need to use a backslash (\) to escape special characters such as ? and . when they appear outside a [] set. The ! is not special in a regex, so the backslash before it is harmless but not required.

Also, this is a good resource when you are experimenting with Python regular expressions: https://pythex.org/ The site also has a regular expression cheat sheet.
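If separate counts per trailing character are the goal, one possible variation is a single pass with a capture group instead of several findall calls. This is a sketch under that assumption; the sample text and the names `followers` and `by_follower` are made up for the example.

```python
import re
from collections import Counter

# Hypothetical sample text.
text = "people? people. people! people people?"

# With one capture group, findall returns only the captured part:
# the character following the word, or '' when none of ? . ! follows.
followers = re.findall(r'\bpeople\b([?.!]?)', text)
by_follower = Counter(followers)

print(by_follower)
```

The empty-string key holds the occurrences that were not followed by any of the listed characters.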

Answer 3

You can add an optional character set right after the 'people' part of your regex pattern. Try the following:

for j in range(len(paragraphs)):
    text = paragraphs[j].text
    # The r prefix goes outside the quotes; \b goes right after the word
    count[j] = Counter(re.findall(r'\bpeople\b[.?!]?', text))

The ? is the zero-or-one quantifier. The above pattern seems to work on regex101.com, but I haven't tried it out in a Python shell yet.
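Two details of such a pattern are easy to check in a Python shell: where the r prefix goes, and what ? actually quantifies. The following is a small sketch with invented sample strings.

```python
import re

# Hypothetical sample text.
text = "Some people! More people"

# r prefix outside the quotes: \b reaches the regex engine as a word boundary.
correct = re.findall(r'\bpeople\b[.?!]?', text)

# r inside the quotes: the pattern starts with a literal "r", and '\b' in a
# normal string is a backspace character, so nothing in the text matches.
misplaced = re.findall('r\bpeople[.?!]?', text)

# ? means zero or one: only the first "!" is consumed.
one_at_most = re.findall(r'people!?', 'people!!')

print(correct, misplaced, one_at_most)
```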

Answer 4

Does it have to use regex? Why not just:

len(text.split("people"))-1
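One thing to keep in mind with the split approach is that it counts substrings, not whole words. The sketch below compares it with str.count and with the word-boundary regex on an invented sample string.

```python
import re

# Hypothetical sample text containing the word inside longer words.
text = "people, peoples and townspeople"

by_split = len(text.split("people")) - 1             # substring occurrences
by_count = text.count("people")                      # same result, more direct
whole_words = len(re.findall(r'\bpeople\b', text))   # whole-word occurrences

print(by_split, by_count, whole_words)
```

On this string the two substring counts agree with each other, while the regex only counts the standalone word.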


Answer 1
2 2018-10-23 18:15:38

Answer 2
1 2018-10-23 18:17:24

Answer 3
1 2018-10-23 18:18:22

Answer 4
0 2018-10-23 18:27:17
