
How to count occurrences of a word followed by a special character in a text using a Python regular expression

I want to count the number of occurrences of the word 'people' in a text using Python. For that I use Counter and Python's regular expressions:

    for j in range(len(paragraphs)):
        text = paragraphs[j].text
        count[j] = Counter(re.findall(r'\bpeople\b' ,text))

Yet my code does not take into account occurrences such as people. people! people? How can I modify it to also count the cases where the word is followed by a specific character?

Thank you for your help.

You can use an optional character group in your regex, placed after the word boundary (a \b placed after the punctuation would only match when a word character follows it):

r'\bpeople\b[.,!?]?'

The ? specifies that the group can occur 0 or 1 times; the [] specifies which characters are allowed. There is no need to escape the . (or, e.g., ()*+?) inside [], although they have special meaning in a regex. If you want to use a - inside [], you need to escape it (or place it first or last in the set), because it is used to denote ranges in sets: [1-5] is the same as [12345].
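As a quick check of the optional character group, here is a minimal sketch that runs the pattern on a made-up sample sentence (the sample text and the variable names are assumptions, not part of the question). It also compares a variant with the \b placed after the punctuation class, to show where the boundary needs to sit.

```python
import re
from collections import Counter

# Made-up sample text for illustration only.
text = "people like people. Do people? Yes, people! Not peoples."

# Word boundary right after the word, then an optional punctuation mark.
with_punct = re.findall(r'\bpeople\b[.,!?]?', text)

# Boundary after the punctuation class: in this text the class backtracks
# to zero characters, so the punctuation never ends up in the match.
boundary_last = re.findall(r'\bpeople[.,!?]?\b', text)

print(Counter(with_punct))  # one entry per distinct matched string
print(len(with_punct))      # total number of occurrences
print(boundary_last)
```

Both patterns skip "peoples"; only the first one keeps the punctuation in the matched string, which matters when the matches are used as Counter keys.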

See: https://docs.python.org/3/library/re.html#regular-expression-syntax

[] Used to indicate a set of characters. In a set:

Characters can be listed individually, e.g. [amk] will match 'a', 'm', or 'k'. Ranges of characters can be indicated by giving two characters and separating them by a '-', for example [a-z] will match any lowercase ASCII letter, [0-5][0-9] will match all the two-digits numbers from 00 to 59, and [0-9A-Fa-f] will match any hexadecimal digit. [...]
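The set rules quoted from the documentation can be tried out directly. The following is a small sketch with made-up input strings (all inputs and variable names are illustrative assumptions):

```python
import re

# Individually listed characters: matches 'a', 'm', or 'k'.
listed = re.findall(r'[amk]', 'makeup')

# Two ranges side by side: two-digit numbers from 00 to 59.
in_range = re.fullmatch(r'[0-5][0-9]', '59') is not None
out_of_range = re.fullmatch(r'[0-5][0-9]', '60') is not None

# Hexadecimal digits only.
hex_digits = re.findall(r'[0-9A-Fa-f]', 'x1Fg')

# An unescaped '-' between two characters denotes a range;
# an escaped '-' is a literal hyphen.
as_range = re.findall(r'[a-c]', 'a-b-c')
as_literal = re.findall(r'[a\-c]', 'a-b-c')

print(listed, in_range, out_of_range, hex_digits, as_range, as_literal)
```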

people[?.!]

This matches only people? people. or people! (one of the three punctuation marks is required).

So if you add a few more Counter(re.findall(...)) calls, you will be able to do something like this:

# Matches people followed by a whitespace character
count[j] = Counter(re.findall(r'people\s', text))

# Matches only people?
count[j] = Counter(re.findall(r'people\?', text))

# Matches only people.
count[j] = Counter(re.findall(r'people\.', text))

# Matches only people! (the ! is not special, so it needs no escape)
count[j] = Counter(re.findall(r'people!', text))

You need to use a backslash (\) to escape special characters such as ? and . (the ! is not special and needs no escape).
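If separate counts per trailing character are what you are after, one possible alternative to several findall calls is a single pass with a capturing group. This is only a sketch on a made-up sample text; the names text and by_suffix are assumptions for illustration.

```python
import re
from collections import Counter

# Made-up sample text for illustration only.
text = "people like people. Do people? Yes, people! Not peoples."

# With exactly one capturing group, re.findall returns the group's text,
# i.e. the trailing punctuation mark ('' when there is none).
by_suffix = Counter(re.findall(r'\bpeople\b([.?!]?)', text))

print(by_suffix['?'])           # occurrences of people?
print(by_suffix[''])            # occurrences with no trailing . ? or !
print(sum(by_suffix.values()))  # all occurrences together
```

The keys of the Counter are then the punctuation marks themselves, so a single dictionary holds what the separate patterns would have produced one at a time.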

Also, this is a good resource when you are experimenting with Python regular expressions: https://pythex.org/ The site also has a regular expression cheat sheet.

You can add an optional character class at the end of the 'people' part of your regex pattern. Try the following:

for j in range(len(paragraphs)):
    text = paragraphs[j].text
    # The r prefix goes before the quote; \b sits before the optional punctuation
    count[j] = Counter(re.findall(r'\bpeople\b[.?!]?', text))

The ? is the zero-or-one quantifier. The above pattern seems to work on regex101.com, but I haven't tried it out in a Python shell yet.
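To try the loop in a Python shell without the original data, here is a self-contained sketch. The question does not say what kind of objects paragraphs holds, so SimpleNamespace stand-ins with a .text attribute are used as an assumption, and the sample sentences are made up.

```python
import re
from collections import Counter
from types import SimpleNamespace

# Hypothetical stand-ins for the question's paragraph objects.
paragraphs = [
    SimpleNamespace(text="Some people like people."),
    SimpleNamespace(text="No one here."),
    SimpleNamespace(text="people! people? peoples"),
]

# Compile once, reuse for every paragraph.
pattern = re.compile(r'\bpeople\b[.?!]?')

count = {}
for j, paragraph in enumerate(paragraphs):
    count[j] = Counter(pattern.findall(paragraph.text))

# Total occurrences per paragraph, regardless of trailing punctuation.
totals = {j: sum(c.values()) for j, c in count.items()}
print(totals)
```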

Does it have to use regex? Why not just:

len(text.split("people"))-1
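One caveat worth checking before relying on this: splitting counts every substring occurrence, so words such as "peoples" are included. A short sketch on a made-up sentence (the sample text and variable names are assumptions) compares it with str.count and with a word-boundary regex:

```python
import re

# Made-up sample text for illustration only.
text = "people, peoples and people!"

by_split = len(text.split("people")) - 1        # substring occurrences
by_count = text.count("people")                 # same result, more direct
by_regex = len(re.findall(r'\bpeople\b', text)) # whole words only

print(by_split, by_count, by_regex)
```

If substring matches are acceptable, text.count("people") gives the same number as the split expression; if only the whole word should count, the regex version is the safer choice.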


