
Regular expression for words that don't start with a hyphen in Python

I need to write a regular expression for words in Python. I get a sentence and need to check whether it contains words.

The words 'Hello' and 'It's' should be in the list. The words '--Mom' and '-Mom' should not be in the list. However, 'Mom' ends up in the list, because the regex separates the '-' from 'Mom', so 'Mom' is considered a word. How can I make a word that starts with '-', like '--Mom', not count as a word?

import re

def getWord():
    # Letters, optional hyphenated parts, optional apostrophe plus up to two letters
    return r"((^[A-Z])?[a-z]+)((\-[a-z]*)*)(\')?[a-z]{0,2}"

text = r"""Hello Bob! It's Mary, your mother-in-law, the mistake is your parents'! --Mom"""
com = re.compile(rf"""((?P<WORD>{getWord()}))""", re.MULTILINE | re.IGNORECASE | re.VERBOSE | re.UNICODE)

# Collect (matched text, group name) pairs for every named group that matched
lst = [(v, k) for match in com.finditer(text)
       for k, v in match.groupdict().items()
       if v is not None and k != 'SPACE']
print(lst)

You may be overcomplicating this; a re.findall search on \w+ already comes close to what you want here. To allow for possessives, just make 's an optional ending after each word. Also, to rule out words which are not preceded by whitespace (and are not at the very start of the string), we can preface the pattern with the negative lookbehind (?<!\S).

import re

text = "Hello Bob! It's Mary, your mother-in-law, the mistake is your parents! --Mom"
# (?<!\S) requires the match to start at the string start or right after whitespace
words = re.findall(r"(?<!\S)\w+(?:'s)?", text)
print(words)

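This prints:

['Hello', 'Bob', "It's", 'Mary', 'your', 'mother', 'the', 'mistake', 'is', 'your',
 'parents']

Note that 'in' and 'law' from mother-in-law are not matched, because each is preceded by a hyphen rather than whitespace. --Mom is excluded for the same reason.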
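If hyphenated words such as mother-in-law and trailing-apostrophe possessives such as parents' should each count as a single word, one possible variation is to widen the lookbehind and spell out the word shape explicitly. The following is only a minimal sketch, assuming words consist of ASCII letters; the names WORD_RE and find_words are hypothetical and do not come from the question or the answer.

```python
import re

# Hypothetical pattern (an assumption, not from the original posts):
#   (?<![\w'-])          not preceded by a word character, apostrophe, or hyphen
#   [A-Za-z]+            the first run of letters
#   (?:-[A-Za-z]+)*      zero or more hyphenated continuations, e.g. -in-law
#   (?:'[A-Za-z]{0,2})?  optional apostrophe plus up to two letters, e.g. 's or a bare '
WORD_RE = re.compile(r"(?<![\w'-])[A-Za-z]+(?:-[A-Za-z]+)*(?:'[A-Za-z]{0,2})?")

def find_words(sentence):
    """Return the words in sentence, skipping anything that starts with a hyphen."""
    return WORD_RE.findall(sentence)

text = "Hello Bob! It's Mary, your mother-in-law, the mistake is your parents'! --Mom"
print(find_words(text))
```

Because the lookbehind rejects any position that follows a hyphen or a letter, neither '--Mom' nor the inner 'Mom' can start a match, while 'mother-in-law' is consumed as one token from its first letter.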
