How to match all words with a regular expression, except URLs or similar?

Question

I'm trying to match all words in a string, except tokens that have punctuation inside them, such as URLs.

I've tried many variations, but whenever a pattern works for the second string it gives the wrong result for the first.

import re

s1 = "My dog is nice! My cat not. www.test.org ?"
s2 = "I am."
regex = r"\b\w+\W* \b"
m1 = re.findall(regex, s1)
m2 = re.findall(regex, s2)

The output for m1 is right:

['My ', 'dog ', 'is ', 'nice! ', 'My ', 'cat ', 'not. ']

The output for m2 is not what I want:

['I ']

... but I want:

['I ', 'am.']

Answer 1

You need an additional alternative in the pattern:

regex = r"\b\w+\W* \b|\b\w+\W$"

...to also match a final word whose trailing punctuation ends the string, with no space after it.

Working code:

import re

s1 = "My dog is nice! My cat not. www.test.org ?"
s2 = "I am."

regex = r"\b\w+\W* \b|\b\w+\W$"

m1 = re.findall(regex, s1)
m2 = re.findall(regex, s2)

print(m1) # ['My ', 'dog ', 'is ', 'nice! ', 'My ', 'cat ', 'not. ']
print(m2) # ['I ', 'am.']
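
One caveat with the pattern above: the second alternative needs exactly one non-word character right before the end of the string, so a last word with no punctuation (as in "I am") is still skipped. As a hedged sketch of another way to write it, not the answerer's method, the pattern below uses lookarounds to require that each token starts after whitespace (or at the start of the string) and ends at whitespace (or at the end of the string). The names TOKEN and match_words are made up for this illustration.

```python
import re

# Hypothetical alternative pattern (an assumption, not from the answer above):
#   (?<!\S)    token starts at the string start or right after whitespace
#   \w+        one or more word characters
#   [^\w\s]*   optional trailing punctuation
#   (?:\s|$)   token ends with one whitespace character or at the string end
TOKEN = re.compile(r"(?<!\S)\w+[^\w\s]*(?:\s|$)")


def match_words(text):
    """Return whitespace-delimited words, keeping trailing punctuation and
    one trailing whitespace character, and skipping any token that has
    punctuation inside it (for example a URL)."""
    return TOKEN.findall(text)


print(match_words("My dog is nice! My cat not. www.test.org ?"))
print(match_words("I am."))
print(match_words("I am"))
```

Note that this sketch also drops contractions such as "don't", because the apostrophe counts as punctuation inside the token. Whether that is acceptable depends on your input.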

Answer 1 (accepted), 2019-01-19 06:14:33
