Python Regex match everything except words that start with a hashtag
I'm trying to match all the words that do not start with a hashtag using Python regex.
Example sentence:
This is #a test for #matching #hashtags
I would like the following to be matched: This is test for
I was able to match all the words that start with a hashtag with this: #\b\w*
Then I realized I needed the opposite.
I tried many variations similar to these without success:
Nothing works.
If you want a regex, you will need a negative lookbehind:
(?<!#)\b\w+
https://regex101.com/r/aMdc7R/1
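As a rough sketch of how the lookbehind pattern could be used from Python, here it is passed to re.findall on the example sentence from the question. The variable names are only illustrative.

```python
import re

text = "This is #a test for #matching #hashtags"

# (?<!#) : the position must not be directly preceded by '#'
# \b     : the position must be a word boundary
# \w+    : then take one or more word characters
pattern = re.compile(r"(?<!#)\b\w+")

words = pattern.findall(text)
print(words)  # ['This', 'is', 'test', 'for']
```

The word boundary matters here: without it, the regex could start a match in the middle of a hashtag word (for example at the "atching" in "#matching"), because only the first letter is directly preceded by "#".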
A non-regex solution should be fine:
>>> text = 'This is #a test for #matching #hashtags'
>>> [word for word in text.split(' ') if not word.startswith('#')]
['This', 'is', 'test', 'for']
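One thing to watch with this approach: split(' ') produces empty strings when the text contains consecutive spaces, and those empty strings pass the startswith check. A small variation, assuming any run of whitespace should count as a separator, is to call split() with no argument. The function name below is hypothetical.

```python
def words_without_hashtags(text):
    """Return whitespace-separated tokens that do not start with '#'."""
    # split() with no argument splits on runs of whitespace and never
    # yields empty strings, unlike split(' ').
    return [word for word in text.split() if not word.startswith('#')]


# Note the double space after "This".
print(words_without_hashtags('This  is #a test for #matching #hashtags'))
# ['This', 'is', 'test', 'for']
```

Unlike a \w+ based regex, this keeps punctuation attached to the tokens, so "Hello," stays "Hello,".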
For regex, you need a negative lookbehind assertion, which matches only if the substring is not preceded by the specified character or substring.
Because lookarounds can be expensive, you can swap the word boundary and the lookbehind so that the lookbehind is not tried at every position before a match. It then fires only after the word boundary has been asserted.
\b(?<!#)\w+
\b : a word boundary
(?<!#) : negative lookbehind, assert that there is no # directly to the left of the current position
\w+ : match 1+ word characters
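To make the three parts concrete, here is a minimal sketch that writes the reordered pattern in verbose mode, one component per line. It assumes Python's re module. Note that in re.VERBOSE mode an unescaped # starts a comment, so the hashtag inside the lookbehind is written as \# here.

```python
import re

text = "This is #a test for #matching #hashtags"

pattern = re.compile(
    r"""
    \b        # a word boundary
    (?<!\#)   # negative lookbehind: no hashtag directly to the left
    \w+       # one or more word characters
    """,
    re.VERBOSE,
)

matches = pattern.findall(text)
print(matches)  # ['This', 'is', 'test', 'for']
```

Both orderings of the boundary and the lookbehind return the same matches on this input; the reordering only changes how much work the engine does before it rejects a position.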