Python Regex match everything except words that start with a hashtag

I'm trying to match all the words that do not start with a hashtag using a Python regex.

Example sentence:

    This is #a test for #matching #hashtags

I would like the following to be matched: This is test for

I was able to match all the words that start with a hashtag with this: #\b\w*

Then I realized I needed the opposite.

I tried many variations similar to these without success:

  • ^(?#\b\w*)
  • ^(?!#)\w+$
  • ^(?!#).*
  • /([\s\S]*?)(#)
  • ^(?:(?!#).)*$

Nothing works.

If you want a regex, you will need a negative lookbehind:

(?<!#)\b\w+

https://regex101.com/r/aMdc7R/1
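As a rough illustration, the pattern above can be tried in Python with `re.findall`. The helper name `words_without_hashtag` is made up for this sketch and is not part of the original answer.

```python
import re

# Hypothetical helper name, used only for this illustration.
def words_without_hashtag(text):
    # (?<!#) rejects any position that has '#' directly to its left;
    # \b\w+ then matches a whole word starting at a word boundary.
    return re.findall(r'(?<!#)\b\w+', text)

print(words_without_hashtag('This is #a test for #matching #hashtags'))
# ['This', 'is', 'test', 'for']
```

The word boundary matters here: without it, the regex could start a match in the middle of a hashtag word (for example at the second letter of "matching"), because only the first letter is directly preceded by `#`.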

A non-regex solution should be fine:

>>> text = 'This is #a test for #matching #hashtags'
>>> [word for word in text.split(' ') if not word.startswith('#')]
['This', 'is', 'test', 'for']

For a regex, you need something like a negative lookbehind assertion, which matches only if the substring is not preceded by the specified substring or character.

To avoid evaluating the lookbehind at every position, you can swap the word boundary and the lookbehind (lookarounds can be expensive), so that the lookbehind is only evaluated after the word boundary has been asserted.

\b(?<!#)\w+
  • \b A word boundary
  • (?<!#) Negative lookbehind, assert that there is no # directly to the left of the current position
  • \w+ Match 1+ word characters
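A minimal sketch of this variant in Python, assuming the pattern is compiled once and reused. The constant name `NON_HASHTAG_WORD` is chosen for this example only.

```python
import re

# Hypothetical constant name; the pattern is the one from the answer:
# word boundary first, then the negative lookbehind, then the word itself.
NON_HASHTAG_WORD = re.compile(r'\b(?<!#)\w+')

text = 'This is #a test for #matching #hashtags'
words = NON_HASHTAG_WORD.findall(text)
print(words)
# ['This', 'is', 'test', 'for']

# Edge case: only a '#' directly to the left of a word excludes it.
print(NON_HASHTAG_WORD.findall('foo#bar'))
# ['foo']
```

Note that in a string like `foo#bar`, "bar" is excluded even though the `#` is not at the start of a whitespace-separated token, which differs from the `split`/`startswith` approach.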

Regex demo


 