
Regex to match a condition UNLESS it is a hashtag

I am trying to write a regex statement to remove numbers, or words that contain digits, but only if they are not hashtags. I can successfully match words that have digits in them, but cannot work out how to write a condition that ignores words beginning with a hashtag.

Here is the test string I have been using while looking for a solution:

happening bit mediacon #2022ppopcon wearing stell naman today #sb19official 123 because h3llo also12 or 23old

I need a regex that captures 123, h3llo, also12 and 23old but ignores #2022ppopcon and #sb19official.

I have tried the following regex statements.

(#\w+\d+\w*)|(\w+\d+\w*): this successfully captures the hashtags in group 1 and the non-hashtags in group 2, but I cannot figure out how to make it select group 2 only.
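One common way to use an alternation like this is to let group 1 consume the hashtags (so group 2 can never match inside them) and then keep only the matches where group 2 took part. A minimal sketch, assuming Python 3's re module (the question does not say which regex engine is in use):

```python
import re

text = ("happening bit mediacon #2022ppopcon wearing stell naman today "
        "#sb19official 123 because h3llo also12 or 23old")

# Group 1 swallows whole hashtags; group 2 captures digit-containing words.
pattern = re.compile(r"(#\w+\d+\w*)|(\w+\d+\w*)")

# Keep only the matches produced by the second alternative.
wanted = [m.group(2) for m in pattern.finditer(text) if m.group(2)]
print(wanted)  # ['123', 'h3llo', 'also12', '23old']
```

Note that \w+\d+ requires at least one word character before the digit, so a standalone single digit such as 5 would not be matched by this pattern.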

(?<!#)\w*\d+\w*: this excludes the first character after the hashtag but still captures all the remaining characters in the hashtag string. For example, in the string #2022ppopcon it ignores #2 and captures 022ppopcon.
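The behaviour described above is easy to reproduce: the lookbehind only inspects the single character before the match start, so the engine simply starts one character later inside the hashtag. A short sketch, again assuming Python 3's re module:

```python
import re

text = ("happening bit mediacon #2022ppopcon wearing stell naman today "
        "#sb19official 123 because h3llo also12 or 23old")

# (?<!#) rejects a start right after '#', but nothing stops the match
# from starting at the next character of the same hashtag.
naive = re.findall(r"(?<!#)\w*\d+\w*", text)
print(naive)  # ['022ppopcon', 'b19official', '123', 'h3llo', 'also12', '23old']
```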

You might use:

(?<!\S)[^\W\d]*\d\w*
  • (?<!\S) Assert a whitespace boundary to the left (the match starts at the beginning of the string or right after a whitespace character)
  • [^\W\d]* Match optional word characters except digits
  • \d Match a single digit, so the word contains at least one
  • \w* Match optional word characters

See a regex demo.
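To check the pattern against the question's test string, and to perform the removal the question is ultimately after, here is a minimal Python 3 sketch. The split/join step that tidies up the leftover double spaces is an addition of this example, not part of the answer:

```python
import re

text = ("happening bit mediacon #2022ppopcon wearing stell naman today "
        "#sb19official 123 because h3llo also12 or 23old")

# A token may only start at the beginning of the string or after whitespace,
# so a position inside "#..." can never be a valid start.
pattern = re.compile(r"(?<!\S)[^\W\d]*\d\w*")

found = pattern.findall(text)
print(found)  # ['123', 'h3llo', 'also12', '23old']

# Remove the matches, then collapse the runs of spaces they leave behind.
cleaned = " ".join(pattern.sub("", text).split())
print(cleaned)
```

The second print outputs the sentence with only the hashtags and digit-free words left: "happening bit mediacon #2022ppopcon wearing stell naman today #sb19official because or".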

If you also want to allow partial matches inside a longer run of non-whitespace characters, you can replace the whitespace check with a word boundary and use a negative lookbehind to assert that this boundary is not preceded by a #:

(?<!#)\b[^\W\d]*\d\w*

See another regex demo.
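The two patterns give the same result on the question's test string; they only differ when a digit-containing word is glued to other non-space characters. A small comparison in Python 3, where the sample string is a hypothetical input made up for illustration:

```python
import re

strict = re.compile(r"(?<!\S)[^\W\d]*\d\w*")    # token must follow whitespace
partial = re.compile(r"(?<!#)\b[^\W\d]*\d\w*")  # any word start not right after '#'

sample = "h3llo x-12 #tag99"  # hypothetical input, not from the question
print(strict.findall(sample))   # ['h3llo']
print(partial.findall(sample))  # ['h3llo', '12']

text = ("happening bit mediacon #2022ppopcon wearing stell naman today "
        "#sb19official 123 because h3llo also12 or 23old")
print(strict.findall(text) == partial.findall(text))  # True
```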
