
Find certain colons in string using Regex

I'm trying to search for colons in a given string so that I can split the string at each colon for preprocessing, based on the following conditions:

  1. Match if it is preceded or followed by a word, e.g. A Book: Chapter 1 or A Book:Chapter 1
  2. Do not match if it is part of an emoticon, i.e. :( or ): or :/ or :-) etc.
  3. Do not match if it is part of a given time, i.e. 16:00 etc.

I've come up with the following regex:

(\:)(?=\w)|(?<=\w)(\:)

which satisfies conditions 1 & 2 but still fails on condition 3, as it matches the colon present in the string representation of a time. How do I fix this?

Edit: it has to be a single regex statement, if possible.

Word characters (\w) include digits: \w is equivalent to [a-zA-Z0-9_]. So just use [a-zA-Z] instead:

(\:)(?=[a-zA-Z])|(?<=[a-zA-Z])(\:)
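To see the difference between the two patterns, here is a minimal Python sketch. The question does not name a language, so Python is an assumption, and the helper name `colon_positions` and the sample strings are made up for illustration.

```python
import re

# Original pattern from the question: \w also matches digits,
# so the colon in a time such as 16:00 is matched.
WORD_COLON = re.compile(r"(\:)(?=\w)|(?<=\w)(\:)")

# Letters-only variant suggested in this answer.
LETTER_COLON = re.compile(r"(\:)(?=[a-zA-Z])|(?<=[a-zA-Z])(\:)")


def colon_positions(pattern, text):
    """Return the index of every colon that `pattern` matches in `text`."""
    return [m.start() for m in pattern.finditer(text)]


# Illustrative inputs only; they are not taken from the question.
samples = ["A Book: Chapter 1", "A Book:Chapter 1", "Meet at 16:00", "Fine :-)"]
for s in samples:
    print(repr(s), colon_positions(WORD_COLON, s), colon_positions(LETTER_COLON, s))
```

Note that the letters-only pattern still matches emoticons that contain a letter, such as :P or :D, because the colon there is followed by a letter.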


You can use

(:\b|\b:)(?!(?:(?<=\b\d:)|(?<=\b\d{2}:))\d{1,2}\b)

See the regex demo. Details:

  • (:\b|\b:) - Group 1: a : that is either preceded or followed by a word char
  • (?!(?:(?<=\b\d:)|(?<=\b\d{2}:))\d{1,2}\b) - there should be no one or two digits right after the : (followed by a word boundary) if the : is preceded by one or two digits (themselves preceded by a word boundary).

Note: :\b is equivalent to :(?=\w), and \b: is equivalent to (?<=\w):, because : is not a word char, so a word boundary next to it exists only when the neighbouring char is a word char.

If you need to get the same capturing groups as in your original pattern, replace (:\b|\b:) with (?:(:)\b|\b(:)).
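As a rough illustration of how the single-pattern approach could be used for the splitting step, here is a Python sketch. Python and the helper name `split_at_colons` are assumptions, not part of the answer. The sketch slices the string around each match instead of calling `re.split`, since `re.split` would also return the captured colons.

```python
import re

# Pattern from this answer: a colon next to a word char, unless it sits
# between one or two digits on the left and one or two digits on the right.
COLON = re.compile(r"(:\b|\b:)(?!(?:(?<=\b\d:)|(?<=\b\d{2}:))\d{1,2}\b)")


def split_at_colons(text):
    """Split `text` at every colon matched by COLON, trimming whitespace."""
    parts, last = [], 0
    for m in COLON.finditer(text):
        parts.append(text[last:m.start()].strip())
        last = m.end()
    parts.append(text[last:].strip())
    return parts


# Illustrative inputs only.
for s in ["A Book: Chapter 1", "A Book:Chapter 1", "Meet at 16:00 :-)"]:
    print(repr(s), split_at_colons(s))
```

The two lookbehinds are kept as separate alternatives, each of fixed width, which is what lets Python's `re` module accept them.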

More flexible solution

Note that excluding matches can be done with a simpler pattern that matches and captures what you need, and only matches (without capturing) what you do not need. This is called the "best regex trick ever". So, you may use a regex like

8:|:[PD]|\d+(?::\d+)+|(:\b|\b:)

that will match 8:, :P, :D, or one or more digits followed by one or more sequences of : and one or more digits; otherwise it will match and capture into Group 1 a : char that is either preceded or followed by a word char. All you need to do is check whether Group 1 matched, and implement the required extraction/replacement logic in your code.
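The "check whether Group 1 matched" step might look like the following Python sketch. Again, Python, the helper name `wanted_colons`, and the sample strings are assumptions made for illustration.

```python
import re

# The first three alternatives consume text we want to skip (8:, :P, :D,
# and digit groups joined by colons); only the last one captures a colon.
TRICK = re.compile(r"8:|:[PD]|\d+(?::\d+)+|(:\b|\b:)")


def wanted_colons(text):
    """Return the indices of colons captured in Group 1."""
    return [
        m.start(1)
        for m in TRICK.finditer(text)
        if m.group(1) is not None  # Group 1 is None for the "skip" alternatives
    ]


# Illustrative inputs only.
for s in ["A Book:Chapter 1 :P 16:00", "A Book: Chapter 1", "10:30:15 :D"]:
    print(repr(s), wanted_colons(s))
```

Further exclusions can be added as extra alternatives in front of the capturing group without touching the rest of the code.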
