Find certain colons in a string using regex

Question

I'm trying to search for colons in a given string so that I can split the string at the colon for preprocessing, based on the following conditions:

1. Match if the colon is preceded or followed by a word, e.g. A Book: Chapter 1 or A Book:Chapter 1
2. Do not match if it is part of an emoticon, e.g. :( or ): or :/ or :-) etc.
3. Do not match if it is part of a time, e.g. 16:00 etc.

I've come up with the following regex:

(\:)(?=\w)|(?<=\w)(\:)

which satisfies conditions 1 & 2 but still fails on condition 3, as it matches the colon in the string representation of a time. How do I fix this?

Edit: it has to be a single regex statement if possible.
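
The failure described above can be reproduced with a short sketch. Python is an assumption here, since the question does not name a language, and `colon_positions` is a hypothetical helper introduced only for illustration.

```python
import re

# The pattern from the question, unchanged.
ORIGINAL = re.compile(r"(\:)(?=\w)|(?<=\w)(\:)")

def colon_positions(pattern, text):
    """Return the index of every colon the pattern matches in text."""
    return [m.start() for m in pattern.finditer(text)]

print(colon_positions(ORIGINAL, "A Book: Chapter 1"))  # colon after a word: wanted
print(colon_positions(ORIGINAL, "Oh no :("))           # emoticon: skipped
print(colon_positions(ORIGINAL, "See you at 16:00"))   # time: matched, which is the bug
```

The last call still reports a position because `\w` also covers digits, so the colon in 16:00 counts as being next to a word character.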

Answer 1

The word-character class \w includes digits (it is equivalent to [a-zA-Z0-9_]), so just use [a-zA-Z] instead:

(\:)(?=[a-zA-Z])|(?<=[a-zA-Z])(\:)
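
A minimal sketch of how this pattern behaves on the inputs from the question, assuming Python's `re` module (the question does not name a language); `colon_positions` is a hypothetical helper.

```python
import re

# Letters-only variant of the question's pattern.
LETTERS_ONLY = re.compile(r"(\:)(?=[a-zA-Z])|(?<=[a-zA-Z])(\:)")

def colon_positions(text):
    """Return the index of every colon the pattern matches in text."""
    return [m.start() for m in LETTERS_ONLY.finditer(text)]

print(colon_positions("A Book: Chapter 1"))  # colon after a letter
print(colon_positions("A Book:Chapter 1"))   # colon before a letter
print(colon_positions("See you at 16:00"))   # digits on both sides: no match
print(colon_positions("Oh no :("))           # punctuation emoticon: no match
print(colon_positions("so happy :D"))        # letter emoticon: still matches
```

Note that emoticons made of a colon and a letter, such as :D or :P, are still matched by this pattern, because the colon is followed by a letter.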


Answer 2

You can use

(:\b|\b:)(?!(?:(?<=\b\d:)|(?<=\b\d{2}:))\d{1,2}\b)

See the regex demo. Details:

(:\b|\b:) - Group 1: a : that is either preceded or followed by a word char
(?!(?:(?<=\b\d:)|(?<=\b\d{2}:))\d{1,2}\b) - there must not be one or two digits right after the : (followed by a word boundary) if the : is preceded by one or two digits (themselves preceded by a word boundary).

Note: :\b is equal to :(?=\w) and \b: is equal to (?<=\w): .

If you need to get the same capturing groups as in your original pattern, replace (:\b|\b:) with (?:(:)\b|\b(:)) .
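
The pattern above can be exercised with a short sketch, assuming Python's `re` module (an assumption, as the question does not name a language). Each lookbehind has a fixed width, which Python's `re` requires. `TIME_AWARE` and `colon_positions` are hypothetical names.

```python
import re

# Colon next to a word char, unless it sits inside a d:dd / dd:dd style time.
TIME_AWARE = re.compile(
    r"(:\b|\b:)(?!(?:(?<=\b\d:)|(?<=\b\d{2}:))\d{1,2}\b)"
)

def colon_positions(text):
    """Return the index of every colon the pattern matches in text."""
    return [m.start() for m in TIME_AWARE.finditer(text)]

for sample in ["A Book: Chapter 1", "A Book:Chapter 1",
               "See you at 16:00", "Oh no :(", "done :-)"]:
    print(repr(sample), colon_positions(sample))
```

Only the two "A Book" strings yield a position; the time and the punctuation emoticons produce empty lists.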

More flexible solution

Note that excluding matches can be done with a simpler pattern that matches and captures what you need and merely matches what you do not need. This is called the "best regex trick ever". So, you may use a regex like

8:|:[PD]|\d+(?::\d+)+|(:\b|\b:)

This will match 8: , :P , :D , or one or more digits followed by one or more sequences of : and one or more digits; otherwise it will match and capture into Group 1 a : char that is either preceded or followed by a word char. All you need to do is check whether Group 1 matched and implement the required extraction/replacement logic in your code.
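
The "check whether Group 1 matched" step could look like the following sketch. It assumes Python's `re` module, and `split_on_colons` is a hypothetical helper, not part of the answer: it splits only at colons captured in Group 1 and skips everything the other alternatives consumed.

```python
import re

# Alternatives before the last one consume things to ignore (emoticons, times);
# only the final alternative captures a colon into Group 1.
PATTERN = re.compile(r"8:|:[PD]|\d+(?::\d+)+|(:\b|\b:)")

def split_on_colons(text):
    """Split text at every colon captured in Group 1, trimming whitespace."""
    parts, last = [], 0
    for m in PATTERN.finditer(text):
        if m.group(1) is None:
            continue  # matched an excluded alternative, not a real separator
        parts.append(text[last:m.start(1)].strip())
        last = m.end(1)
    parts.append(text[last:].strip())
    return parts

print(split_on_colons("A Book: Chapter 1"))
print(split_on_colons("Agenda: meet at 9:30 :)"))
print(split_on_colons("Starts at 16:00 :D"))
```

New exclusions can be added as further alternatives before the capturing group without touching the extraction logic, which is what makes this approach more flexible than a single lookaround-based pattern.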
