Find certain colons in string using Regex
I'm trying to search for colons in a given string so as to split the string at the colon for preprocessing, based on the following conditions:

1. Match if preceded or followed by a word, e.g. A Book: Chapter 1 or A Book:Chapter 1
2. Do not match if it is part of an emoticon, e.g. :( or ): or :/ or :-) etc.
3. Do not match if it is part of a given time, e.g. 16:00 etc.

I've come up with a regex as such:
(\:)(?=\w)|(?<=\w)(\:)
which satisfies conditions 1 & 2 but still fails on condition 3, as it matches the colon present in the string representation of time. How do I fix this?
Edit: it has to be in a single regex statement if possible.
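The condition 3 failure can be reproduced by running the question's pattern over a few sample strings. This is a minimal sketch assuming Python's built-in re module (the question does not name a language); the sample strings and the colon_positions helper are illustrative only.

```python
import re

# The pattern from the question: a colon followed by a word char (Group 1)
# or a colon preceded by a word char (Group 2)
ORIGINAL = re.compile(r'(\:)(?=\w)|(?<=\w)(\:)')

def colon_positions(pattern, text):
    """Return the start index of every match of pattern in text (hypothetical helper)."""
    return [m.start() for m in pattern.finditer(text)]

for sample in ["A Book: Chapter 1", "A Book:Chapter 1", "so sad :(", "meet at 16:00"]:
    print(repr(sample), colon_positions(ORIGINAL, sample))
```

The emoticon sample produces no match, but the time sample still yields a match at index 10, the colon in 16:00.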
You can use
(:\b|\b:)(?!(?:(?<=\b\d:)|(?<=\b\d{2}:))\d{1,2}\b)
See the regex demo. Details:
(:\b|\b:) - Group 1: a colon that is either preceded or followed by a word char
(?!(?:(?<=\b\d:)|(?<=\b\d{2}:))\d{1,2}\b) - there should be no one or two digits right after the colon (followed by a word boundary) if the colon is preceded by one or two digits (preceded by a word boundary).

Note that :\b is equivalent to :(?=\w), and \b: is equivalent to (?<=\w):. Since : is itself a non-word char, a word boundary next to it can only occur where the neighbouring char is a word char.
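A minimal sketch of how this pattern might be used from code, assuming Python's re module (which supports the fixed-width lookbehinds used here). The helper names are hypothetical. Because Group 1 is capturing, re.split would keep each colon in its output, so the split helper uses a copy of the pattern with the group made non-capturing.

```python
import re

# The answer's pattern: Group 1 is a colon next to a word char, rejected when
# it sits between 1-2 digits on each side, as in a time like 16:00
PATTERN = re.compile(r'(:\b|\b:)(?!(?:(?<=\b\d:)|(?<=\b\d{2}:))\d{1,2}\b)')

# Same pattern with Group 1 made non-capturing, so re.split drops the colons
SPLITTER = re.compile(r'(?::\b|\b:)(?!(?:(?<=\b\d:)|(?<=\b\d{2}:))\d{1,2}\b)')

def colon_positions(text):
    """Indices of the colons the pattern accepts."""
    return [m.start(1) for m in PATTERN.finditer(text)]

def split_on_colons(text):
    """Split text at every accepted colon."""
    return SPLITTER.split(text)

print(colon_positions("meet at 16:00"))
print(split_on_colons("A Book: Chapter 1"))
print(split_on_colons("A Book:Chapter 1 at 16:00"))
```

In the last call only the colon after "Book" is used as a split point; the colon in 16:00 is rejected by the negative lookahead.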
If you need to get the same capturing groups as in your original pattern, replace (:\b|\b:) with (?:(:)\b|\b(:)).
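With that replacement, the two groups carry the same meaning as in the question's pattern: Group 1 is a colon followed by a word char and Group 2 is a colon preceded by one. A minimal sketch, again assuming Python's re; which_group is a hypothetical helper.

```python
import re

# Full pattern with the two capturing groups of the original question:
# Group 1 = colon followed by a word char, Group 2 = colon preceded by one
TWO_GROUPS = re.compile(
    r'(?:(:)\b|\b(:))(?!(?:(?<=\b\d:)|(?<=\b\d{2}:))\d{1,2}\b)'
)

def which_group(text):
    """Return (index, group number) for each accepted colon in text."""
    result = []
    for m in TWO_GROUPS.finditer(text):
        group = 1 if m.group(1) is not None else 2
        result.append((m.start(group), group))
    return result

print(which_group("A Book:Chapter 1"))
print(which_group("A Book: Chapter 1"))
print(which_group("meet at 16:00"))
```

When a colon has word chars on both sides, as in A:B, the first alternative is tried first and wins, so only Group 1 is set; the question's pattern behaves the same way.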
More flexible solution
Note that excluding matches can be done with a simpler pattern that matches and captures what you need and just matches what you do not need. This is called the "best regex trick ever". So, you may use a regex like
8:|:[PD]|\d+(?::\d+)+|(:\b|\b:)
that will match 8:, :P, :D, or one or more digits followed by one or more sequences of a colon and one or more digits, or will match and capture into Group 1 a colon that is either preceded or followed by a word char. All you need to do is check whether Group 1 matched, and implement the required extraction/replacement logic in the code.
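The check-Group-1 approach can be sketched as follows, assuming Python's re; the helper names are hypothetical. Matches where Group 1 did not participate (8:, :P, :D, or a time such as 16:00) are consumed and ignored, so they can never become split points.

```python
import re

# Unwanted cases are matched but not captured: 8:, :P, :D, and digit runs
# joined by colons (times such as 16:00). Only the last alternative captures
# a colon next to a word char into Group 1.
TRICK = re.compile(r'8:|:[PD]|\d+(?::\d+)+|(:\b|\b:)')

def wanted_colon_positions(text):
    """Indices of colons captured by Group 1; other matches are discarded."""
    return [m.start(1) for m in TRICK.finditer(text) if m.group(1) is not None]

def split_at_wanted_colons(text):
    """Cut text at each wanted colon, dropping the colon itself."""
    parts, last = [], 0
    for pos in wanted_colon_positions(text):
        parts.append(text[last:pos])
        last = pos + 1
    parts.append(text[last:])
    return parts

print(split_at_wanted_colons("A Book:Chapter 1 :P at 16:00"))
```

Because finditer consumes 16:00 as a whole through the \d+(?::\d+)+ alternative, the engine never reaches that colon with the capturing alternative, so no lookbehind logic is needed.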