
Find certain colons in string using Regex

I'm trying to find colons in a given string so that I can split the string at each colon for preprocessing, subject to the following conditions:

  1. Preceded or followed by a word, e.g. A Book: Chapter 1 or A Book:Chapter 1
  2. Do not match if it is part of an emoticon, i.e. :( or ): or :/ or :-) etc.
  3. Do not match if it is part of a time, e.g. 16:00

I've come up with the following regex:

(\:)(?=\w)|(?<=\w)(\:)

which satisfies conditions 1 & 2 but still fails on condition 3, as it matches the colon in the string representation of a time. How do I fix this?

edit: it has to be in a single regex statement if possible
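
The failure described above can be reproduced with a short check. This is only a sketch: the question does not name a language, so Python's re module is an assumption here, and matches_original is a hypothetical helper name.

```python
import re

# The regex from the question, unchanged.
ORIGINAL = re.compile(r"(\:)(?=\w)|(?<=\w)(\:)")

def matches_original(text):
    """Return True if the question's regex finds a colon in text."""
    return ORIGINAL.search(text) is not None

for sample in ["A Book: Chapter 1", ":(", ":-)", "16:00"]:
    print(repr(sample), matches_original(sample))
# 'A Book: Chapter 1' True   (wanted)
# ':(' False                 (wanted)
# ':-)' False                (wanted)
# '16:00' True               (unwanted: \w also matches digits)
```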

Word characters (\w) include digits: \w is equivalent to [a-zA-Z0-9_]. So just use [a-zA-Z] instead:

(\:)(?=[a-zA-Z])|(?<=[a-zA-Z])(\:)

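
As a rough check of the letter-only pattern, here is a minimal Python sketch (Python is an assumption, and colon_positions is a hypothetical helper). It also shows two side effects worth knowing about: a colon that only touches a digit is no longer matched, and an emoticon made of a colon plus a letter still is.

```python
import re

# Pattern from the answer above: a colon followed by, or preceded by, an ASCII letter.
LETTER_COLON = re.compile(r"(\:)(?=[a-zA-Z])|(?<=[a-zA-Z])(\:)")

def colon_positions(text):
    """Return the index of every colon the pattern accepts."""
    return [m.start() for m in LETTER_COLON.finditer(text)]

print(colon_positions("A Book: Chapter 1"))  # [6]
print(colon_positions("16:00"))              # []  (time is skipped)
print(colon_positions("Chapter 1: Intro"))   # []  (colon touches only a digit)
print(colon_positions("ok :P"))              # [3] (letter emoticon still matches)
```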

You can use

(:\b|\b:)(?!(?:(?<=\b\d:)|(?<=\b\d{2}:))\d{1,2}\b)

Details:

  • (:\b|\b:) - Group 1: a : that is either preceded or followed with a word char
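  • (?!(?:(?<=\b\d:)|(?<=\b\d{2}:))\d{1,2}\b) - a negative lookahead that fails the match if the : is immediately preceded by exactly one or two digits (with a word boundary before them) and is also immediately followed by one or two digits (with a word boundary after them), i.e. if the : sits inside a time such as 16:00.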

Note that :\b is equivalent to :(?=\w), and \b: is equivalent to (?<=\w): since the colon itself is not a word character.

If you need to get the same capturing groups as in your original pattern, replace (:\b|\b:) with (?:(:)\b|\b(:)).
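
To see the single-regex solution used for the actual split, here is a minimal sketch, assuming Python's re module (the question does not specify a language). split_at_colons is a hypothetical helper, and stripping whitespace from the pieces is an illustrative choice, not part of the answer.

```python
import re

# Single-regex solution from above: a colon next to a word character,
# unless it sits between 1-2 digit groups (a time such as 16:00).
COLON = re.compile(r"(:\b|\b:)(?!(?:(?<=\b\d:)|(?<=\b\d{2}:))\d{1,2}\b)")

def split_at_colons(text):
    """Split text at every colon accepted by COLON; pieces are stripped."""
    parts, last = [], 0
    for m in COLON.finditer(text):
        parts.append(text[last:m.start()].strip())
        last = m.end()
    parts.append(text[last:].strip())
    return parts

print(split_at_colons("A Book: Chapter 1"))  # ['A Book', 'Chapter 1']
print(split_at_colons("Chapter 1: Intro"))   # ['Chapter 1', 'Intro']
print(split_at_colons("Meet at 16:00 :("))   # ['Meet at 16:00 :(']
```

Slicing between match positions is used instead of re.split because re.split would also return the text of the capturing group in its result list.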

More flexible solution

Note that exclusions can be handled with a simpler pattern: match (without capturing) everything you want to skip, and match and capture only what you actually need. This is sometimes called the "best regex trick ever". So, you may use a regex like

8:|:[PD]|\d+(?::\d+)+|(:\b|\b:)

that will match 8:, :P, :D, or one or more digits followed by one or more sequences of : and one or more digits (a time such as 16:00), or will match and capture into Group 1 a : char that is either preceded or followed by a word char. All you need to do is check whether Group 1 matched, and implement the required extraction/replacement logic in your code.
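
The "check if Group 1 matched" step might look like the following in Python. This is a sketch under assumptions: Python is not named in the question, split_on_captured_colons is a hypothetical helper, and stripping the pieces is an illustrative choice.

```python
import re

# Alternatives before the last one consume text we want to skip
# (emoticons, times); only the final alternative captures into Group 1.
TRICK = re.compile(r"8:|:[PD]|\d+(?::\d+)+|(:\b|\b:)")

def split_on_captured_colons(text):
    """Split text only at colons that were captured into Group 1."""
    parts, last = [], 0
    for m in TRICK.finditer(text):
        if m.group(1) is None:
            continue  # an emoticon or a time was matched: skip it
        parts.append(text[last:m.start(1)].strip())
        last = m.end(1)
    parts.append(text[last:].strip())
    return parts

print(split_on_captured_colons("A Book: Chapter 1 at 16:00 :P"))
# ['A Book', 'Chapter 1 at 16:00 :P']
```

One caveat of the skip list as written: a colon directly followed by P or D, as in Book:Part, is consumed by the :[PD] alternative and therefore not split on, so the list of skipped patterns may need tuning for real data.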
