
Regex to match a single character or an escaped character

I am trying to write my own Format method for time. This is a class project, but the formatting part is an extra I added for myself to get more practice with C# Regex. What I am trying to do is match certain characters:

W w : w = weeks. W weeks preceded by a leading zero if smaller than 10
D d : d = days. D days preceded by a leading zero if smaller than 10
G g : g = military hours. G hours preceded by a leading zero if smaller than 10
H h : h = civilian hours. H hours preceded by a leading zero...
m : m = minutes
s : s = seconds

The regex I have so far is this:

(w|W)(?=\b)|(d|D)(?=\b)|(h|H|g|G)(?=\b)|(m)(?=\b)|(s)(?=\b)

(w|W) //match upper or lower W
(?=\b) //positive lookahead: only match if followed by a word boundary

With the s, it matches every s in the string, so I'm led to believe my regex is wrong. My problem is that I'm not sure how to write lookaheads and lookbehinds correctly. I basically only want the characters I've supplied, and only if they stand by themselves OR are escaped. See the examples below.

Format("w Weeks, D days, h:m:s");
//returns 7 Weeks, 04 days, 10:01:05
Format("[w] weeks [d] days H:m:s");
//returns [7] weeks [4] days 10:01:05
Format(@"w \Weeks D \days, h:m:s"); // verbatim string, so the backslashes stay literal in C#
//returns 7 07eeks 04 4ays, 10:01:05

As you can see, in the last format the escaped w's and d's are still replaced, which is what I want. Again, I'm not sure how to write the lookaheads and lookbehinds correctly.

I am using regex101 (https://regex101.com/r/sL9cI2/1) to test. You can see the pattern there and what is going on. Any suggestions, please?

One thing about word boundaries is that they match an empty string. \b matches a position, not a character: a position that has a word character on one side and no word character on the other. E.g., in "This is an example", there are 8 positions matching \b:

|This| |is| |an| |example|

(each | denotes a word boundary)

To match words, the regex should check that it has a word boundary on each side: \bword\b (notice there's no need for lookaheads here).
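The boundary positions can be checked with a short script. This is only an illustrative sketch in Python (its re module treats \b the same way for this input); the sample strings come from the question, and the variable names are made up for the example.

```python
import re

text = "This is an example"

# \b matches empty positions, so collect where each zero-width match starts.
positions = [m.start() for m in re.finditer(r"\b", text)]
print(positions)       # [0, 4, 5, 7, 8, 10, 11, 18]
print(len(positions))  # 8

fmt = "w Weeks, D days, h:m:s"

# A lookahead only checks the right-hand side, so the trailing "s" of
# "Weeks" and "days" is matched as well.
lookahead_hits = re.findall(r"s(?=\b)", fmt)
print(lookahead_hits)  # ['s', 's', 's']

# A boundary on both sides keeps only the standalone "s".
standalone_hits = re.findall(r"\bs\b", fmt)
print(standalone_hits)  # ['s']
```

The second half shows why the s branch of the original pattern matched more than intended: nothing constrained the left-hand side of the letter.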

"I basically only want the cases of characters I've supplied and only if they are by themselves OR escaped"

Then you have 2 options to match:

  1. \bw\b The letter "w" as a whole word.
  2. \\w A backslash (you need to escape backslashes in regex) followed by the letter w.

Regex:

(\bw\b|\\w)

Moreover, looking at your attempts, I think you can use a character class to simplify the pattern.


Regex:

(\b[WwDdGgHhms]\b|\\[WwDdGgHhms])

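As a rough sketch of how the character-class pattern could drive the replacement, here is a small Python version. The function name format_duration, its parameters, and the 12-hour conversion for h/H are assumptions made for illustration; the zero-padding of m and s is inferred from the expected outputs in the question. It is not the asker's actual Format method.

```python
import re

# Same pattern as above: a standalone format letter, or an escaped one.
CODE = re.compile(r"(\b[WwDdGgHhms]\b|\\[WwDdGgHhms])")

def format_duration(fmt, weeks, days, hours, minutes, seconds):
    """Hypothetical helper: replace format codes in fmt with values."""
    civilian = hours % 12 or 12  # assumed 12-hour conversion for h/H
    values = {
        "w": str(weeks),    "W": f"{weeks:02d}",
        "d": str(days),     "D": f"{days:02d}",
        "g": str(hours),    "G": f"{hours:02d}",
        "h": str(civilian), "H": f"{civilian:02d}",
        # Padding inferred from the question's "10:01:05" output.
        "m": f"{minutes:02d}",
        "s": f"{seconds:02d}",
    }

    def replace(match):
        # The last character is the code letter, with or without a backslash.
        return values[match.group(1)[-1]]

    return CODE.sub(replace, fmt)

print(format_duration("w Weeks, D days, h:m:s", 7, 4, 10, 1, 5))
# 7 Weeks, 04 days, 10:01:05
print(format_duration("[w] weeks [d] days H:m:s", 7, 4, 10, 1, 5))
# [7] weeks [4] days 10:01:05
print(format_duration(r"w \Weeks D \days, h:m:s", 7, 4, 10, 1, 5))
# 7 07eeks 04 4ays, 10:01:05
```

Using a replacement callback keeps the lookup table in one place; in C# the same idea would use a MatchEvaluator with Regex.Replace.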

  • Do note that this regex does not handle consecutive backslashes, which means we can't reliably put a literal backslash in front of a format code.

    Using \\week as an example, it is interpreted as \ followed by the week format code (\w) and then the literal string eek, instead of a literal \ followed by the literal string week.

Use the following regex if you want to support such a use case:

\G(?:[^\\]|\\.)*?(\b[WwDdGgHhms]\b|\\[WwDdGgHhms])

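To see the difference on the \\week example, here is a hedged Python sketch. Python's built-in re module has no \G anchor, so the sketch swaps in a different technique: it calls match() repeatedly, starting each attempt where the previous match ended, which approximates what \G does in .NET. The names SIMPLE, CHAINED, and find_codes are hypothetical.

```python
import re

SIMPLE = re.compile(r"(\b[WwDdGgHhms]\b|\\[WwDdGgHhms])")

# The \G pattern without the \G itself; anchoring is emulated below.
CHAINED = re.compile(r"(?:[^\\]|\\.)*?(\b[WwDdGgHhms]\b|\\[WwDdGgHhms])")

def find_codes(fmt):
    """Return the format codes found by chaining anchored matches."""
    codes = []
    pos = 0
    while True:
        # match() only succeeds at pos, like \G at the previous match end.
        m = CHAINED.match(fmt, pos)
        if m is None:
            break
        codes.append(m.group(1))
        pos = m.end()
    return codes

escaped_backslash = r"\\week"  # two backslashes followed by "week"

print(SIMPLE.findall(escaped_backslash))  # ['\\w']  (one backslash + w)
print(find_codes(escaped_backslash))      # []

print(find_codes(r"w \Weeks D \days, h:m:s"))
# ['w', '\\W', 'D', '\\d', 'h', 'm', 's']
```

The prefix (?:[^\\]|\\.)*? consumes backslashes in pairs, so a backslash that is itself escaped can no longer start a format code, while a single backslash before a code letter still does.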
