Match exactly one occurrence with regex

Consider M, T, W, TH, F, S, SU as the days of the week.

I have a regex that works well except in one scenario: when there is no sequence of weekdays, i.e. none of M, T, W, TH, F, S, SU appears at the expected location in the string.

For example, q10MT is valid but q10HT is invalid.

Below is my expression:

string expression = "q(\\d*)(M)?(T(?!H))?(W)?(TH)?(F)?(S(?!U))?(SU)?";

For q10MT the output is q10MT, which is correct. For q10HT the output is q10, which is incorrect: the regex should return no value, or an empty string, when there is no match.

What changes do I need to make in order to achieve this?

You can achieve it with a positive look-ahead:

q(\\d*)(?=(?:M|T(?!H)|W|TH|F|S(?!U)|SU))(M)?(T(?!H))?(W)?(TH)?(F)?(S(?!U))?(SU)?

Or, as @Taemyr noted, a shorter equivalent (TH? already covers both T and TH, and SU? covers both S and SU):

q(\\d*)(?=(?:M|TH?|W|F|SU?))(M)?(T(?!H))?(W)?(TH)?(F)?(S(?!U))?(SU)?

Here is a demo

The (?=(?:M|TH?|W|F|SU?)) look-ahead makes sure that at least one of the day codes from the alternation list follows the digits. Without it, every day group is optional, so q10 alone would still match.

C# regex usage:

var rx = new Regex(@"q(\d*)(?=(?:M|TH?|W|TH|F|SU?))(M)?(T(?!H))?(W)?(TH)?(F)?(S(?!U))?(SU)?");
var result = rx.Match("q10MSUT").Value;

Result:

q10MSU (the trailing T is not part of the match because it comes after SU, out of order)
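
The question asks for an empty string when nothing matches. A minimal sketch of how that could be checked, assuming the look-ahead pattern above; MatchOrEmpty is a hypothetical helper name and is not part of the original answer:

```csharp
using System;
using System.Text.RegularExpressions;

// Look-ahead pattern from the answer above. The verbatim string (@"...")
// means the backslash in \d does not have to be doubled.
var rx = new Regex(@"q(\d*)(?=(?:M|TH?|W|F|SU?))(M)?(T(?!H))?(W)?(TH)?(F)?(S(?!U))?(SU)?");

// Hypothetical helper: returns the matched text, or "" when there is no match.
string MatchOrEmpty(string input)
{
    Match m = rx.Match(input);
    // A failed Match already has an empty Value; the check only makes the intent explicit.
    return m.Success ? m.Value : string.Empty;
}

Console.WriteLine(MatchOrEmpty("q10MT"));        // q10MT
Console.WriteLine(MatchOrEmpty("q10HT").Length); // 0
```

Checking Match.Success is usually clearer than comparing Value to an empty string, since it states directly whether the pattern matched.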

What about the following:

q(\d*)(M|TH?|W|F|SU?)+

See the demo for some examples of matches and non-matches. The key change in this regexp is the +, which requires at least one day to match.

Be aware that this solution does not require the days to be in order. It also allows days to be skipped, which the comments say does not matter.

Edit: OP says in the comments that each day may appear only once, which this solution doesn't account for.
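
The caveats above can be seen with a quick check. This is only a sketch using the pattern from this answer; the sample inputs are made up for illustration:

```csharp
using System;
using System.Text.RegularExpressions;

// The + after the group requires at least one day code after the digits.
var rx = new Regex(@"q(\d*)(M|TH?|W|F|SU?)+");

foreach (var input in new[] { "q10MT", "q10HT", "q10TM", "q10MM" })
{
    Match m = rx.Match(input);
    Console.WriteLine($"{input} -> {(m.Success ? m.Value : "(no match)")}");
}
// q10MT -> q10MT
// q10HT -> (no match)
// q10TM -> q10TM   (out-of-order days are accepted)
// q10MM -> q10MM   (repeated days are accepted)
```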

If order does not matter, you need to do something like this:

q(?<number>\d+)((?<monday>(?<!M\D*)M)|(?<tuesday>(?<!T(?!H)\D*)T(?!H))|(?<wednesday>(?<!W\D*)W)|(?<thursday>(?<!TH\D*)TH)|(?<friday>(?<!F\D*)F)|(?<saturday>(?<!S(?!U)\D*)S(?!U))|(?<sunday>(?<!SU\D*)SU))+

This matches if q is followed by a number and then by one or more weekdays. The order of the weekdays does not matter, and the negative lookbehinds ensure that no weekday can occur more than once.

Each weekday is captured in its own named capturing group so that it can be extracted later. "q10MTsomething" will capture "q10MT", with 10 in the "number" group, M in the "monday" group and T in the "tuesday" group; the other groups will be empty. "q10TFMother" will capture "q10TFM", with the groups as in the previous example plus F in the "friday" group. "q10TFMT" will capture "q10TFM", with the groups as in the previous example, because the second T is rejected. "q10HT" will not match.

demo

Note that this is the regexp string. If entered in code you might need to escape the backslashes (\ becomes \\) to produce the correct string.
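
As a rough illustration of reading the named groups in C#, here is a sketch that assumes the pattern above is split across verbatim strings only for readability (verbatim strings also avoid the backslash escaping):

```csharp
using System;
using System.Text.RegularExpressions;

// Same pattern as above, one alternative per line.
var rx = new Regex(
    @"q(?<number>\d+)(" +
    @"(?<monday>(?<!M\D*)M)|" +
    @"(?<tuesday>(?<!T(?!H)\D*)T(?!H))|" +
    @"(?<wednesday>(?<!W\D*)W)|" +
    @"(?<thursday>(?<!TH\D*)TH)|" +
    @"(?<friday>(?<!F\D*)F)|" +
    @"(?<saturday>(?<!S(?!U)\D*)S(?!U))|" +
    @"(?<sunday>(?<!SU\D*)SU)" +
    @")+");

Match m = rx.Match("q10TFMT");
Console.WriteLine(m.Value);                       // q10TFM
Console.WriteLine(m.Groups["number"].Value);      // 10
Console.WriteLine(m.Groups["friday"].Success);    // True
Console.WriteLine(m.Groups["wednesday"].Success); // False
Console.WriteLine(rx.IsMatch("q10HT"));           // False
```

Group.Success tells whether a given day took part in the match, which avoids comparing each group's Value against an empty string.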

The question is already answered. Even so, I want to point to another idea: using a variable-length lookbehind to maintain the sequence, which should be fine with .NET.

q(\d*)[MTWFSUH]+(?<=q\d*(M)?(T)?(W)?(TH)?(F)?(S)?(SU)?)
  • [MTWFSUH] is the list of valid characters. At least one is required
  • Using a lookbehind for matching as long as the sequence is maintained
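
A small sketch of how this lookbehind variant could behave in .NET, using the pattern above; the sample inputs are assumptions chosen for illustration:

```csharp
using System;
using System.Text.RegularExpressions;

// [MTWFSUH]+ consumes the day letters; the lookbehind then re-checks that what
// was consumed, read from q onward, follows the M, T, W, TH, F, S, SU order.
var rx = new Regex(@"q(\d*)[MTWFSUH]+(?<=q\d*(M)?(T)?(W)?(TH)?(F)?(S)?(SU)?)");

foreach (var input in new[] { "q10MT", "q10HT", "q10TM" })
{
    Match m = rx.Match(input);
    Console.WriteLine($"{input} -> {(m.Success ? m.Value : "(no match)")}");
}
// q10MT -> q10MT
// q10HT -> (no match)
// q10TM -> q10T    (the match stops where the order breaks)
```

Note that an out-of-order input still yields a partial match up to the last in-order day, so callers that need an all-or-nothing answer would have to compare the match length with the input themselves.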

Test at your test tool
