
Regex to match words between underscores after the second occurrence of an underscore

I would like to get the words between underscores after the second occurrence of an underscore.

This is my string:

ABC_BC_BE08_C1000004_0124

I've assembled this expression:

(?<=_)[^_]+

It matches what I need, but it only skips the first word, since there is no underscore before it. I would like it to skip ABC and BC and return just the last three strings. I've tried adjusting it, but I'm stuck and can't make it work. Thanks!

You can use a non-regex approach here with Split and Skip:

using System;
using System.Linq;

var text = "ABC_BC_BE08_C1000004_0124";
// Split on underscores and drop the first two parts
var result = text.Split('_').Skip(2);
foreach (var s in result)
    Console.WriteLine(s);

Output:

BE08
C1000004
0124


With a regex, you can use:

var result = Regex.Matches(text, @"(?<=^(?:[^_]*_){2,})[^_]+").Cast<Match>().Select(x => x.Value);

The regex matches:

  • (?<=^(?:[^_]*_){2,}) - a positive lookbehind that requires the following patterns to match immediately to the left of the current location:
    • ^ - start of string
    • (?:[^_]*_){2,} - two or more ( {2,} ) repetitions of zero or more chars other than _ ( [^_]* ) followed by a _ char
  • [^_]+ - one or more chars other than _
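To see the lookbehind at work, here is a minimal sketch written as a C# top-level program (assuming C# 9 or later). The helper name TokensAfterSecondUnderscore and the second input, "ABC_BC", are made up for illustration. The second call shows that no match is returned when the string has no tokens after its second underscore.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: every run of non-underscore chars that is preceded,
// counting from the start of the string, by at least two underscores.
static string[] TokensAfterSecondUnderscore(string input) =>
    Regex.Matches(input, @"(?<=^(?:[^_]*_){2,})[^_]+")
         .Cast<Match>()
         .Select(m => m.Value)
         .ToArray();

// prints "BE08,C1000004,0124"
Console.WriteLine(string.Join(",", TokensAfterSecondUnderscore("ABC_BC_BE08_C1000004_0124")));

// prints "0": nothing follows a second underscore
Console.WriteLine(TokensAfterSecondUnderscore("ABC_BC").Length);
```

This relies on .NET supporting variable-length patterns inside a lookbehind; many other regex flavors do not, so the pattern may not port as-is.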

Using .NET, there is also a Captures collection that you can use with a regex and a repeated capture group.

^[^_]*_[^_]*(?:_([^_]+))+

The pattern matches:

  • ^ Start of string
  • [^_]*_[^_]* Match zero or more chars other than _ , then a _ , then again zero or more chars other than _
  • (?: Non capture group
    • _([^_]+) Match _ and capture one or more chars other than _ in group 1
  • )+ Close the non capture group and repeat 1 or more times


For example:

var pattern = @"^[^_]*_[^_]*(?:_([^_]+))+";
var str = "ABC_BC_BE08_C1000004_0124";
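// Cast<Capture>() keeps this compiling on .NET Framework, where CaptureCollection is not generic
var strings = Regex.Match(str, pattern).Groups[1].Captures.Cast<Capture>().Select(c => c.Value);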

foreach (String s in strings)
{
    Console.WriteLine(s);
}

Output:

BE08
C1000004
0124
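The Captures collection matters here because a repeated group's Value only holds its last capture. A short sketch of the difference, again as a C# top-level program (assuming C# 9 or later); the variable names are illustrative only:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

var match = Regex.Match("ABC_BC_BE08_C1000004_0124", @"^[^_]*_[^_]*(?:_([^_]+))+");
var group = match.Groups[1];

// Value exposes only the last thing the repeated group captured
string lastOnly = group.Value;

// Captures keeps every iteration of the repeated group, in order
string[] all = group.Captures.Cast<Capture>().Select(c => c.Value).ToArray();

Console.WriteLine(lastOnly);              // prints "0124"
Console.WriteLine(string.Join(",", all)); // prints "BE08,C1000004,0124"
```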


If you want to match only word characters between the underscores, another option is the negated character class [^\W_] , which matches any word character except the underscore:

^[^\W_]*_[^\W_]*(?:_([^\W_]+))+
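As a rough comparison of the two patterns, the sketch below runs both against a made-up input that contains a hyphen. The helper Group1Captures and the sample string are assumptions for illustration, not part of the original answer. Since neither pattern is anchored at the end, the stricter one stops at the first non-word character instead of failing.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: all captures of group 1 for the first match of the pattern.
static string[] Group1Captures(string input, string pattern) =>
    Regex.Match(input, pattern).Groups[1].Captures
         .Cast<Capture>()
         .Select(c => c.Value)
         .ToArray();

const string anyButUnderscore = @"^[^_]*_[^_]*(?:_([^_]+))+";
const string wordCharsOnly = @"^[^\W_]*_[^\W_]*(?:_([^\W_]+))+";
const string sample = "ABC_BC_BE-08_X"; // made-up input with a hyphen

// prints "BE-08,X": the hyphen is just another non-underscore char
Console.WriteLine(string.Join(",", Group1Captures(sample, anyButUnderscore)));

// prints "BE": the match ends at the hyphen, so later parts are not captured
Console.WriteLine(string.Join(",", Group1Captures(sample, wordCharsOnly)));
```

If partial results like this are not acceptable, one option is to append $ to the pattern so that such inputs fail to match altogether.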
