
Regex to match words between underscores after second occurrence of underscore

I would like to get the words between underscores after the second occurrence of an underscore.

This is my string:

ABC_BC_BE08_C1000004_0124

I've assembled this expression:

(?<=_)[^_]+

It matches what I need, but it only skips the first word, since there is no underscore before it. I would like it to skip ABC and BC and get just the last three strings. I've tried messing around, but I am stuck and can't make it work. Thanks!

You can use a non-regex approach here with Split and Skip:

using System;
using System.Linq;

var text = "ABC_BC_BE08_C1000004_0124";
// Split on '_' and drop the first two fields
var result = text.Split('_').Skip(2);
foreach (var s in result)
    Console.WriteLine(s);

Output:

BE08
C1000004
0124

See the C# demo.
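As a rough sketch of how the Split and Skip approach behaves on less tidy input, the snippet below wraps it in a helper. The helper name TailFields, the extra inputs, and the decision to drop empty fields are assumptions made for illustration; they are not part of the original answer.

```csharp
using System;
using System.Linq;

// Hypothetical helper: returns the fields after the second underscore.
// Empty fields are filtered out so the result resembles what [^_]+ would match.
string[] TailFields(string input) =>
    input.Split('_').Skip(2).Where(p => p.Length > 0).ToArray();

// Only the first input comes from the question; the others are made up.
string[] inputs = { "ABC_BC_BE08_C1000004_0124", "ABC_BC", "ABC__X_" };

foreach (var input in inputs)
    Console.WriteLine($"{input} -> [{string.Join(", ", TailFields(input))}]");

// ABC_BC_BE08_C1000004_0124 -> [BE08, C1000004, 0124]
// ABC_BC -> []
// ABC__X_ -> [X]
```

Filtering after Skip(2), rather than passing StringSplitOptions.RemoveEmptyEntries to Split, keeps the field positions intact, so an empty second field still counts as one of the two skipped fields.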

With regex, you can use

var result = Regex.Matches(text, @"(?<=^(?:[^_]*_){2,})[^_]+").Cast<Match>().Select(x => x.Value);

See the regex demo and the C# demo. The regex matches:

  • (?<=^(?:[^_]*_){2,}) - a positive lookbehind that matches a location immediately preceded by the following patterns:
    • ^ - start of string
    • (?:[^_]*_){2,} - two or more ( {2,} ) sequences of zero or more chars other than _ ( [^_]* ) followed by a _ char
  • [^_]+ - one or more chars other than _
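To make the difference concrete, here is a minimal sketch that runs the pattern from the question and the lookbehind pattern above on the same input. The variable names are illustrative only.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

var text = "ABC_BC_BE08_C1000004_0124";

// Pattern from the question: needs only one underscore directly to the left,
// so it skips ABC but still returns BC.
var original = Regex.Matches(text, @"(?<=_)[^_]+")
    .Cast<Match>().Select(m => m.Value).ToList();

// Lookbehind variant: needs at least two "field + underscore" sequences
// between the start of the string and the match position.
var fixedUp = Regex.Matches(text, @"(?<=^(?:[^_]*_){2,})[^_]+")
    .Cast<Match>().Select(m => m.Value).ToList();

Console.WriteLine(string.Join(", ", original)); // BC, BE08, C1000004, 0124
Console.WriteLine(string.Join(", ", fixedUp));  // BE08, C1000004, 0124
```

This relies on .NET supporting variable-length lookbehind; many other regex engines would reject the {2,} quantifier inside (?<=...).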

Using .NET, there is also a Captures collection that you can use with a regex and a repeated capture group.

^[^_]*_[^_]*(?:_([^_]+))+

The pattern matches:

  • ^ Start of string
  • [^_]*_[^_]* Match zero or more chars other than _ , then match _ , then again zero or more chars other than _
  • (?: Non-capture group
    • _([^_]+) Match _ and capture one or more chars other than _ in group 1
  • )+ Close the non-capture group and repeat it one or more times

.NET regex demo | C# demo

For example:

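using System;
using System.Linq;
using System.Text.RegularExpressions;

var pattern = @"^[^_]*_[^_]*(?:_([^_]+))+";
var str = "ABC_BC_BE08_C1000004_0124";
// Group 1 is repeated, so each repetition is stored in its Captures collection.
// Cast<Capture>() also keeps this compiling on older .NET Framework targets.
var strings = Regex.Match(str, pattern).Groups[1].Captures.Cast<Capture>().Select(c => c.Value);

foreach (string s in strings)
{
    Console.WriteLine(s);
}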

Output:

BE08
C1000004
0124


If you want to match only word characters between the underscores, another option is to use the negated character class [^\W_], which excludes the underscore from the word characters:

^[^\W_]*_[^\W_]*(?:_([^\W_]+))+
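The following sketch compares the two capture-group patterns. The helper name Group1Captures and the second input, which contains a hyphen, are hypothetical and only there to show where the stricter character class changes the result.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: returns every capture of group 1 for the first match.
string[] Group1Captures(string input, string pattern) =>
    Regex.Match(input, pattern).Groups[1].Captures
        .Cast<Capture>().Select(c => c.Value).ToArray();

var anyChars  = @"^[^_]*_[^_]*(?:_([^_]+))+";
var wordChars = @"^[^\W_]*_[^\W_]*(?:_([^\W_]+))+";

// Input from the question: both patterns capture the same three fields.
Console.WriteLine(string.Join(", ", Group1Captures("ABC_BC_BE08_C1000004_0124", anyChars)));
Console.WriteLine(string.Join(", ", Group1Captures("ABC_BC_BE08_C1000004_0124", wordChars)));
// BE08, C1000004, 0124 (both lines)

// Made-up input with a hyphen: the word-character pattern stops at the hyphen.
Console.WriteLine(string.Join(", ", Group1Captures("ABC_BC_BE-08_C1000004", anyChars)));  // BE-08, C1000004
Console.WriteLine(string.Join(", ", Group1Captures("ABC_BC_BE-08_C1000004", wordChars))); // BE
```

Neither pattern is anchored at the end of the string, so a partial match like the last one still succeeds. Appending $ to the pattern is one way to reject such input instead, if that is the behavior you want.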
