C# Regex find string between two different pairs of strings

Using C# Regex, I am trying to find text enclosed by two distinct pairs of words, say start1...end1 and start2...end2. In my example below I would like to get: text1, text2, text11, text22.

string str = "This start1 text1 end1. And start2 text2 end2 is a test. This start1 text11 end1. And start2 text22 end2 is a test.";

Regex oRegEx = new Regex(@"start1(.*?)end1|start2(.*?)end2", RegexOptions.IgnoreCase);
MatchCollection oMatches = oRegEx.Matches(str); // search the sample string declared above
if (oMatches.Count > 0)
{
    foreach (Match mt in oMatches)
    {
        Console.WriteLine(mt.Value);     //the display includes the start1 and end1 (or start2 and end2)
        Console.WriteLine(mt.Groups[1].Value); //the display excludes the start1 and end1 (or start2 and end2) or displays an empty string depending on the order of pattern.
    }
}

In the code above, mt.Groups[1].Value correctly displays text1 and text11 when the pattern is @"start1(.*?)end1|start2(.*?)end2", but it displays empty strings for text2 and text22. If I instead change the order in the pattern to @"start2(.*?)end2|start1(.*?)end1", it correctly displays text2 and text22 but displays empty strings for text1 and text11. What needs to change in my code? This MSDN article explains when a group returns an empty string, but I am still not getting the desired results.

Give both groups the same name.

start1(?<val>.*?)end1|start2(?<val>.*?)end2

And get the value with:

mt.Groups["val"].Value
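Put together, the named-group approach might look like the sketch below. It reuses the sample string from the question; the class name NamedGroupDemo and the call to Trim() are additions for illustration and are not part of the original answer. Trim() is there only because the captured text includes the spaces next to the delimiters.

```csharp
using System;
using System.Text.RegularExpressions;

class NamedGroupDemo // hypothetical class name, for illustration only
{
    static void Main()
    {
        string str = "This start1 text1 end1. And start2 text2 end2 is a test. "
                   + "This start1 text11 end1. And start2 text22 end2 is a test.";

        // .NET allows the same group name in both alternatives,
        // so whichever branch matches fills the group "val".
        var regex = new Regex(@"start1(?<val>.*?)end1|start2(?<val>.*?)end2",
                              RegexOptions.IgnoreCase);

        foreach (Match mt in regex.Matches(str))
        {
            // Trim() drops the spaces between the delimiters and the text.
            Console.WriteLine(mt.Groups["val"].Value.Trim());
        }
        // Prints text1, text2, text11, text22 (one per line).
    }
}
```

Reusing a group name across alternatives is a .NET feature; other regex engines may reject the pattern or need a different construct.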

The original problem is that, without names, the group between start1 and end1 has index 1 and the group between start2 and end2 has index 2, so mt.Groups[1] is empty whenever the second alternative is the one that matched. The following picture shows this:

[Image: regular expression visualization]
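If you would rather keep the unnamed groups, one possible workaround (not part of the original answer) is to check which numbered group actually took part in the match. The sketch below assumes the question's sample string; the class name GroupIndexDemo is made up for the example.

```csharp
using System;
using System.Text.RegularExpressions;

class GroupIndexDemo // hypothetical class name, for illustration only
{
    static void Main()
    {
        string str = "This start1 text1 end1. And start2 text2 end2 is a test. "
                   + "This start1 text11 end1. And start2 text22 end2 is a test.";

        var regex = new Regex(@"start1(.*?)end1|start2(.*?)end2",
                              RegexOptions.IgnoreCase);

        foreach (Match mt in regex.Matches(str))
        {
            // Group 1 belongs to the first alternative, group 2 to the second.
            // Only the group of the alternative that matched has Success == true.
            int index = mt.Groups[1].Success ? 1 : 2;
            Console.WriteLine("group " + index + ": " + mt.Groups[index].Value.Trim());
        }
        // Prints:
        // group 1: text1
        // group 2: text2
        // group 1: text11
        // group 2: text22
    }
}
```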

Another solution is to use a regex like:

(?<=start([12])).*?(?=end\1)

[Image: regular expression visualization]

Debuggex Demo

And then in your code:

Console.WriteLine(mt.Value);

will display the required content.
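As a rough sketch of the lookaround variant, again using the question's sample string (the class name LookaroundDemo and the Trim() call are assumptions added for the example): the lookbehind captures the digit after "start", and the backreference \1 in the lookahead requires the same digit after "end", so the delimiters themselves are left out of mt.Value.

```csharp
using System;
using System.Text.RegularExpressions;

class LookaroundDemo // hypothetical class name, for illustration only
{
    static void Main()
    {
        string str = "This start1 text1 end1. And start2 text2 end2 is a test. "
                   + "This start1 text11 end1. And start2 text22 end2 is a test.";

        // (?<=start([12])) : preceded by start1 or start2; the digit is captured as group 1
        // .*?              : the shortest possible text
        // (?=end\1)        : followed by "end" plus the same digit
        var regex = new Regex(@"(?<=start([12])).*?(?=end\1)",
                              RegexOptions.IgnoreCase);

        foreach (Match mt in regex.Matches(str))
        {
            // The match still contains the surrounding spaces, so trim them.
            Console.WriteLine(mt.Value.Trim());
        }
    }
}
```

With the sample string this should give the same four values as the named-group version. It depends on .NET's support for captures inside a lookbehind, so the pattern may not port unchanged to other regex engines.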
