How to use Regex in C# to extract multiple substrings from a string

I searched the web and found only a partial solution, so I am asking this question.

Input:

[A] this is A, and , [B] this is B, and hello , [C] this is C - From Here

I want to get a list:

list[0] == "this is A, and"
list[1] == "this is B, and hello"
list[2] == "this is C"
list[3] == "From Here"

I found that I should have something like this:

Regex pattern = new Regex(@"^\[A\] (.*) , \[B\] (.*) , \[C\] (.*) - (.*)$");
List<string> matches = pattern.Matches(input).OfType<Match>().Select(m => m.Value).Distinct().ToList();

But it is not working. How can I make it work? Thanks.

The regex is correct; the only thing you need to do is iterate over the match groups. In your case the first group is the whole sentence, so you can simply skip the first item.
P.S. Of course, don't forget to check that there is at least one match. Also, if this function will be executed many times, I recommend extracting the regex into a static member of your class (for performance and memory usage).

private static readonly Regex pattern = new Regex(@"^\[A\] (.*) , \[B\] (.*) , \[C\] (.*) - (.*)$");

The final version of the method (with the pattern as a static member) looks like this.

public static List<string> GetMatches(string input)
{
    var matchResult = pattern.Match(input);
    // Success is the idiomatic check; Length > 0 would miss a valid empty match.
    if (matchResult.Success)
    {
        return matchResult.Groups.Values
            .Skip(1)
            .Select(x => x.Value)
            .ToList();
    }
    
    return new List<string>();
}
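To try the approach above end to end, here is a minimal sketch of a complete console program run against the sample input from the question. The class name SubstringExtractor and the Main method are hypothetical scaffolding, not part of the original answer, and the sketch uses Cast<Group>() rather than Groups.Values on the assumption that it should also compile on older frameworks.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical wrapper class, added only for illustration.
public static class SubstringExtractor
{
    // Compiled once and reused across calls, as the answer recommends.
    private static readonly Regex Pattern =
        new Regex(@"^\[A\] (.*) , \[B\] (.*) , \[C\] (.*) - (.*)$");

    public static List<string> GetMatches(string input)
    {
        Match match = Pattern.Match(input);
        if (!match.Success)
        {
            // No match at all: return an empty list instead of throwing.
            return new List<string>();
        }

        // Groups[0] is the whole match, so skip it and keep the four captures.
        return match.Groups
            .Cast<Group>()
            .Skip(1)
            .Select(g => g.Value)
            .ToList();
    }

    public static void Main()
    {
        const string input =
            "[A] this is A, and , [B] this is B, and hello , [C] this is C - From Here";

        // Prints the four extracted parts, one per line.
        foreach (string part in GetMatches(input))
        {
            Console.WriteLine(part);
        }
    }
}
```

With the question's input, this should give the four items the asker listed: "this is A, and", "this is B, and hello", "this is C" and "From Here".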

The problem is a confusion between a match and a group. The regex matches only once, but it has several groups inside. Access the first match with [0], then use .OfType<Group>():

List<string> matches = pattern.Matches(input)[0].Groups.OfType<Group>().Select(m => m.Value).Distinct().ToList();

This will give you 5 results (the whole match plus the four groups):

[LinqPad screenshot]

You can get rid of the first one with .Skip(1) or matches.RemoveAt(0);
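Two caveats may be worth keeping in mind with this one-liner: indexing Matches(input)[0] throws ArgumentOutOfRangeException when the input does not match, and Distinct() drops repeated values, so two sections with identical text would collapse into one entry. The sketch below is one possible variant that guards against both; the names GroupListDemo and ExtractParts are hypothetical, and the sample input with repeated text is made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical demo class, added only for illustration.
public static class GroupListDemo
{
    private static readonly Regex Pattern =
        new Regex(@"^\[A\] (.*) , \[B\] (.*) , \[C\] (.*) - (.*)$");

    public static List<string> ExtractParts(string input)
    {
        MatchCollection matches = Pattern.Matches(input);
        if (matches.Count == 0)
        {
            // Guard: matches[0] would throw if nothing matched.
            return new List<string>();
        }

        // Skip(1) drops the whole-match group; there is no Distinct(),
        // so repeated captures are kept in their original positions.
        return matches[0].Groups
            .OfType<Group>()
            .Skip(1)
            .Select(g => g.Value)
            .ToList();
    }

    public static void Main()
    {
        // Made-up input where sections A and B contain the same text.
        List<string> parts =
            ExtractParts("[A] same , [B] same , [C] this is C - From Here");

        Console.WriteLine(parts.Count);          // 4 (Distinct() would leave 3)
        Console.WriteLine(string.Join(" | ", parts));
    }
}
```

Whether to keep Distinct() depends on whether duplicates are meaningful in your data; if list positions must line up with sections A, B, C and the trailing text, leaving it out is the safer choice.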
