
Extract comma separated portion of string with a RegEx in C#

Sample data: !!Part|123456,ABCDEF,ABC132!!

The comma-delimited list can contain any number of entries, and each entry can be any combination of letters and digits.

I want a regex to match the entries in the comma separated list:

What I have is: !!PART\|(\w+)(?:,{1}(\w+))*!!

This seems to do the job. The thing is, I want to retrieve the entries in order into an ArrayList or similar, so for the sample data I would want:

  • 1 - 123456
  • 2 - ABCDEF
  • 3 - ABC132

The code I have is:

string partRegularExpression = @"!!PART\|(\w+)(?:,{1}(\w+))*!!";
Match match = Regex.Match(tag, partRegularExpression);
ArrayList results = new ArrayList();

foreach (Group group in match.Groups)
{
    results.Add(group.Value);
}

But that's giving me unexpected results. What am I missing?

Thanks

Edit: A solution would be to use a regex like !!PART\|(\w+(?:,??\w+)*)!! to capture the comma separated list and then split that, as suggested by Marc Gravell.

I am still curious about a working regex for this, however :o)

You can either use split:

string csv = tag.Substring(7, tag.Length - 9);
string[] values = csv.Split(new char[] { ',' });

Or a regex:

Regex csvRegex = new Regex(@"!!Part\|(?:(?<value>\w+),?)+!!");
List<string> valuesRegex = new List<string>();
foreach (Capture capture in csvRegex.Match(tag).Groups["value"].Captures)
{
    valuesRegex.Add(capture.Value);
}
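To see why the loop over match.Groups in the question gives unexpected results, here is a minimal sketch. In .NET a repeated capturing group keeps every value in its Captures collection, while Group.Value returns only the last one, and Groups[0] is always the whole match. The class and method names are hypothetical, and RegexOptions.IgnoreCase is an assumption added because the question's pattern says PART while the sample data says Part.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PartTagCaptures
{
    // Hypothetical helper (not from the original posts): returns the
    // comma separated entries in order, or an empty list if the tag
    // does not match.
    public static List<string> ExtractValues(string tag)
    {
        // The question's pattern, anchored, with ",{1}" written as ",".
        // IgnoreCase is an assumption so that "PART" matches "Part".
        Match match = Regex.Match(
            tag,
            @"^!!PART\|(\w+)(?:,(\w+))*!!$",
            RegexOptions.IgnoreCase);

        List<string> values = new List<string>();
        if (!match.Success)
        {
            return values;
        }

        // Group 1 matches exactly once: the first entry.
        values.Add(match.Groups[1].Value);

        // Group 2 sits inside a repeated group, so match.Groups[2].Value
        // holds only the LAST entry; Captures holds all of them in order.
        foreach (Capture capture in match.Groups[2].Captures)
        {
            values.Add(capture.Value);
        }

        return values;
    }
}
```

For the sample tag, ExtractValues returns 123456, ABCDEF, ABC132. Iterating match.Groups on the same match would instead give the whole tag (group 0), then 123456, then ABC132.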

Unless I'm mistaken, that still only counts as one group. I'm guessing you'll need to do a string.Split(',') to do what you want? Indeed, it looks a lot simpler not to bother with regex at all here. Depending on the data, how about:

        if (tag.StartsWith("!!Part|") && tag.EndsWith("!!"))
        {
            tag = tag.Substring(7, tag.Length - 9);
            string[] data = tag.Split(',');
        }
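The same idea can be wrapped in a small method so the magic numbers 7 and 9 are derived from the delimiters instead of hard-coded. This is only a sketch: the class name, method name, and the choice to return an empty array for a non-matching tag are all assumptions, not part of the original answer.

```csharp
using System;

public static class PartTagSplitter
{
    private const string Prefix = "!!Part|";
    private const string Suffix = "!!";

    // Hypothetical helper: returns the entries between the prefix and
    // suffix, or an empty array when the tag is not in the expected form.
    public static string[] GetValues(string tag)
    {
        bool wellFormed =
            tag != null &&
            tag.Length >= Prefix.Length + Suffix.Length &&
            tag.StartsWith(Prefix, StringComparison.Ordinal) &&
            tag.EndsWith(Suffix, StringComparison.Ordinal);

        if (!wellFormed)
        {
            return new string[0];
        }

        // Equivalent to tag.Substring(7, tag.Length - 9) for this prefix/suffix.
        string csv = tag.Substring(
            Prefix.Length,
            tag.Length - Prefix.Length - Suffix.Length);

        return csv.Split(',');
    }
}
```

Note that, unlike the \w+ patterns in the regex answers, this does not check what the entries contain, so an empty entry (two commas in a row) comes back as an empty string.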

I think the RegEx you are looking for is this:

(?:^!!PART\|){0,1}(?<value>.*?)(?:,|!!$)

This can then be run like this:

        string tag = "!!Part|123456,ABCDEF,ABC132!!";

        string partRegularExpression = @"(?:^!!PART\|){0,1}(?<value>.*?)(?:,|!!$)";
        ArrayList results = new ArrayList();

        Regex extractNumber = new Regex(partRegularExpression, RegexOptions.IgnoreCase);
        MatchCollection matches = extractNumber.Matches(tag);
        foreach (Match match in matches)
        {
            results.Add(match.Groups["value"].Value);
        }            

        foreach (string s in results)
        {
            Console.WriteLine(s);
        }

The following code

string testString = "!!Part|123456,ABCDEF,ABC132!!";
foreach(string component in testString.Split("|!,".ToCharArray(),StringSplitOptions.RemoveEmptyEntries) )
{
    Console.WriteLine(component);
}

will give the following output:

Part
123456
ABCDEF
ABC132

This has the advantage that the indexes of the comma separated parts line up with the index numbers you (possibly accidentally) specified in the original question (1, 2, 3), because "Part" occupies index 0.

HTH

-EDIT- Forgot to mention: this may have drawbacks if a string is not in the format expected above, but then again a regex would break just as easily unless it were stupendously complex.
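As a rough illustration of that drawback, the sketch below adds one cheap guard to the Split approach: it checks that the first component is "Part" before returning the rest. The names are hypothetical and the guard is deliberately minimal; it still accepts inputs such as "Part,1,2" that lack the !! and | delimiters.

```csharp
using System;

public static class PartTagSplit
{
    private static readonly char[] Delimiters = { '|', '!', ',' };

    // Hypothetical helper: returns the components after "Part",
    // or null when the first component is something else.
    public static string[] SplitValues(string tag)
    {
        string[] components = tag.Split(
            Delimiters, StringSplitOptions.RemoveEmptyEntries);

        if (components.Length == 0 || components[0] != "Part")
        {
            return null;
        }

        // Drop the leading "Part" so the result holds only the entries.
        string[] values = new string[components.Length - 1];
        Array.Copy(components, 1, values, 0, values.Length);
        return values;
    }
}
```

With the sample string this returns 123456, ABCDEF, ABC132, and it returns null for a string such as "!!Other|1,2!!".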

