
Why does Regex in a while loop match only the first occurrence's length (it is not dynamic in a while loop)?

I have a regex that I expected to capture each group of zeros dynamically. Instead, I get a list full of identical entries, e.g. [00, 00, 00, 00, 00], from a string like "001111110000001100110011111".

I've tried putting my var regex = new Regex() inside the while loop in the hope that this might solve my problem. Whatever I try, the regex returns only the length of the first occurrence of zeros instead of filling my collection with runs of zeros of varying lengths.

List<string> ZerosMatch(string input)
{
    var newInput = input;
    var list = new List<string>();
    var regex = new Regex(@"[0]{1,}");
    var matches = regex.Match(newInput);

    while (matches.Success)
    {
        list.Add(matches.Value);

        try 
        {
            newInput = newInput.Remove(0, matches.Index);
        }
        catch
        {
            break;
        }                                      
    }
    return list;
}

vs

List<string> ZerosMatch(string input)
{
    var newInput = input;
    var list = new List<string>();
    bool hasMatch = true;

    while (hasMatch)
    {
        try 
        {
            var regex = new Regex(@"[0]{1,}");
            var matches = regex.Match(newInput);
            newInput = newInput.Remove(0, matches.Index);
            list.Add(matches.Value);
            hasMatch = matches.Success;
        }
        catch
        {
            break;
        }                                      
    }
    return list;
}

My question is: why is this happening?

List<string> ZerosMatch(string input)
{
    // The newInput copy from the question is not needed; input can be used directly.
    var list = new List<string>();
    var regex = new Regex(@"[0]{1,}");
    // Matches returns every non-overlapping match in one call.
    var matches = regex.Matches(input);

    for (int i = 0; i < matches.Count; i++)
    {
        list.Add(matches[i].Value);
    }
    return list;
}

I suggest using Matches instead of Match and querying the result with the help of LINQ (why loop and search again when we can get all the matches in one go?):

using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

...

static List<string> ZeroesMatch(string input) => Regex
  .Matches(input ?? "", "0+")
  .Cast<Match>()
  .Select(match => match.Value)
  .ToList();

Here I've simplified the pattern to 0+ (one or more 0 characters) and added ?? "" to avoid an exception on a null string.
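As a quick check of the Matches approach against the sample string from the question, here is a minimal, self-contained sketch. The class name ZeroRunsDemo and the method name ZeroRuns are only illustrative and are not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class ZeroRunsDemo
{
    // Collects every run of consecutive '0' characters with a single Matches call.
    // A null input is treated as an empty string, so the result is an empty list.
    public static List<string> ZeroRuns(string input) => Regex
        .Matches(input ?? "", "0+")
        .Cast<Match>()
        .Select(m => m.Value)
        .ToList();

    public static void Main()
    {
        var runs = ZeroRuns("001111110000001100110011111");
        Console.WriteLine(string.Join(", ", runs)); // prints: 00, 000000, 00, 00
    }
}
```

Each element keeps the length of its own run, which is the "dynamic" behaviour the question was after.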

In your first approach, you execute regex.Match only once, so you keep looking at the very same match on every iteration. What happens next depends on where that first match is. If it is at index 0, Remove(0, 0) removes nothing, the loop never terminates, and you keep adding to your result list indefinitely until you get an OutOfMemoryException. If it is at a later index, the string shrinks on every iteration until Remove throws an ArgumentOutOfRangeException, because fewer characters remain than you are trying to remove.

Your second approach will suffer from the same OutOfMemoryException if your input starts with a 0 or you arrive at an intermediate string that starts with 0. The latter happens after the first removal whenever the input contains a zero, because Remove(0, matches.Index) cuts the string exactly up to the start of the match.
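The two failure modes described above can be observed without running an endless loop. The following is a small sketch under the assumption that a short input such as "1100111" is representative; the class and method names are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MatchSnapshotDemo
{
    // Returns three indices:
    //   Before  - index of the first match in the original input
    //   After   - index reported by the SAME Match object after the string variable was reassigned
    //   Rematch - index of a fresh match on the trimmed string
    public static (int Before, int After, int Rematch) Indices(string input)
    {
        var match = Regex.Match(input, "0+");
        int before = match.Index;

        // Remove builds a new string; the existing Match object is not updated.
        string trimmed = input.Remove(0, match.Index);
        int after = match.Index;

        // Matching again on the trimmed string finds the zeros at index 0,
        // so a following Remove(0, 0) would remove nothing.
        int rematch = Regex.Match(trimmed, "0+").Index;

        return (before, after, rematch);
    }

    public static void Main()
    {
        var (before, after, rematch) = Indices("1100111");
        Console.WriteLine($"{before} {after} {rematch}"); // prints: 2 2 0
    }
}
```

The unchanged second value corresponds to the first approach (the same match is reused), and the 0 in the third position corresponds to the second approach (nothing is removed, so the same zeros are matched again).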

See below for a working approach:

List<string> ZerosMatch(string input)
{
    var newInput = input;
    var list = new List<string>();
    var regex = new Regex(@"[0]{1,}");
    var match = regex.Match(newInput);
    while (match.Success)
    {
        newInput = newInput.Remove(match.Index, match.Value.Length);
        list.Add(match.Value);
        match = regex.Match(newInput);
    }
    return list;
}

Still, using Regex.Matches is the recommended approach if you want to extract multiple instances of a match from a string.
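If a while loop is preferred over Matches, one possible alternative (not taken from the answers above) is Match.NextMatch(), which continues the search after the end of the previous match so that the input string never has to be modified. A minimal sketch, with illustrative class and method names:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NextMatchDemo
{
    // Walks through the matches one by one; NextMatch resumes after the previous match.
    public static List<string> ZeroRuns(string input)
    {
        var list = new List<string>();
        var match = Regex.Match(input ?? "", "0+");

        while (match.Success)
        {
            list.Add(match.Value);
            match = match.NextMatch(); // advance; without this the loop would never end
        }
        return list;
    }

    public static void Main()
    {
        var runs = ZeroRuns("001111110000001100110011111");
        Console.WriteLine(string.Join(", ", runs)); // prints: 00, 000000, 00, 00
    }
}
```

Because the loop variable is reassigned on every iteration, the loop ends as soon as no further run of zeros is found, and no try/catch is needed.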
