Regex: check that a string contains only allowed characters and limit the occurrences of each character

This is my character occurrence limit.

Dictionary<string,int> chracterLimit=new  Dictionary<string,int>{{"c",1},{"a",2}};

This is my input string...

var mystring="caac";

Here I use LINQ to check whether any character occurs more often than its allowed limit.

bool checkstringvalid=!mystring
  .ToCharArray()
  .Select(c => c.ToString())
  .GroupBy(g => g)
  .ToList()
  .ToDictionary(
     d => d.FirstOrDefault(), 
     d => d.Count())
  .Any(z => z.Value > chracterLimit[z.Key]);

The result of the condition above is false, meaning the string is invalid: c occurs twice, but its allowed limit is only 1.

This check takes too long when I run it over bulk data. My question is: how can I do this check more simply and faster?

Can you give me a solution that checks this with a regular expression? I imagine something like /a{0,2}/ /c{0,1}/

Thanks in advance!:)

The LINQ engine is quite smart, so you're unlikely to get much of a performance boost from what you currently have. One thing you could do is cut out unnecessary operations. A cleaner version of what you have would be:

int s;
bool violation = myString.GroupBy(c => c.ToString())
                         .Any(g => characterLimit.TryGetValue(g.Key, out s) && s < g.Count());

This eliminates the conversions from string, to character array, to list, to dictionary.

For anything quicker than this, you'd need to ditch LINQ and go with an iterative approach.

When working with symbols, let's work with characters, not strings (we don't want excessive ToString() calls, do we?):

   Dictionary<char, int> chracterLimit = new  Dictionary<char,int>{
     {'c', 1},
     {'a', 2}
   };

Then let's detect counterexamples early, i.e. if we have "aaaaaaaaa....aaa" we only have to read the first 3 a's, not the entire string:

   Dictionary<char, int> actual = new Dictionary<char, int>();

   bool checkStringValid = true;

   foreach (char c in mystring) {
     int count = 0;

     if (actual.TryGetValue(c, out count))
       actual[c] = ++count;  
     else
       actual.Add(c, ++count);

     if (chracterLimit.TryGetValue(c, out var limit)) {
       if (count > limit) {
         checkStringValid = false; // limit exceeded

         break;   
       } 
     }
     else {
       checkStringValid = false;  // invalid character detected

       break;   
     } 
   }  

The code above is optimized for speed; if you are only looking for a more readable solution:

  bool checkstringvalid = !mystring
    .GroupBy(c => c)
    .Any(chunk => chracterLimit.TryGetValue(chunk.Key, out var limit)
       ? chunk.Skip(limit).Any()
       : true);
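
As a quick sanity check, the readable LINQ version can be wrapped in a method and run against a few inputs. This is only a minimal sketch: the class name LimitCheck, the method name IsValid and the sample strings are made up for illustration and are not part of the answer above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LimitCheck
{
    // Hypothetical wrapper around the LINQ expression from the answer.
    public static bool IsValid(string text, Dictionary<char, int> limits) =>
        !text
            .GroupBy(c => c)
            .Any(chunk => limits.TryGetValue(chunk.Key, out var limit)
                ? chunk.Skip(limit).Any()   // true if there are more than `limit` occurrences
                : true);                    // character has no entry, so it is not allowed

    public static void Main()
    {
        var limits = new Dictionary<char, int> { { 'c', 1 }, { 'a', 2 } };

        Console.WriteLine(IsValid("caa", limits));   // True
        Console.WriteLine(IsValid("caac", limits));  // False: c is used twice
        Console.WriteLine(IsValid("cab", limits));   // False: b has no entry
    }
}
```

Note that Skip(limit).Any() stops enumerating a group as soon as one extra occurrence is found, although GroupBy itself still reads the whole string first.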

Your LINQ expression has a lot of conversion in it.

How about this kind of thing instead?

 bool IsStringCompliant (string str, Dictionary<char, int> limits) 
 {
     var lim = new Dictionary<char, int>(limits);  // copy dict so the caller's limits can be re-used
     foreach (var c in str) {
       if (lim.ContainsKey(c)) {
           lim[c] -= 1;                    // use up one allowed occurrence
           if (lim[c] < 0) return false;   // limit exceeded
       }
       else return false;  // char is not in dict; return whatever result you want here
    }
    return true;
 }

Then you do this to use that function.

   var characterLimit = new  Dictionary<char,int>{{'c',1},{'a',2}};
   var mystring="caac";
   bool checkstringvalid = IsStringCompliant(mystring, characterLimit);

This will be fast for a few reasons.

  1. it uses char rather than string variables of length 1 where possible.
  2. it plays to the C# compiler's loop optimization technology.
  3. it stops searching as soon as it knows a string has failed validity.

Plus it's easier to understand for the next programmer.

I don't know why you are after a regex solution here. It will definitely not be faster. Arguably, it gets even more complicated and involved once you go beyond your simple example.

For demonstration purposes only, here is your original condition converted to a regular expression:

  • up to one c is allowed
  • up to two a's are allowed

^(?![^c\n]*c[^c\n]*c)(?![^a\n]*a[^a\n]*a[^a\n]*a).*$


The idea here is to describe a pattern that violates the rules above (two c's or three a's) and reject it with negative lookaheads built from negated character classes. There are other ways to do it. By now you should be convinced not to use a regex for this task.
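
For completeness, here is one way the pattern could be exercised from C#. This is a sketch for experimentation only; the class name RegexLimitCheck, the method name IsValid and the sample strings are assumptions made for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexLimitCheck
{
    // Pattern copied from the answer: at most one c and at most two a's.
    private static readonly Regex Pattern = new Regex(
        @"^(?![^c\n]*c[^c\n]*c)(?![^a\n]*a[^a\n]*a[^a\n]*a).*$");

    // Hypothetical helper: true when neither lookahead finds a violation.
    public static bool IsValid(string text) => Pattern.IsMatch(text);

    public static void Main()
    {
        Console.WriteLine(IsValid("caa"));   // True
        Console.WriteLine(IsValid("caac"));  // False: two c's
        Console.WriteLine(IsValid("aaca"));  // False: three a's
        Console.WriteLine(IsValid("cab"));   // True: other characters are not restricted
    }
}
```

As the last call shows, this pattern only limits the counts of c and a. It does not reject characters outside the allowed set, so it covers only part of what the question asks for.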
