Regex: a string contains only allowed characters, with a limit on the occurrences of each character

This is my character occurrence limit:

Dictionary<string,int> chracterLimit = new Dictionary<string,int>{{"c",1},{"a",2}};

This is my input string:

var mystring="caac";

Here I use LINQ to check whether any character occurs more often than its allowed limit:

bool checkstringvalid=!mystring
  .ToCharArray()
  .Select(c => c.ToString())
  .GroupBy(g => g)
  .ToList()
  .ToDictionary(
     d => d.FirstOrDefault(), 
     d => d.Count())
  .Any(z => z.Value > chracterLimit[z.Key]);

The result of the condition above is that this is an invalid string, because c occurs 2 times but its allowed limit is only 1.

When I use this function on bulk data, it takes a long time. My question is: how can I do this check more easily?

Can you give me a solution that checks it with a regular expression? I imagine something like /a{0,2}/ /c{0,1}/

Thanks in advance! :)

The LINQ engine is quite smart, so you're unlikely to get much of a performance boost over what you currently have. One thing you could do is cut out unnecessary operations. A cleaner version of what you have would be:

int s;
bool violation = myString.GroupBy(c => c.ToString())
                         .Any(g => characterLimit.TryGetValue(g.Key, out s) && s < g.Count());

This eliminates the conversions from string to character array, to list, to dictionary.

For anything quicker than this, you'd need to ditch LINQ and go with an iterative approach.

When working with symbols, let's work with characters, not strings (we don't want excessive ToString() calls, do we?):

   Dictionary<char, int> chracterLimit = new  Dictionary<char,int>{
     {'c', 1},
     {'a', 2}
   };

Then let's detect counterexamples early, i.e. if we have "aaaaaaaaa....aaa" we only have to read the first 3 a's, not the entire string:

   Dictionary<char, int> actual = new Dictionary<char, int>();

   bool checkStringValid = true;

   foreach (char c in mystring) {
     int count = 0;

     if (actual.TryGetValue(c, out count))
       actual[c] = ++count;  
     else
       actual.Add(c, ++count);

     if (chracterLimit.TryGetValue(c, out var limit)) {
       if (count > limit) {
         checkStringValid = false; // limit exceeded

         break;   
       } 
     }
     else {
       checkStringValid = false;  // invalid character detected

       break;   
     } 
   }  

The code above is optimized for speed; if you are only looking for a more readable solution:

  bool checkstringvalid = !mystring
    .GroupBy(c => c)
    .Any(chunk => chracterLimit.TryGetValue(chunk.Key, out var limit)
       ? chunk.Skip(limit).Any()
       : true);
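
As a quick sanity check, the two variants above can be run side by side on a few inputs. This is only a sketch: the helper names IsValidLoop and IsValidLinq and the sample strings are made up for illustration, and both helpers treat a character that is missing from the dictionary as invalid, as the code above does.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var limits = new Dictionary<char, int> { { 'c', 1 }, { 'a', 2 } };

// Early-exit loop from above, wrapped in a hypothetical helper.
bool IsValidLoop(string text, Dictionary<char, int> allowed)
{
    var actual = new Dictionary<char, int>();
    foreach (char c in text)
    {
        if (!allowed.TryGetValue(c, out int limit))
            return false;                      // character is not allowed at all
        actual.TryGetValue(c, out int count);  // count stays 0 when c is new
        actual[c] = ++count;
        if (count > limit)
            return false;                      // limit exceeded, stop reading
    }
    return true;
}

// Readable LINQ version from above, wrapped in a hypothetical helper.
bool IsValidLinq(string text, Dictionary<char, int> allowed) =>
    !text.GroupBy(ch => ch)
         .Any(chunk => !allowed.TryGetValue(chunk.Key, out int limit)
                       || chunk.Skip(limit).Any());

foreach (string sample in new[] { "caac", "caa", "cab", "" })
{
    Console.WriteLine(
        $"\"{sample}\": loop={IsValidLoop(sample, limits)}, linq={IsValidLinq(sample, limits)}");
}
```

Both helpers should agree on every input; the loop version just stops reading earlier when a string is invalid.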

Your LINQ expression has a lot of conversions in it.

How about this kind of thing instead?

 bool IsStringCompliant (string str, Dictionary<char, int> limits) 
 {
     var lim = new Dictionary<char, int>(limits);  // copy dict, allows re-use
     foreach (var c in str) {
       if (lim.ContainsKey(c)) {
           lim[c] -= 1;                   // use up one allowed occurrence
           if (lim[c] < 0) return false;  // limit exceeded
       }
       else return false;  // char is not in dict: treated as invalid here, return whatever result you want
    }
    return true;
 }

Then use the function like this:

   var characterLimit = new Dictionary<char, int>{{'c', 1}, {'a', 2}};
   var mystring="caac";
   bool checkstringvalid = IsStringCompliant(mystring, characterLimit);

This will be fast for a few reasons:

  1. It uses char rather than string variables of length 1 where possible.
  2. It plays to the C# compiler's loop optimization technology.
  3. It stops searching as soon as it knows a string has failed validity.

Plus it's easier to understand for the next programmer.

I don't know why you are after a regex solution here. It will definitely not be faster. Arguably, it gets even more complicated and involved once you go beyond your simple example.

For demonstration purposes only, here is your original condition converted to a regular expression:

  • up to one c is allowed
  • up to two a's are allowed
^(?![^c\n]*c[^c\n]*c)(?![^a\n]*a[^a\n]*a[^a\n]*a).*$


The idea here is to assert the absence of anything that violates the rules above (two c's or three a's), using negative lookaheads with negated character classes. There are other ways to do it. By now you should be convinced not to use regex for this task.
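
For completeness, here is a minimal sketch of how the pattern could be used from C# with Regex.IsMatch. The variable names and sample strings are made up for illustration. The second pattern is an assumption added here, not part of the answer: it replaces the trailing .* with [ac]* so that characters outside the allowed set are rejected as well.

```csharp
using System;
using System.Text.RegularExpressions;

// Pattern from the answer: at most one 'c' and at most two 'a's.
var limitsOnly = new Regex(
    @"^(?![^c\n]*c[^c\n]*c)(?![^a\n]*a[^a\n]*a[^a\n]*a).*$");

// Assumed variant: same lookaheads, but only 'a' and 'c' may appear at all.
var limitsAndAllowed = new Regex(
    @"^(?![^c\n]*c[^c\n]*c)(?![^a\n]*a[^a\n]*a[^a\n]*a)[ac]*$");

foreach (string sample in new[] { "caac", "caa", "cab" })
{
    Console.WriteLine(
        $"\"{sample}\": limitsOnly={limitsOnly.IsMatch(sample)}, " +
        $"limitsAndAllowed={limitsAndAllowed.IsMatch(sample)}");
}
```

With these inputs, "caac" fails both patterns (two c's), "caa" passes both, and "cab" passes only the first, because the original pattern does not restrict which other characters may appear.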

