
How to find pattern match in text with exceptions if match occurs within a substring?

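I want to determine whether any string from a list of rejectable strings is found in some text, but only if that occurrence is not part of a larger allowable string (from a list of allowable strings) at the place in the text where it is found.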

Simple example:

text: "The quick red fox jumped over the lazy dog in front of the farmer."

rejectableStrings: "fox", "dog", "farmer"

allowableStrings: "quick red fox", "smurfy blue fox", "lazy brown dog", "old green farmer"

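So raise a flag if any of the strings "fox", "dog", or "farmer" is found in the text, but not if the found string is contained in one of the allowable strings (in/around the same position in the text where the rejectable string was found).

Sample logic, not yet complete: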

string status = "allowable";
foreach (string rejectableString in rejectableStrings)
{
  // check if rejectableString is found as a whole word with either a space or start/end of string surrounding the flag
  // https://stackoverflow.com/a/16213482/56082
  string invalidValuePattern = string.Format(@"(?<!\S){0}(?!\S)", rejectableString);
  if (Regex.IsMatch(text, invalidValuePattern, RegexOptions.IgnoreCase))
  {
    // it is found so we initially raise the flag to check further
    status = "flagged";
    foreach (string allowableString in allowableStrings)
    {
      // only need to consider allowableString if it contains the rejectableString, otherwise ignore
      if (allowableString.Contains(rejectableString)) 
      {
        // check if the found occurrence of the rejectableString in text is actually contained within a relevant allowableString,

        // *** the area that needs attention *** 
        if ('rejectableString occurrence found in text is also contained within the same substring allowableString of text')
        {
          // this occurrence of rejectableString is actually allowable, change status back to allowable and break out of the allowable foreach
          status = "allowable";
          break;
        } 
      }
    }
    if (status.Equals("flagged")) 
    {
      throw new Exception(rejectableString.ToUpper() + " found in text is not allowed.");
    }
  }
}

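Background, if interested: this is for an application's SQL query validation method. Its goal is to reject queries that contain permanent database modification commands, but to accept a query when the invalid command found is actually a substring of a valid temporary-table command, or falls under some other logical exception that should allow the command in the query. This is multi-database query validation, not specific to a single database product.

So the real-world examples of rejectable and allowable strings are: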

private string[] rejectableStrings = {"insert","update","set","alter",
   "create","delete"};
private string[] allowableStrings = { "insert into #", "create table #",
   "create global temporary table ", "create temporary tablespace ", "offset "};

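and the text would be a SQL query.

You can do this by first removing all of the allowable strings from the text and then checking for any rejectable strings. This ensures that when you look for rejectable words, you are not looking at any text that belongs to an allowable string.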

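using System;
using System.Text.RegularExpressions;

public class Program
{
   public static void Main(string[] args)
   {
      string[] rejectableStrings = new string[] {"fox", "dog", "farmer"};
      string[] allowableStrings = new string[] {"quick red fox", "smurfy blue fox", 
                                                "lazy brown dog", "old green farmer"};
      string teststr = "fox quick red fox";
      bool pass = true;

      // Step 1: strip every allowable string out of the text.
      // Regex.Escape keeps characters in the list entries from being read as regex syntax.
      foreach (string allowed in allowableStrings)
      {
         teststr = Regex.Replace(teststr, Regex.Escape(allowed), "", RegexOptions.IgnoreCase);
      }

      // Step 2: anything rejectable that is still present was not part of an allowable string.
      foreach (string reject in rejectableStrings)
      {
         if (Regex.IsMatch(teststr, Regex.Escape(reject), RegexOptions.IgnoreCase))
         {
            pass = false;
         }
      }
      Console.WriteLine(pass);
   }
}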
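
If you would rather keep the text intact and compare positions, which is closer to the "area that needs attention" in the question, something along these lines might work. This is only a sketch under a few assumptions: the class name OverlapCheck and the method FindViolations are made up for illustration, the whole-word pattern is the one from the question (so "fox." followed by a period would not count as a match), and an occurrence is treated as allowed only when it lies entirely inside one occurrence of an allowable string.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original question or answer.
public static class OverlapCheck
{
    // Returns every rejectable occurrence (term and start index) that is not
    // fully covered by an occurrence of some allowable string in the text.
    public static List<(string Term, int Index)> FindViolations(
        string text, string[] rejectable, string[] allowable)
    {
        // Record the [Start, End) span of each allowable occurrence.
        var allowedSpans = new List<(int Start, int End)>();
        foreach (string allowed in allowable)
        {
            foreach (Match m in Regex.Matches(text, Regex.Escape(allowed), RegexOptions.IgnoreCase))
            {
                allowedSpans.Add((m.Index, m.Index + m.Length));
            }
        }

        var violations = new List<(string Term, int Index)>();
        foreach (string reject in rejectable)
        {
            // Whole-word pattern from the question: whitespace or string
            // boundary on both sides of the rejectable term.
            string pattern = @"(?<!\S)" + Regex.Escape(reject) + @"(?!\S)";
            foreach (Match m in Regex.Matches(text, pattern, RegexOptions.IgnoreCase))
            {
                int start = m.Index;
                int end = m.Index + m.Length;
                bool covered = allowedSpans.Any(s => s.Start <= start && end <= s.End);
                if (!covered)
                {
                    violations.Add((reject, start));
                }
            }
        }
        return violations;
    }

    public static void Main()
    {
        string[] rejectable = { "fox", "dog", "farmer" };
        string[] allowable = { "quick red fox", "smurfy blue fox",
                               "lazy brown dog", "old green farmer" };

        // The first "fox" stands alone; the second is inside "quick red fox".
        foreach (var v in FindViolations("fox quick red fox", rejectable, allowable))
        {
            Console.WriteLine(v.Term + " at index " + v.Index + " is not allowed");
        }
    }
}
```

Compared with removing the allowable strings first, this leaves the text untouched, so deleting a phrase cannot accidentally join the words on either side of it, and you get the position of each offending occurrence for the error message.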

