How to find a pattern match in text, with exceptions if the match occurs within a substring?
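I want to determine whether any string from a list of rejectable strings is present in some text, but only when that string is not found inside a larger allowable string (from a list of allowable strings) at the place in the text where it occurs.
Simple example:
Text: "The quick red fox jumped over the lazy dog in front of the farmer."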
rejectableStrings: "fox", "dog", "farmer"
allowableStrings: "quick red fox", "smurfy blue fox", "lazy brown dog", "old green farmer"
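So raise a flag if any of the strings "fox", "dog", or "farmer" is found in the text, but not if the found string is contained within one of the allowable strings (in or around the same position in the text where the rejectable string was found).
Example logic, not yet complete: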
string status = "allowable";
foreach (string rejectableString in rejectableStrings)
{
    // check if rejectableString is found as a whole word with either a space or start/end of string surrounding the flag
    // https://stackoverflow.com/a/16213482/56082
    // Regex.Escape keeps any regex metacharacters in the list entry literal
    string invalidValuePattern = string.Format(@"(?<!\S){0}(?!\S)", Regex.Escape(rejectableString));
    if (Regex.IsMatch(text, invalidValuePattern, RegexOptions.IgnoreCase))
    {
        // it is found so we initially raise the flag to check further
        status = "flagged";
        foreach (string allowableString in allowableStrings)
        {
            // only need to consider allowableString if it contains the rejectableString, otherwise ignore
            if (allowableString.Contains(rejectableString))
            {
                // check if the found occurrence of the rejectableString in text is actually contained within a relevant allowableString,
                // *** the area that needs attention ***
                if ('rejectableString occurrence found in text is also contained within the same substring allowableString of text')
                {
                    // this occurrence of rejectableString is actually allowable, change status back to allowable and break out of the allowable foreach
                    status = "allowable";
                    break;
                }
            }
        }
        if (status.Equals("flagged"))
        {
            throw new Exception(rejectableString.ToUpper() + " found in text is not allowed.");
        }
    }
}
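One way to fill in the area marked as needing attention is to compare positions: for each whole-word occurrence of a rejectable string, check whether some occurrence of an allowable string in the text spans it completely. The following is only a minimal sketch under that assumption. The names `OverlapCheck`, `FindViolation`, and `IsCovered` are hypothetical and not part of the original code, and the sketch returns the offending word instead of throwing.

```csharp
using System;
using System.Text.RegularExpressions;

public static class OverlapCheck
{
    // Hypothetical helper: returns the first rejectable string that occurs as a
    // whole word in text without being covered by an allowable phrase, else null.
    public static string FindViolation(string text, string[] rejectableStrings, string[] allowableStrings)
    {
        foreach (string rejectable in rejectableStrings)
        {
            // Same whitespace-delimited whole-word pattern as in the question
            string pattern = @"(?<!\S)" + Regex.Escape(rejectable) + @"(?!\S)";
            foreach (Match hit in Regex.Matches(text, pattern, RegexOptions.IgnoreCase))
            {
                if (!IsCovered(text, hit.Index, hit.Length, allowableStrings))
                    return rejectable;
            }
        }
        return null;
    }

    // True if some occurrence of an allowable phrase in text fully spans
    // the range [start, start + length).
    private static bool IsCovered(string text, int start, int length, string[] allowableStrings)
    {
        foreach (string allowable in allowableStrings)
        {
            if (string.IsNullOrEmpty(allowable)) continue;
            int from = 0;
            while (true)
            {
                int at = text.IndexOf(allowable, from, StringComparison.OrdinalIgnoreCase);
                if (at < 0) break;
                if (at <= start && start + length <= at + allowable.Length)
                    return true;
                from = at + 1; // keep looking for later occurrences
            }
        }
        return false;
    }

    public static void Main()
    {
        string[] rejectable = { "fox", "dog", "farmer" };
        string[] allowable = { "quick red fox", "smurfy blue fox", "lazy brown dog", "old green farmer" };
        // prints "allowable": the only "fox" sits inside "quick red fox"
        Console.WriteLine(FindViolation("The quick red fox sleeps", rejectable, allowable) ?? "allowable");
        // prints "fox": the leading "fox" is not covered by any allowable phrase
        Console.WriteLine(FindViolation("fox quick red fox", rejectable, allowable) ?? "allowable");
    }
}
```

Because each occurrence is checked on its own, a text that contains both an allowed and a disallowed "fox" is still flagged.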
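Background, if interested: this is for an application's SQL query validation method. The goal is to reject queries that contain permanent database modification commands, but to treat a query as valid if the invalid command that was found is actually a substring of a valid temporary-table command, or some other logical exception that should permit the command within the query. This is multi-database query validation, not specific to a single database product.
So the real-world examples of the rejectable and allowable strings are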
private string[] rejectableStrings = {"insert","update","set","alter",
"create","delete"};
private string[] allowableStrings = { "insert into #", "create table #",
"create global temporary table ", "create temporary tablespace ", "offset "};
and the text would be a SQL query.
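You can do this by first removing all of the allowable strings and then checking for any rejectable words. This ensures that when you look for rejectable words, you are not looking inside any of the allowable strings.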
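using System;
using System.Text.RegularExpressions;

public class Program
{
    public static void Main(string[] args)
    {
        string[] rejectableStrings = new string[] {"fox", "dog", "farmer"};
        string[] allowableStrings = new string[] {"quick red fox", "smurfy blue fox",
            "lazy brown dog", "old green farmer"};
        string teststr = "fox quick red fox";
        bool pass = true;
        // Remove every allowable phrase so its words cannot trigger a rejection.
        foreach (string allowed in allowableStrings)
        {
            teststr = Regex.Replace(teststr, Regex.Escape(allowed), "", RegexOptions.IgnoreCase);
        }
        // Any rejectable word that remains was outside all allowable phrases.
        foreach (string reject in rejectableStrings)
        {
            if (Regex.IsMatch(teststr, Regex.Escape(reject), RegexOptions.IgnoreCase))
            {
                pass = false;
                break;
            }
        }
        Console.WriteLine(pass); // False: the leading "fox" is not part of an allowable phrase
    }
}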
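Applied to the SQL lists from the question, the same remove-then-check idea might look like the sketch below. It is an illustration only: the class name `SqlKeywordCheck` and the method `IsAllowed` are hypothetical, the sample queries are made up, and the whitespace-delimited whole-word pattern is borrowed from the question, not from this answer. Each allowable phrase is replaced with a space instead of an empty string so that neighbouring words are not glued together.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SqlKeywordCheck
{
    static readonly string[] RejectableStrings =
        { "insert", "update", "set", "alter", "create", "delete" };
    static readonly string[] AllowableStrings =
        { "insert into #", "create table #", "create global temporary table ",
          "create temporary tablespace ", "offset " };

    // Hypothetical helper: true if no rejectable word remains once the
    // allowable phrases have been removed from the query.
    public static bool IsAllowed(string query)
    {
        string remaining = query;
        foreach (string allowed in AllowableStrings)
        {
            // Regex.Escape treats "#" and spaces in the phrase literally
            remaining = Regex.Replace(remaining, Regex.Escape(allowed), " ", RegexOptions.IgnoreCase);
        }
        foreach (string reject in RejectableStrings)
        {
            // Whole-word match delimited by whitespace or start/end of string
            string pattern = @"(?<!\S)" + Regex.Escape(reject) + @"(?!\S)";
            if (Regex.IsMatch(remaining, pattern, RegexOptions.IgnoreCase))
                return false;
        }
        return true;
    }

    public static void Main()
    {
        Console.WriteLine(IsAllowed("insert into #tmp select id from users"));           // True
        Console.WriteLine(IsAllowed("update users set name = 'x'"));                     // False
        Console.WriteLine(IsAllowed("select id from users order by id offset 10 rows")); // True
    }
}
```

Note that a whitespace-delimited pattern will not flag a keyword that is directly followed by punctuation, so a real validator would probably need a more careful notion of token boundaries.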