How to efficiently match a string against lots of regular expressions
I want to efficiently match a string against a number of regular expressions to determine what the string represents.
^[0-9]{1}$    if the string matches, it is of type 1
^[a-x]{300}$  if the string matches, it is of type 2
... ...
Iterating over a collection of all the regular expressions every time I want to match a string is far too expensive for me.
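For reference, the baseline the question describes (one pass over every pattern for every string) might look like this in JavaScript. The `rules` table and `classifyNaive` are hypothetical names; only the two patterns come from the question.

```javascript
// Baseline described in the question: try every (pattern, type) pair in turn.
// The two rules are the question's examples; a real list would be much longer.
const rules = [
  { re: /^[0-9]{1}$/, type: 1 },
  { re: /^[a-x]{300}$/, type: 2 },
];

// Returns the type of the first rule whose regex matches, or null if none do.
function classifyNaive(str) {
  for (const { re, type } of rules) {
    if (re.test(str)) return type;
  }
  return null;
}

console.log(classifyNaive("7")); // 1
console.log(classifyNaive("a".repeat(300))); // 2
console.log(classifyNaive("hello")); // null
```

The cost grows linearly with the number of rules, which is what the question wants to avoid.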
Is there a more efficient way? Maybe I could compile these regexps into one big one? Or maybe something that works like Google Suggestions, analysing the string letter by letter?
In my project I am using PHP/MySQL, but I'd be grateful for a hint in any language.
Edit: Matching will be performed very frequently, and the string values will vary.
What you could do, if possible, is group your regexes together and determine which group a string belongs to.
For instance, if a string doesn't match \d, you know there is no digit in it and you can skip all regexes that require one. So (for instance) instead of matching against 300+ regexes, you might narrow that down to just 25.
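A minimal JavaScript sketch of this grouping idea, assuming each group has one cheap "gate" regex that every rule in the group requires (\d for the digit rules, [a-z] for the letter rules). The `groups` table, the types 3 and 4, and `classifyGrouped` are hypothetical; the patterns are the ones used elsewhere on this page.

```javascript
// Hypothetical grouping: a group's rules are only tried if its gate matches.
// Each gate must be a necessary condition for every rule in its group.
const groups = [
  {
    gate: /\d/, // the string contains at least one digit
    rules: [{ re: /^[0-9]$/, type: 1 }],
  },
  {
    gate: /[a-z]/, // the string contains at least one lowercase letter
    rules: [
      { re: /^[a-x]{300}$/, type: 2 },
      { re: /^[x-z]{1,5}$/, type: 3 },
      { re: /^[ab]{2,}$/, type: 4 },
    ],
  },
];

// Returns the type of the first matching rule, or null if none match.
function classifyGrouped(str) {
  for (const group of groups) {
    // One cheap test can rule out every regex in the group at once.
    if (!group.gate.test(str)) continue;
    for (const { re, type } of group.rules) {
      if (re.test(str)) return type;
    }
  }
  return null;
}

console.log(classifyGrouped("7")); // 1
console.log(classifyGrouped("xyz")); // 3
console.log(classifyGrouped("!!")); // null: both gates fail, no rule regex runs
```

The saving depends entirely on how selective the gates are; a rule that can match without a digit must not sit behind the \d gate, or it will be skipped wrongly.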
You can combine your regexes into a single pattern like this:
^(?:([0-9])|([a-x]{300}))$
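Why the (?:...) wrapper matters, shown with JavaScript regexes (alternation has the same lowest precedence in PHP's PCRE): without it, the anchors attach only to the neighbouring branches.

```javascript
// Alternation (|) has the lowest precedence in a regex, so without a group
// ^ applies only to the first branch and $ only to the last one.
const unwrapped = /^([0-9])|([a-x]{300})$/;

// Wrapping the alternatives in a non-capturing group (?:...) makes both
// anchors apply to every branch; the capture groups stay numbered 1 and 2.
const wrapped = /^(?:([0-9])|([a-x]{300}))$/;

console.log(unwrapped.test("5 apples")); // true: "^[0-9]" matches the leading "5"
console.log(wrapped.test("5 apples")); // false
console.log(wrapped.test("5")); // true
```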
Later, if you get more regexes, you can extend it like this:
^(?:([0-9])|([a-x]{300})|([x-z]{1,5})|([ab]{2,})|...)$
Then use this code: 然后使用以下代码:
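$input = ...;
// Wrap the alternation in (?:...) so ^ and $ apply to every branch.
if (preg_match('#^(?:([0-9])|([a-x]{300}))$#', $input, $matches)) {
    // PHP sets a skipped group to "" when a later group matched,
    // and omits trailing groups that did not take part at all.
    if (isset($matches[1]) && $matches[1] !== '') {
        // type 1
    } elseif (isset($matches[2]) && $matches[2] !== '') {
        // type 2
    }
    // and so on...
}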
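The single-pattern idea can also be built programmatically from a list, so adding a type is just appending a pattern. Below is a JavaScript sketch; `buildCombined` and `classifyCombined` are hypothetical names, and it assumes each pattern is an unanchored body with no capturing groups of its own (use (?:...) inside them), so that capture group i + 1 belongs to patterns[i]. A backtracking engine such as JavaScript's or PCRE still tries the alternatives one after another, so this mainly replaces many calls with one rather than removing the underlying work.

```javascript
// Hypothetical helper: wrap each unanchored body in its own capture group,
// join them with |, and anchor the whole alternation with ^(?:...)$.
function buildCombined(patterns) {
  const body = patterns.map((p) => "(" + p + ")").join("|");
  return new RegExp("^(?:" + body + ")$");
}

const patterns = ["[0-9]", "[a-x]{300}", "[x-z]{1,5}", "[ab]{2,}"];
const combined = buildCombined(patterns);

// Returns the 1-based type (the index of the alternative that matched),
// or null if the string matches none of them.
function classifyCombined(str) {
  const m = combined.exec(str);
  if (m === null) return null;
  // Groups that did not participate in the match are undefined in JavaScript.
  for (let i = 1; i < m.length; i++) {
    if (m[i] !== undefined) return i;
  }
  return null;
}

console.log(classifyCombined("7")); // 1
console.log(classifyCombined("xyz")); // 3
console.log(classifyCombined("abab")); // 4
console.log(classifyCombined("hello!")); // null
```

When branches overlap (300 a's also satisfy [ab]{2,}), the first alternative that matches wins, so list the patterns in priority order.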
Since the regexes are going to change, I don't think you can get a generic answer: both your regexes and the way you handle them will need to evolve. For now, if you're looking to optimize your script, test for known substrings with something cheap like indexOf before evaluating a regex, to lighten the regex load.
For instance, suppose you have the 4 strings in the array below, each belonging to one of the "types" as you've described them. You could then do:
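// type 1 contains only 0s and 1s
// type 2 must have a "2"
// type 3 contains only letters
var arr = [
  "asdfsdfkjslkdujflkj2lkjsdlkf2lkja",
  "100010010100111010100101001001011",
  "101032021309420940389579873987113",
  "asdfkajhslkdjhflkjshdlfkjhalksjdf"
];

for (var i = 0; i < arr.length; i++) {
  var str = arr[i];
  // indexOf returns -1 when absent (0 is a valid position), so compare with -1.
  if (str.indexOf('2') !== -1) {
    // type 2: the cheap substring check alone decides
  } else if (str.indexOf('0') !== -1 || str.indexOf('1') !== -1) {
    // only pay for the regex when the string might be binary
    if (/^[01]+$/.test(str)) {
      // type 1
    } else {
      // ignore
    }
  } else if (/^[a-z]+$/i.test(str)) {
    // type 3
  }
}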
See it in action here: http://jsfiddle.net/remus/44MdX/
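The same indexOf-first idea can be made table-driven, so new types are added as data rather than as new if/else branches. This is a hypothetical JavaScript sketch (`gatedRules`, `anyOf` and `classifyGated` are illustrative names), using `includes` in place of `indexOf(...) !== -1`; it reproduces the classification of the four strings above.

```javascript
// Each rule: an optional list of substrings, at least one of which must occur
// (the cheap check), and an optional regex that makes the final decision.
const gatedRules = [
  { anyOf: ["2"], re: null, type: 2 }, // containing "2" is enough
  { anyOf: ["0", "1"], re: /^[01]+$/, type: 1 },
  { anyOf: null, re: /^[a-z]+$/i, type: 3 }, // no cheap check for letters
];

// Returns the type of the first rule that passes, or null (ignore).
function classifyGated(str) {
  for (const { anyOf, re, type } of gatedRules) {
    // Skip the rule without touching its regex if no gate substring occurs.
    if (anyOf && !anyOf.some((s) => str.includes(s))) continue;
    // A rule without a regex is decided by the substring check alone.
    if (re === null || re.test(str)) return type;
  }
  return null;
}

const samples = [
  "asdfsdfkjslkdujflkj2lkjsdlkf2lkja",
  "100010010100111010100101001001011",
  "101032021309420940389579873987113",
  "asdfkajhslkdjhflkjshdlfkjhalksjdf",
];
console.log(samples.map(classifyGated)); // [ 2, 1, 2, 3 ]
```

Rule order matters here too: the "2" rule must come first, because the binary and letter rules assume the string has already been checked for a "2".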