How to efficiently match a string against a large number of regular expressions

Question

I want to be able to efficiently match a string against a number of regular expressions to determine what the string represents.

^[0-9]{1}$         if string matches it is of type 1
^[a-x]{300}$       if string matches it is of type 2
...                ...

Iterating over a collection containing all of the regular expressions every time I want to match a string is far too expensive for me.

Is there a more efficient way? Maybe I could compile these regexes into one big one? Or maybe something that works like Google Suggestions, analysing the string letter by letter?

In my project I am using PHP/MySQL, but I would be thankful for a hint in any language.

Edit: The matching operation will be very frequent, and the string values will vary.

Answer 1

What you could do, if possible, is group your regexes together and determine which group a string belongs to.

For instance, if a string doesn't match \d, you know there is no digit in it and you can skip all regexes that require one. So (for instance) instead of matching against 300+ regexes, you can narrow that down to just 25.
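A minimal sketch of this grouping idea, written in JavaScript. The rule table, the `needsDigit` flag, and the `classify` function are hypothetical names introduced for illustration; only the two patterns come from the question. It assumes you can state, for each regex, whether it can only match strings that contain a digit.

```javascript
// Hypothetical rule table: each rule carries a cheap precondition
// (whether the pattern can only match strings that contain a digit).
const rules = [
  { type: 1, needsDigit: true, regex: /^[0-9]$/ },
  { type: 2, needsDigit: false, regex: /^[a-x]{300}$/ },
];

// Built once, up front: the rules that can still match a digit-free string.
const rulesWithoutDigit = rules.filter((rule) => !rule.needsDigit);

function classify(input) {
  // One cheap test decides which rules are worth trying at all.
  const candidates = /\d/.test(input) ? rules : rulesWithoutDigit;
  for (const rule of candidates) {
    if (rule.regex.test(input)) {
      return rule.type;
    }
  }
  return null; // no rule matched
}
```

With many rules, the same pattern extends to several preconditions (contains a letter, minimum length, and so on), each of which discards a whole group of regexes with one test.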

Answer 2

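You can combine your regexes like this:

^(?:([0-9])|([a-x]{300}))$

The alternation is wrapped in a non-capturing group so that ^ and $ apply to every alternative, not just the first and the last.

Later, if you get more regexes, you can extend it like this:

^(?:([0-9])|([a-x]{300})|([x-z]{1,5})|([ab]{2,})|...)$

Then use this code:

$input = ...;
// preg_match() is enough here: the anchored pattern matches at most once.
if (preg_match('#^(?:([0-9])|([a-x]{300}))$#', $input, $matches)) {
    // A group that did not take part in the match is either missing
    // or an empty string, so check for both.
    if (isset($matches[1]) && $matches[1] !== '') {
        // type 1
    } elseif (isset($matches[2]) && $matches[2] !== '') {
        // type 2
    }
    // and so on...
}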
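The same single-pass idea can be sketched in JavaScript with named capture groups, so the type is read from the group name rather than a numeric index. The group names `type1` and `type2` and the function `classifyCombined` are assumptions made for illustration. Note the non-capturing group around the alternation, which makes `^` and `$` apply to every alternative.

```javascript
// Each alternative gets its own named group; (?: ... ) wraps the whole
// alternation so that ^ and $ anchor every alternative.
const combined = /^(?:(?<type1>[0-9])|(?<type2>[a-x]{300}))$/;

function classifyCombined(input) {
  const match = combined.exec(input);
  if (match === null) {
    return null; // no alternative matched
  }
  // Only the alternative that matched has a defined group value.
  for (const [name, value] of Object.entries(match.groups)) {
    if (value !== undefined) {
      return name;
    }
  }
  return null;
}
```

Whether one combined pattern is actually faster than a loop over separate patterns depends on the regex engine and on the patterns, so it is worth measuring on real input.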

Answer 3

Since the regexes are going to keep changing, I don't think you can get a generic answer: both your regexes and the way you handle them will need to evolve. For now, if you're looking to optimize the processing of your script, test for known substrings with something like indexOf before evaluating the regexes, to lighten the regex load.

For instance, if you have 4 strings:

asdfsdfkjslkdujflkj2lkjsdlkf2lkja
100010010100111010100101001001011
101032021309420940389579873987113
asdfkajhslkdjhflkjshdlfkjhalksjdf

Each falls into one of the "types" you've described, so you could do:

// type 1 contains only 0s and 1s
// type 2 must contain a "2"
// type 3 contains only letters

var arr = [
    "asdfsdfkjslkdujflkj2lkjsdlkf2lkja",
    "100010010100111010100101001001011",
    "101032021309420940389579873987113",
    "asdfkajhslkdjhflkjshdlfkjhalksjdf"
    ];

for (var i = 0; i < arr.length; i++)
{
    var s = arr[i];

    // indexOf returns -1 when the substring is absent; 0 is a valid position
    if (s.indexOf('2') !== -1)
    {
        // type 2
    }
    else if (s.indexOf('0') !== -1 || s.indexOf('1') !== -1)
    {
        // only now pay for the regex
        if (/^[01]+$/.test(s))
        {
            // type 1
        }
        else
        {
            // ignore
        }
    }
    else if (/^[a-z]+$/i.test(s))
    {
        // type 3
    }
}

See it in action here: http://jsfiddle.net/remus/44MdX/

Answer 1: 0, accepted, 2013-12-05 11:59:33
Answer 2: 0, 2013-12-05 11:59:42
Answer 3: 0, 2013-12-05 16:49:18