RegEx taking too much time and memory

I have this RegEx pattern

    ^(\\d|\\w)+\\..*

and this is my input

    (1) nu11111111111111
    (2) nu1111111111111111111
    (3) nu1111111111111111111111111111111111111

Input 2 takes longer than input 1, and both return a Not Matched result. For input 3, however, I didn't get any response even after 30 minutes of execution. Memory usage also keeps increasing while it runs.

Below is my code snippet:

    String input1 = "nu11111111111111";
    String input2 = "nu1111111111111111111";
    String input3 = "nu1111111111111111111111111111111111111";
    try
    {

        if (input3.matches("^(\\d|\\w)+\\..*"))
        {
            System.out.println("Matched");
        }
        else
        {
            System.out.println("Not Matched");
        }
    }
    catch (Exception e)
    {
        e.printStackTrace();
    }

This is another case of catastrophic backtracking: \\d is already included in \\w, so every digit can be matched by either alternative. Because there is no match to be found (the input contains no literal dot), the regex engine backtracks through every possible combination of matching \\w or \\d against your series of 1s, and the number of combinations doubles with each additional digit.
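To make the growth concrete, here is a minimal sketch of a naive backtracking matcher for `^(\d|\w)+\.` that counts how many single-step attempts it makes. It is a toy model, not the actual `java.util.regex` implementation; the class `BacktrackCount` and its methods are hypothetical names, and a real engine may apply optimizations this model ignores.

```java
public class BacktrackCount {
    // Number of match attempts made so far (toy metric, not a JDK counter).
    private static long attempts;

    private static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // ASCII definition of \w: [a-zA-Z_0-9].
    private static boolean isWord(char c) {
        return isDigit(c) || c == '_' || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    // Naive backtracking for ^(\d|\w)+\. starting at index i.
    private static boolean matchFrom(String s, int i, boolean consumedOne) {
        if (i < s.length()) {
            char c = s.charAt(i);
            if (isDigit(c)) {            // first alternative: \d
                attempts++;
                if (matchFrom(s, i + 1, true)) return true;
            }
            if (isWord(c)) {             // second alternative: \w
                attempts++;
                if (matchFrom(s, i + 1, true)) return true;
            }
        }
        // Leave the loop and try to match the literal dot.
        attempts++;
        return consumedOne && i < s.length() && s.charAt(i) == '.';
    }

    static long countAttempts(String s) {
        attempts = 0;
        matchFrom(s, 0, false);
        return attempts;
    }

    public static void main(String[] args) {
        StringBuilder input = new StringBuilder("nu");
        for (int digits = 1; digits <= 20; digits++) {
            input.append('1');
            System.out.println(digits + " digits: "
                    + countAttempts(input.toString()) + " attempts");
        }
    }
}
```

In this model every extra digit roughly doubles the attempt count, which is consistent with input 2 being noticeably slower than input 1 and input 3 not finishing at all.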

To get a little insight into what is happening, see https://regex101.com/r/4fRRpc/1/ and go to the regex debugger. This uses a PCRE pattern without startup optimizations, which should be pretty similar to what Java appears to do in this case.

For your regex, use ^\\w+\\..* instead. Since \\w already covers digits, it matches the same inputs without the overlapping alternation.
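A small sketch of the simplified pattern applied to the longest input from the question. The class name `FixedPattern`, the helper `matchesFixed`, and the sample string `nu11.txt` are made up for illustration.

```java
import java.util.regex.Pattern;

public class FixedPattern {
    // Same intent as ^(\d|\w)+\..* but without the overlapping alternation.
    private static final Pattern FIXED = Pattern.compile("^\\w+\\..*");

    static boolean matchesFixed(String input) {
        return FIXED.matcher(input).matches();
    }

    public static void main(String[] args) {
        String input3 = "nu1111111111111111111111111111111111111";
        // No dot in the input, so this should report "Not Matched" right away.
        System.out.println(matchesFixed(input3) ? "Matched" : "Not Matched");
        // Hypothetical input containing a dot, for comparison.
        System.out.println(matchesFixed("nu11.txt") ? "Matched" : "Not Matched");
    }
}
```

With a single `\\w+` there is only one way to consume each character, so a failed match costs at most one backtracking step per character instead of an exponential number of combinations.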

That Java regex engine is pathetic.

    › time perl -E'say /^(\d|\w)+\..*/ ? "Matched" : "Not Matched" for qw(nu11111111111111 nu1111111111111111111 nu1111111111111111111111111111111111111)'
    Not Matched
    Not Matched
    Not Matched

    real    0m0,009s
    user    0m0,006s
    sys     0m0,003s

Try RE2; it does not backtrack and has Java bindings.
