
What maximum line length should I allow to avoid catastrophic backtracking?

The line is approximately 7,915,621 characters long and is actually the view state value of an ASPX website.

I get the original HTML of the site and pass it line by line to the extract function. As soon as it reaches the view_state line containing that long string, the regex gets stuck.

Here is the regex pattern that gets stuck:

/[\w\.]+\@[\w]+(?:\.[\w]{3}|\.[\w]{2}\.[\w]{2})\b/gi

I thought about setting a maximum line length so that this line, and any others like it, would be skipped, but I can't think of an optimal size because I care about false positives.

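[\w\.]+ can match at so many positions in your document that processing it with your expression becomes a problem. The engine can start an attempt at every character that [\w\.] accepts, and each attempt scans forward to the end of the run before failing for lack of an @.

Reducing the number of positions where a match attempt can start is a possible solution, e.g. by using a word boundary.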
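
To illustrate the effect of a word boundary, here is a minimal sketch that counts the positions at which a match attempt could begin, with and without a leading \b. The helper name countStartPositions and the sample line are hypothetical. The sample only mimics long runs of word characters such as those in a view state value.

```javascript
// Count the positions at which a pattern beginning with [\w.]+ could start a
// match attempt. A zero-width lookahead marks each candidate position without
// consuming any input.
function countStartPositions(text, requireWordBoundary) {
  const probe = requireWordBoundary ? /\b(?=[\w.])/g : /(?=[\w.])/g;
  let count = 0;
  for (const _ of text.matchAll(probe)) count += 1;
  return count;
}

// Hypothetical stand-in for a view state line: long runs of word characters
// separated by '/' and '+'.
const sampleLine =
  "A".repeat(5000) + "/" + "b".repeat(5000) + "+" + "9".repeat(5000);

console.log(countStartPositions(sampleLine, false)); // one per word character
console.log(countStartPositions(sampleLine, true));  // one per run
```

Without the boundary every character of a run is a candidate start, and each attempt may scan to the end of the run, so the work grows roughly with the square of the run length. With the boundary only the first character of each run qualifies.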

(?:\.\w{3}|\.\w{2}\.\w{2}) can be streamlined to \.\w{2}(?:\w|\.\w{2}) by factoring out the common prefix \.\w{2}.
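
As a sanity check on that refactoring, the sketch below compares the original and factored suffix patterns on a handful of inputs. The sample strings and the helper name suffixesAgree are hypothetical, and a finite set of samples is of course not a proof of equivalence.

```javascript
// Original and factored forms of the domain suffix, anchored so that each
// sample must match in full.
const originalSuffix = /^(?:\.\w{3}|\.\w{2}\.\w{2})$/;
const factoredSuffix = /^\.\w{2}(?:\w|\.\w{2})$/;

// Hypothetical samples: two that should match and several that should not.
const suffixSamples = [".com", ".co.uk", ".io", ".info", ".co.u", "com", ".c.uk"];

// True when both patterns give the same verdict for every sample.
function suffixesAgree(samples) {
  return samples.every(
    (s) => originalSuffix.test(s) === factoredSuffix.test(s)
  );
}

console.log(suffixesAgree(suffixSamples));
```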

Use:

/\b[\w.]+@\w+\.\w{2}(?:\w|\.\w{2})\b/gi

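Or get rid of the character class (the square brackets), which also stops the local part from starting with, ending with, or repeating a dot: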

/\b\w+(?:\.\w+)*@\w+\.\w{2}(?:\w|\.\w{2})\b/gi

EXPLANATION

--------------------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
--------------------------------------------------------------------------------
  \w+                      word characters (a-z, A-Z, 0-9, _) (1 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  (?:                      group, but do not capture (0 or more times
                           (matching the most amount possible)):
--------------------------------------------------------------------------------
    \.                       '.'
--------------------------------------------------------------------------------
    \w+                      word characters (a-z, A-Z, 0-9, _) (1 or
                             more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
  )*                       end of grouping
--------------------------------------------------------------------------------
  @                        '@'
--------------------------------------------------------------------------------
  \w+                      word characters (a-z, A-Z, 0-9, _) (1 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  \.                       '.'
--------------------------------------------------------------------------------
  \w{2}                    word characters (a-z, A-Z, 0-9, _) (2
                           times)
--------------------------------------------------------------------------------
  (?:                      group, but do not capture:
--------------------------------------------------------------------------------
    \w                       word characters (a-z, A-Z, 0-9, _)
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    \.                       '.'
--------------------------------------------------------------------------------
    \w{2}                    word characters (a-z, A-Z, 0-9, _) (2
                             times)
--------------------------------------------------------------------------------
  )                        end of grouping
--------------------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
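
A minimal sketch of how the final pattern might be applied line by line follows. The helper name extractEmails and the sample lines are hypothetical. The second line only imitates a long view state value and is far shorter than the one in the question.

```javascript
// The answer's final pattern. The `g` flag makes String.prototype.match
// return every match in the line instead of only the first one.
const EMAIL_PATTERN = /\b\w+(?:\.\w+)*@\w+\.\w{2}(?:\w|\.\w{2})\b/gi;

// Hypothetical helper: returns all matches found in a single line,
// or an empty array when there are none.
function extractEmails(line) {
  return line.match(EMAIL_PATTERN) || [];
}

// Hypothetical input: one ordinary line and one long view-state-like line.
const htmlLines = [
  "Contact john.doe@example.com or sales@shop.co.uk today",
  'value="' + "QUJD".repeat(50000) + '"',
];

for (const line of htmlLines) {
  console.log(line.length, extractEmails(line));
}
```

Because of the leading \b, the long run of word characters in the second line is attempted once from its first character rather than once per character.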

