
How to find the most specific match from a list of regex patterns?

I have a series of regex patterns, and am matching incoming HttpRequest paths to these. I would like to iterate through them and find the most specific match (a URI may match more than one regex pattern).

For example, "/static/images/foo.jpg" would match three of the following regex patterns I have:

^/
^/static/images/
^/static/
^/echo/$

How can I iterate through the list and determine that the most specific match is ^/static/images/?


For the sake of simplicity, let's assume "most specific" here means the most characters or sub-patterns matched, reading from left to right. I realize that if we introduce something like the following regex, "most specific" becomes ambiguous:

.*\.(jpg|png)$

As mentioned in the comments, there is no definitive way to settle the problem other than manually. However, you can do a few things to come up with a semi-heuristic algorithm that can at least help in your particular case:

  1. You can test the length of the pattern. In the example, longest = most specific, and although that's not always the case, it can at least give an idea.
  2. You can test the patterns against each other. For example, ^/static/ matches the text of ^/static/images/, so ^/static/images/ is more specific.
  3. You can keep track of how many URIs have already matched a particular pattern. The fewer URIs match a pattern, the more specific it is.
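A minimal sketch of the first heuristic, combined with the question's own definition of "most specific" (most characters matched, reading from left to right). It is written in Python only for illustration; the names `PATTERNS`, `COMPILED`, and `most_specific` are hypothetical, and pattern length is used as nothing more than a tie-breaker:

```python
import re

# Patterns from the question, in no particular order.
PATTERNS = [r"^/", r"^/static/images/", r"^/static/", r"^/echo/$"]
COMPILED = [re.compile(p) for p in PATTERNS]


def most_specific(path):
    """Return the pattern whose match covers the most characters of path.

    Ties are broken by the length of the pattern source (heuristic 1).
    Returns None when no pattern matches.
    """
    best = None
    best_key = (-1, -1)
    for rx in COMPILED:
        m = rx.match(path)
        if m is None:
            continue  # this pattern does not apply to the path
        key = (m.end() - m.start(), len(rx.pattern))
        if key > best_key:
            best, best_key = rx.pattern, key
    return best


print(most_specific("/static/images/foo.jpg"))  # ^/static/images/
print(most_specific("/echo/"))                  # ^/echo/$
```

This only behaves sensibly for prefix-style patterns like the ones listed. A pattern such as .*\.(jpg|png)$ covers the whole path and would always win under this measure, which is the ambiguity the question already points out.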

You could use alternation.

List the alternatives in order of specificity, from left to right.

1 - most specific
4 - least specific

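(^/static/images/|^/static/|^/echo/$|^/)
         1            2        3     4

A backtracking regex engine tries the alternatives in order, so if it matches 1, it will not bother with 2, 3, and 4. Note that ^/echo/$ has to come before ^/. Placed after it, ^/echo/$ could never be selected, because ^/ already matches every path that starts with a slash.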
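To find out which alternative actually matched, one option is to wrap each alternative in its own capturing group and look at the group that took part in the match. The sketch below is in Python and makes several assumptions: the names `ORDERED`, `COMBINED`, and `first_alternative` are hypothetical, the individual patterns contain no capturing groups of their own, and the list is ordered most specific first, with ^/echo/$ ahead of the catch-all ^/ so that it stays reachable.

```python
import re

# Most specific first; the catch-all ^/ goes last.
ORDERED = [r"^/static/images/", r"^/static/", r"^/echo/$", r"^/"]

# One capturing group per alternative: (p1)|(p2)|(p3)|(p4)
COMBINED = re.compile("|".join("(" + p + ")" for p in ORDERED))


def first_alternative(path):
    """Return the first pattern in ORDERED that matches path, or None."""
    m = COMBINED.match(path)
    if m is None:
        return None
    # Exactly one top-level group participates in the match, so
    # lastindex is the 1-based position of the winning alternative.
    return ORDERED[m.lastindex - 1]


print(first_alternative("/static/images/foo.jpg"))  # ^/static/images/
print(first_alternative("/echo/"))                  # ^/echo/$
```

The result depends entirely on the order of the list: the engine reports the first alternative that matches, not the longest one, so the ordering is where the notion of "most specific" has to be encoded.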

