
Weird regex bug in IIS URL Rewrite

This is my pattern:

^(\w{2}-\w{2})/questions(?:/(\w+))?(?:/(\d+))?(?:/.*?)?$

These are the URLs I'm testing:

en-us/questions/ask
en-us/questions/newest/15
en-us/questions/12/just-a-text-to-be-ignored

It works perfectly in the regex tester; here is the demo:

https://regex101.com/r/yC3tI8/1
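The same check can be reproduced locally. The sketch below is only an approximation: it uses Python's `re` module rather than the .NET-style engine behind IIS URL Rewrite, and the helper name `capture_groups` is made up for illustration.

```python
import re

# The pattern from the question, unchanged.
PATTERN = re.compile(r"^(\w{2}-\w{2})/questions(?:/(\w+))?(?:/(\d+))?(?:/.*?)?$")


def capture_groups(url):
    """Return the three capture groups for url, or None if it does not match."""
    match = PATTERN.match(url)
    return match.groups() if match else None


# The three test URLs listed in the question.
for url in (
    "en-us/questions/ask",
    "en-us/questions/newest/15",
    "en-us/questions/12/just-a-text-to-be-ignored",
):
    print(url, "->", capture_groups(url))
```

In Python, the third URL puts 12 into group 2 rather than group 3, because \w also matches digits.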

But the following rewrite rule does not behave the same way:

<rule name="en-us questions" enabled="true" stopProcessing="true">
  <match url="^(\w{2}-\w{2})/questions(?:/(\w+))?(?:/(\d+))?(?:/.*?)?$" />
  <action type="Rewrite" url="/questions.aspx?lang={R:1}&amp;tab={R:2}&amp;pid={R:3}" />
</rule>  

When I request en-us/questions/newest, it is rewritten to: /questions.aspx?lang=en-us&tab=&pid=
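For comparison, here is a rough sketch of the rewrite I would expect for that URL. It is written in Python, not IIS, and it assumes that an unmatched back-reference expands to an empty string (which is what the empty pid= above suggests). The function name `rewrite` is hypothetical.

```python
import re

PATTERN = re.compile(r"^(\w{2}-\w{2})/questions(?:/(\w+))?(?:/(\d+))?(?:/.*?)?$")
# The action URL from the rule, with &amp; already decoded to &.
TEMPLATE = "/questions.aspx?lang={R:1}&tab={R:2}&pid={R:3}"


def rewrite(url):
    """Expand {R:n} back-references in TEMPLATE, or return None if url does not match."""
    match = PATTERN.match(url)
    if match is None:
        return None
    # Assumption: a group that did not take part in the match becomes "".
    return re.sub(
        r"\{R:(\d+)\}",
        lambda ref: match.group(int(ref.group(1))) or "",
        TEMPLATE,
    )


print(rewrite("en-us/questions/newest"))
```

With Python's engine this yields tab=newest, which is the result the rule was meant to produce.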

What is wrong with this? I have been reviewing the same things for about 5 hours now.

Since you have three different possible URL endings that ultimately affect the rewritten URL, you can either set up one all-inclusive rule that will hopefully match everything you want, or set up three rules that handle each case separately:

One Rule:

^(\w{2}-\w{2})/questions/(\w+)/?(\d+)?.*$

https://regex101.com/r/dN8bM9/1 - tries to handle all cases

<rule name="en-us questions" enabled="true" stopProcessing="true">
  <match url="^(\w{2}-\w{2})/questions/(\w+)/?(\d+)?.*$" />
  <action type="Rewrite" url="/questions.aspx?lang={R:1}&amp;tab={R:2}&amp;pid={R:3}" />
</rule> 

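* note: one possible reason the original pattern failed to fill the second group is its use of (?:...), which matches without capturing. The inner (\w+) is still a capturing group, so this may not be the whole story, but dropping the non-capturing wrappers simplifies the pattern and might resolve most of the issues.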
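A quick way to sanity-check the single-rule pattern is to run it against the question's URLs. This is a sketch in Python's `re`, which may not behave identically to the IIS engine; `one_rule_groups` is an illustrative name.

```python
import re

# The all-inclusive pattern from the "One Rule" variant.
ONE_RULE = re.compile(r"^(\w{2}-\w{2})/questions/(\w+)/?(\d+)?.*$")


def one_rule_groups(url):
    """Return the capture groups for url, or None if the rule does not match."""
    match = ONE_RULE.match(url)
    return match.groups() if match else None


for url in (
    "en-us/questions/ask",
    "en-us/questions/newest/15",
    "en-us/questions/12/just-a-text-to-be-ignored",
    "en-us/questions",
):
    print(url, "->", one_rule_groups(url))
```

Note that this pattern requires a segment after /questions/, so a bare en-us/questions no longer matches, unlike the original pattern.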

Three Rules:

^(\w{2}-\w{2})/questions/(\w+)$

https://regex101.com/r/lI8bQ1/1 - en-us/questions/[single word]

^(\w{2}-\w{2})/questions/(\d+)/.*$

https://regex101.com/r/hV5fK3/1 - en-us/questions/[digits]/discard

^(\w{2}-\w{2})/questions/(\w+)/(\d+)$

https://regex101.com/r/kO0dJ0/1 - en-us/questions/[single word]/[digits]

Putting it all together into a ruleset:

<rule name="en-us questions case one" enabled="true" stopProcessing="true">
  <match url="^(\w{2}-\w{2})/questions/(\w+)$" />
  <action type="Rewrite" url="/questions.aspx?lang={R:1}&amp;tab={R:2}" />
</rule>  
<rule name="en-us questions case two" enabled="true" stopProcessing="true">
  <match url="^(\w{2}-\w{2})/questions/(\d+)/.*$" />
  <action type="Rewrite" url="/questions.aspx?lang={R:1}&amp;tab={R:2}" />
</rule>  
<rule name="en-us questions case three" enabled="true" stopProcessing="true">
  <match url="^(\w{2}-\w{2})/questions/(\w+)/(\d+)$" />
  <action type="Rewrite" url="/questions.aspx?lang={R:1}&amp;tab={R:2}&amp;pid={R:3}" />
</rule>

* note: you might need to adjust this in some way, but it should give you an idea of how to accommodate the three different variations (as you seem to have) when rewriting your URLs.
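To see which of the three rules claims a given URL, the ruleset can be approximated as a first-match-wins list. This is a Python sketch under the assumption that rules are evaluated top to bottom and that stopProcessing="true" ends evaluation at the first match; `first_matching_rule` is a hypothetical helper.

```python
import re

# The three patterns, in the same order as the ruleset.
RULES = [
    ("case one", re.compile(r"^(\w{2}-\w{2})/questions/(\w+)$")),
    ("case two", re.compile(r"^(\w{2}-\w{2})/questions/(\d+)/.*$")),
    ("case three", re.compile(r"^(\w{2}-\w{2})/questions/(\w+)/(\d+)$")),
]


def first_matching_rule(url):
    """Return (rule name, capture groups) for the first rule that matches, else None."""
    for name, pattern in RULES:
        match = pattern.match(url)
        if match:
            return name, match.groups()
    return None


for url in (
    "en-us/questions/ask",
    "en-us/questions/12/just-a-text-to-be-ignored",
    "en-us/questions/newest/15",
):
    print(url, "->", first_matching_rule(url))
```

Order matters here: a URL such as en-us/questions/12/15 is claimed by case two before case three is ever tried.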

Note that you have three groups quantified with ?, which end up matching lazily:

  1. (?:/(\w+))?
  2. (?:/(\d+))?
  3. (?:/.*?)?

ASP.NET's regex implementation interprets ? as follows:

In addition to specifying that a given pattern may occur exactly 0 or 1 time, the ? character also forces a pattern or subpattern to match the minimal number of characters when it might match several in an input string.

So ASP.NET assigns no characters to group 1, no characters to group 2, and collects the rest of the characters in group 3.

To get greedy matching instead of the lazy matching that ? forces, use {0,1}.

So your regex should look like:

^(\w{2}-\w{2})/questions(?:/(\w+)){0,1}(?:/(\d+)){0,1}(?:/.*?)?$
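As a sanity check, the two spellings can be compared side by side. The sketch below uses Python's `re`, where ? and {0,1} are equivalent, so it cannot reproduce the IIS-specific behavior described above; it only confirms that the {0,1} form still captures the expected groups. The name `compare` is illustrative.

```python
import re

# Original pattern with ? and the suggested pattern with {0,1}.
WITH_QMARK = re.compile(r"^(\w{2}-\w{2})/questions(?:/(\w+))?(?:/(\d+))?(?:/.*?)?$")
WITH_BRACES = re.compile(r"^(\w{2}-\w{2})/questions(?:/(\w+)){0,1}(?:/(\d+)){0,1}(?:/.*?)?$")


def compare(url):
    """Return the capture groups produced by each pattern as a pair."""
    a = WITH_QMARK.match(url)
    b = WITH_BRACES.match(url)
    return (a.groups() if a else None, b.groups() if b else None)


for url in (
    "en-us/questions/newest",
    "en-us/questions/newest/15",
    "en-us/questions/12/just-a-text-to-be-ignored",
):
    print(url, "->", compare(url))
```

If the two patterns disagree only inside IIS, that points at the rewrite module's engine rather than at the pattern itself.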

