
What causes Tuckey's UrlRewriteFilter to malform urlencoded unicode characters (e.g. %C3%B6 for ö) and how can I avoid it?

We are using a simple UrlRewriteFilter rule to permanently (301) redirect HTTP requests without trailing slash to the same URL with trailing slash.

In some cases our presentation layer needs URLs that contain encoded special characters (e.g. %C3%B6 for ö). This works fine as long as the UrlRewriteFilter is not involved. But when the rule kicks in, I can see the encoded character getting malformed during the redirect, e.g.

www.mydomain.com/asdf%C3%B6asdf/ --> 301 --> www.mydomain.com/asdf%F6asdf/

%F6 is not a valid UTF-8 sequence (it is the single ISO-8859-1 byte for ö), so it ends up as a question mark in a black diamond when URL-decoded as UTF-8.
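To illustrate how %C3%B6 can turn into %F6, here is a minimal sketch using only the JDK (Java 10 or later for the `Charset` overloads). It assumes the malformation comes from decoding the URL as UTF-8 and then re-encoding it with ISO-8859-1; whether UrlRewriteFilter does exactly this internally is not confirmed here. The class and method names are made up for the example.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingRoundTrip {

    // Hypothetical helper: decode a percent-encoded string with one charset,
    // then percent-encode the result again with another charset.
    static String reencode(String encoded, Charset decodeWith, Charset encodeWith) {
        String decoded = URLDecoder.decode(encoded, decodeWith);
        return URLEncoder.encode(decoded, encodeWith);
    }

    public static void main(String[] args) {
        String original = "asdf%C3%B6asdf";

        // Mismatched charsets: ö is decoded from two UTF-8 bytes,
        // then written back as the single Latin-1 byte 0xF6.
        System.out.println(reencode(original,
                StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1)); // asdf%F6asdf

        // Matching charsets: the round trip preserves the original encoding.
        System.out.println(reencode(original,
                StandardCharsets.UTF_8, StandardCharsets.UTF_8)); // asdf%C3%B6asdf
    }
}
```

If this is indeed the mechanism, the fix has to make the decode and encode steps agree, or skip decoding altogether.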

We use UTF-8 throughout our application; it is set in the response headers as well as in the HTML's <head> section. The malformed encoding occurs on both Windows and Linux machines. The rewrite rule looks as follows:

<rule enabled="true" match-type="regex" >
    <name>Force trailing slash</name>
    <note>...</note>
    <condition type="request-uri" operator="notequal">...</condition> <!-- some URLs shall not be redirected -->
    <from>(^[^\?]*)(\?.*)?$</from>
    <to type="permanent-redirect" last="true" >$1/$2</to> <!-- adding trailing slash and query string, if present -->
</rule>

I'd be happy for any ideas on how this could be solved. I've played with the decode-using and encode attributes, but that did not help.

I had a similar problem. What I did was set decode-using to null:

<urlrewrite decode-using="null">

The issue I described seems to be related to this bug report, which was filed in 2010 and has been untouched since then. I'll probably have to work around this by handling the request "manually" in Java. Other ideas are still welcome, though.
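A manual workaround could look roughly like the sketch below. It builds the redirect target from the raw, still-encoded request URI and query string, so %C3%B6 is never decoded or re-encoded. The class and method names are hypothetical, and the exclusion list from the original rule is left out.

```java
public class TrailingSlashRedirect {

    // Hypothetical helper: returns the redirect target with a trailing slash
    // appended to the path, or null if the path already ends with a slash.
    // rawUri is expected to be the undecoded path, rawQuery the undecoded
    // query string without the leading '?' (or null if there is none).
    static String redirectTarget(String rawUri, String rawQuery) {
        if (rawUri.endsWith("/")) {
            return null; // nothing to do
        }
        StringBuilder target = new StringBuilder(rawUri).append('/');
        if (rawQuery != null) {
            target.append('?').append(rawQuery);
        }
        return target.toString();
    }

    public static void main(String[] args) {
        System.out.println(redirectTarget("/asdf%C3%B6asdf", null));  // /asdf%C3%B6asdf/
        System.out.println(redirectTarget("/asdf%C3%B6asdf", "a=1")); // /asdf%C3%B6asdf/?a=1
        System.out.println(redirectTarget("/asdf%C3%B6asdf/", null)); // null
    }
}
```

In a servlet filter, the inputs would come from `HttpServletRequest.getRequestURI()` (which the container does not decode) and `getQueryString()`. For a 301, one would set the status to `HttpServletResponse.SC_MOVED_PERMANENTLY` and the `Location` header explicitly, since `sendRedirect` issues a 302.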
