
Should URLs be stored in encoded or decoded form?

My question is a bit weird, but let me explain:

  1. Assuming a valid URI cannot contain Unicode characters (per RFC 2396), all Unicode in a URI should be escaped using percent-encoding.

  2. A valid URL should be a valid URI, so we should use http://example.com/%E4%BD%A0%E5%A5%BD instead of http://example.com/你好 when making requests or putting URLs in an href (even though most browsers can handle the latter).

  3. Moreover, we accept user-submitted URLs, which are already encoded (since browsers encode them when you copy a URL from the address bar).

  4. So we decided (likely a mistake) to store them as http://example.com/%E4%BD%A0%E5%A5%BD rather than http://example.com/你好; after all, that is the original input and the correct URL.
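The correspondence between the two forms in points 2 and 4 can be checked with JavaScript's built-in encodeURI and decodeURI. A minimal sketch in TypeScript, using the example URL from the question:

```typescript
// The two spellings of the example URL from the question.
const decoded: string = "http://example.com/你好";
const encoded: string = "http://example.com/%E4%BD%A0%E5%A5%BD";

// encodeURI converts each non-ASCII character to its UTF-8 bytes and
// percent-encodes them, leaving URI delimiters such as ":" and "/" alone.
const toEncoded: string = encodeURI(decoded);

// decodeURI reverses that mapping for this URL.
const toDecoded: string = decodeURI(encoded);
```

Both strings name the same resource; the open question is only which spelling to persist.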

The problem arises when I display such URLs: since they are user-submitted, I need to run an XSS filter on them. Some implementations, such as xss-filters, appear to run encodeURI as part of the filter, which means % gets double-encoded (e.g. %E4 -> %25E4), breaking the URL in the process.
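The double-encoding can be reproduced without any filter library. This sketch calls encodeURI directly, on the assumption (taken from the question) that the filter does the equivalent internally; it does not exercise xss-filters itself:

```typescript
// Already percent-encoded URL, stored as described in point 4.
const stored: string = "http://example.com/%E4%BD%A0%E5%A5%BD";

// "%" is not among the characters encodeURI leaves unescaped, so every
// existing escape such as "%E4" becomes "%25E4": the URL is encoded twice.
const once: string = encodeURI(stored);

// A single decodeURI only removes the extra layer and returns the stored
// (still encoded) string, not the readable one.
const undone: string = decodeURI(once);

// encodeURI is therefore not idempotent on already-encoded input.
const idempotent: boolean = encodeURI(once) === once;
```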

So should we have stored URLs in decoded form (even though they are not valid URIs)? Running decodeURI on output doesn't make much sense to me...

First, RFC 2396 is obsoleted by RFC 3986. Second, yes, you should have stored your URIs in decoded form if your storage mechanism allows it.

Update: from Section 2.4 of RFC 3986:

Under normal circumstances, the only time when octets within a URI are percent-encoded is during the process of producing the URI from its component parts.
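Applied to the question, that rule suggests storing the decoded components and percent-encoding them exactly once, when the URI is assembled. A minimal sketch; buildUrl and its parameters are hypothetical, and the origin is assumed to be ASCII and already valid:

```typescript
// Hypothetical helper: assemble a URL from an ASCII origin and decoded
// path segments, percent-encoding each segment exactly once.
function buildUrl(origin: string, segments: string[]): string {
  // encodeURIComponent also escapes "/", "?" and "#", so a segment that
  // contains them cannot change the structure of the resulting URL.
  const path = segments.map((s) => encodeURIComponent(s)).join("/");
  return `${origin}/${path}`;
}

const href = buildUrl("http://example.com", ["你好"]);
const withSlash = buildUrl("http://example.com", ["a/b"]);
```

Encoding per segment with encodeURIComponent, rather than running encodeURI over the finished string, is what keeps a "/" inside a segment from being read as a path separator.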

Update 2: Furthermore, a string of Unicode characters representing a URI is, in fact, an IRI (Internationalized Resource Identifier). See RFC 3987.
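RFC 3987 maps an IRI to a URI by percent-encoding only the UTF-8 bytes of non-ASCII characters (Section 3.1), and maps back by decoding only escapes that form valid non-ASCII UTF-8 (Section 3.2). The sketch below is simplified and its function names are hypothetical: it ignores the RFC's exceptions (for example, bidi formatting characters that should stay encoded), and it percent-encodes non-ASCII host names, whereas browsers convert internationalized domain names to Punycode. Because neither direction touches ASCII characters or ASCII escapes such as %2F, applying iriToUri to an already-encoded URI changes nothing, which is the property the double-encoding filter lacks.

```typescript
// Hypothetical helpers: simplified versions of the RFC 3987 mappings.

// IRI -> URI (Section 3.1): percent-encode the UTF-8 bytes of each run of
// non-ASCII characters; ASCII, including "%", is left as-is.
// Note: a lone surrogate makes encodeURIComponent throw a URIError.
function iriToUri(iri: string): string {
  return iri.replace(/[^\x00-\x7F]+/g, (run) => encodeURIComponent(run));
}

// URI -> IRI (Section 3.2): decode runs of escapes for bytes 0x80-0xFF
// when they form valid UTF-8; ASCII escapes such as "%2F" are kept.
function uriToIri(uri: string): string {
  return uri.replace(/(?:%[89A-Fa-f][0-9A-Fa-f])+/g, (run) => {
    try {
      return decodeURIComponent(run);
    } catch (_err) {
      // Not valid UTF-8: leave the whole run of escapes untouched.
      return run;
    }
  });
}
```

Under this scheme the stored value would be uriToIri(userInput), and iriToUri(stored) would produce the href.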

Note that the WHATWG URL Standard (https://url.spec.whatwg.org/#urls) is what defines URLs. It supersedes the RFCs you mentioned.

That is, your premise is incorrect, specifically this part:

A valid URL should be a valid URI, so we should use http://example.com/%E4%BD%A0%E5%A5%BD instead of http://example.com/你好 when making request or putting them in href (even though most browsers can handle the latter case).

What makes you say that? http://example.com/你好 is a perfectly valid URL.
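One way to see this is the URL class, which implements the WHATWG URL parser in browsers and in Node.js. A minimal sketch, assuming a runtime where URL is available as a global:

```typescript
// Parse both spellings of the question's URL with the WHATWG URL parser.
const fromUnicode = new URL("http://example.com/你好");
const fromEncoded = new URL("http://example.com/%E4%BD%A0%E5%A5%BD");

// The parser percent-encodes non-ASCII code points in the path and leaves
// existing "%XX" escapes alone, so both spellings serialize identically.
const sameHref: boolean = fromUnicode.href === fromEncoded.href;

// pathname is percent-encoded too; decodeURIComponent recovers "/你好".
const readablePath: string = decodeURIComponent(fromUnicode.pathname);
```

So the spelling stored in the database does not decide validity; the parser normalizes either one to the same URL.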
