JavaScript unescape() vs. Python urllib.unquote()
From reading various posts, it seems that JavaScript's unescape() is equivalent to Python's urllib.unquote(). However, when I test both I get different results:
unescape('%u003c%u0062%u0072%u003e');
output: <br>
import urllib
urllib.unquote('%u003c%u0062%u0072%u003e')
output: %u003c%u0062%u0072%u003e
I would expect Python to also return <br>. Any ideas as to what I'm missing here?
Thanks!
%uxxxx is a non-standard URL encoding scheme that is not supported by urllib.parse.unquote() (Py 3) / urllib.unquote() (Py 2).
It was only ever part of ECMAScript (ECMA-262, 3rd edition); the format was rejected by the W3C and was never part of an RFC.
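To make the difference concrete, here is a small Python 3 sketch comparing the two escape styles with urllib.parse.unquote(). The variable names are only for illustration.

```python
from urllib.parse import unquote

# Standard %XX escapes are decoded as expected.
standard = unquote('%3Cbr%3E')

# %uXXXX is not valid percent-encoding, so the input comes back unchanged.
nonstandard = unquote('%u003c%u0062%u0072%u003e')

print(standard)
print(nonstandard)
```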
You could use a regular expression to convert such codepoints:
import re

try:
    unichr  # only in Python 2
except NameError:
    unichr = chr  # Python 3

# Replace each %uxxxx (or %uxx) escape with the character for that code point
re.sub(r'%u([a-fA-F0-9]{4}|[a-fA-F0-9]{2})', lambda m: unichr(int(m.group(1), 16)), quoted)
This decodes both the %uxxxx and the %uxx forms that ECMAScript 3rd ed can decode.
Demo:
>>> import re
>>> quoted = '%u003c%u0062%u0072%u003e'
>>> re.sub(r'%u([a-fA-F0-9]{4}|[a-fA-F0-9]{2})', lambda m: chr(int(m.group(1), 16)), quoted)
'<br>'
>>> altquoted = '%u3c%u0062%u0072%u3e'
>>> re.sub(r'%u([a-fA-F0-9]{4}|[a-fA-F0-9]{2})', lambda m: chr(int(m.group(1), 16)), altquoted)
'<br>'
However, you should avoid this encoding altogether if possible.
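If the encoding cannot be avoided, the regular expression can be wrapped in a small helper. The following is a minimal Python 3 sketch, not a drop-in replacement for unescape(): the name js_unescape is hypothetical, and it assumes you want plain %XX escapes mapped to a single code point each (rather than decoded as UTF-8 bytes, which is what urllib.parse.unquote() does by default). Doing everything in one pass avoids decoding the same text twice.

```python
import re

# Hypothetical helper, for illustration only.
# Matches a %uXXXX escape first, then falls back to a plain %XX escape.
_ESCAPE_RE = re.compile(r'%u([0-9a-fA-F]{4})|%([0-9a-fA-F]{2})')


def js_unescape(quoted):
    """Decode %uXXXX and %XX escapes in a single pass, one code point each."""
    def replace(match):
        # Exactly one of the two groups is set for any match.
        hex_digits = match.group(1) or match.group(2)
        return chr(int(hex_digits, 16))
    return _ESCAPE_RE.sub(replace, quoted)


print(js_unescape('%u003c%u0062%u0072%u003e'))
print(js_unescape('%3Cbr%3E'))
```

Note that this sketch does not combine UTF-16 surrogate pairs (two consecutive %uXXXX escapes for one character outside the Basic Multilingual Plane); those would come out as two separate surrogate code points.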