
How to convert unicode escape sequence URL to python unicode?

What is the right way to do it if the URL has some unicode chars in it and is escaped on the client side using JavaScript ( escape(text) )? For example, if my URL is: domain.com/?text=%u05D0%u05D9%u05DA%20%u05DE%u05DE%u05D9%u05E8%u05D9%u05DD%20%u05D0%u05EA%20%u05D4%u05D8%u05E7%u05E1%u05D8%20%u05D4%u05D6%u05D4

I tried: text = urllib.unquote(request.GET.get('text')) but I got the exact same string back (%u05D0%u05D9%u05DA%20%u05DE ... )
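A likely explanation, sketched in Python 3 (an assumption; the question uses Python 2, where the function is urllib.unquote): percent-decoding only understands % followed by two hex digits, so the non-standard %uXXXX form that escape() produces passes through untouched. The variable names are illustrative.

```python
from urllib.parse import unquote

# A shortened version of the query value from the question.
escaped = "%u05D0%u05D9%u05DA%20%u05DE"

# unquote() only recognises %XX; "%u0..." is not valid percent-encoding,
# so those sequences are left exactly as they were.
decoded = unquote(escaped)

# Only the standard %20 was turned into a space.
print(decoded == "%u05D0%u05D9%u05DA %u05DE")
```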

Eventually, what I did was change the client side from escape(text) to encodeURIComponent(text), and then on the Python side I used:

request.encoding = 'UTF-8'
text = unicode(request.GET.get('text', None))

Not sure this is the best thing to do, but it works in English and Hebrew.
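For comparison, a minimal Python 3 sketch of why that switch helps: encodeURIComponent percent-encodes the UTF-8 bytes of each character, which is the form standard URL decoders expect. The sample strings below are made up for illustration (they encode the first and last words of the text from the question).

```python
from urllib.parse import parse_qs, unquote

# What encodeURIComponent produces for the first word (U+05D0 U+05D9 U+05DA):
# the UTF-8 bytes of each character, percent-encoded.
encoded = "%D7%90%D7%99%D7%9A"
word = unquote(encoded)  # unquote() decodes the bytes as UTF-8 by default

# parse_qs() percent-decodes query values in the same way.
params = parse_qs("text=" + encoded + "%20%D7%96%D7%94")
print(word, params["text"][0])
```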

Because your %uxxxx is not Python's escape syntax, which is \uxxxx, you need a small transform: replace each '%u' with '\u', then percent-decode the remaining standard escapes such as %20. (Replacing every '%' with a backslash would turn %20 into the octal escape \20, i.e. the control character \x10, rather than a space.) For example, in a Python 2 shell:

>>> import sys; reload(sys); sys.setdefaultencoding('utf8')
<module 'sys' (built-in)>
>>> import urllib
>>> text = '%u05D0%u05D9%u05DA%20%u05DE%u05DE%u05D9%u05E8%u05D9%u05DD%20%u05D0%u05EA%20%u05D4%u05D8%u05E7%u05E1%u05D8%20%u05D4%u05D6%u05D4'
>>> text = text.replace('%u', '\\u')
>>> text = urllib.unquote(text)
>>> text_u = text.decode('unicode-escape')
>>> print text_u
איך ממירים את הטקסט הזה

Once it has been converted to the unicode type, you can encode it to whatever encoding you like, as follows:

>>> text_utf8 = text_u.encode('utf8')
>>> text_utf8
'\xd7\x90\xd7\x99\xd7\x9a \xd7\x9e\xd7\x9e\xd7\x99\xd7\xa8\xd7\x99\xd7\x9d \xd7\x90\xd7\xaa \xd7\x94\xd7\x98\xd7\xa7\xd7\xa1\xd7\x98 \xd7\x94\xd7\x96\xd7\x94'
>>> print text_utf8
איך ממירים את הטקסט הזה
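The same idea can be sketched in Python 3 without going through the unicode-escape codec. This is only an illustration under stated assumptions: the helper name unescape_js is hypothetical, and it assumes the input really came from JavaScript's legacy escape(), which writes code units below 0x100 as %XX and all others as %uXXXX.

```python
import re

# Matches %uXXXX (four hex digits) or %XX (two hex digits), as emitted by escape().
_ESCAPE_RE = re.compile(r"%u([0-9A-Fa-f]{4})|%([0-9A-Fa-f]{2})")


def unescape_js(text):
    """Decode a string produced by JavaScript's escape() (hypothetical helper)."""

    def _replace(match):
        # Exactly one of the two groups matched; either way it is a hex code unit.
        return chr(int(match.group(1) or match.group(2), 16))

    # A single pass means an already-decoded '%' is never scanned a second time.
    decoded = _ESCAPE_RE.sub(_replace, text)
    # escape() writes characters outside the BMP as two %uXXXX surrogates;
    # a UTF-16 round trip joins each surrogate pair into one code point.
    return decoded.encode("utf-16", "surrogatepass").decode("utf-16", "surrogatepass")


escaped = "%u05D0%u05D9%u05DA%20%u05DE%u05DE%u05D9%u05E8%u05D9%u05DD"
print(unescape_js(escaped))
```

Handling both forms in one regular expression avoids decoding twice: an input such as a%25u05D0 yields the literal text a%u05D0 rather than a Hebrew letter.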

