How to convert a unicode string in Unicode escape format with Python?
I'm a student learning Python and Scrapy (a web crawler).
I want to convert a unicode string to str in Python, but this unicode string is not an ordinary string: its content is in Unicode escape format. Please see the code below.
# python 2.7
...
print(type(name[0]))
print(name[0])
print(type(keyword_name_temp))
print(keyword_name_temp)
...
When I run the script above, the console shows the following:
$ <type 'unicode'>
$ 서용교   ## these are Korean characters
$ <type 'unicode'>
$ u'\\uc9c0\\ubc29\\uc790\\uce58\\ub2e8\\uccb4'
I want to see "keyword_name_temp" displayed as Korean text, but I don't know how to do it.
I got the name list and keyword_name_temp from the HTML returned by an HTTP request.
The name list was originally in String format.
keyword_name_temp was originally in unicode (escape) format.
Can anybody help me?
The simplest solution is to switch to Python 3, where strings are Unicode by default.
u'\\uc9c0\\ubc29\\uc790\\uce58\\ub2e8\\uccb4'
contains real backslashes followed by u and hex digits. Because the backslash is an escape character in Python string literals, the interpreter displays a backslash inside a string as \\. These are not the literal Unicode characters U+C9C0 etc., which are commonly written using the \u escape sequence.
(Could that string have come from some JSON object?)
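To make the distinction concrete, here is a small Python 3 sketch. It uses Python 3 rather than the question's 2.7, so `str` plays the role of `unicode`, and the variable names are illustrative only. It compares a string that contains the escape text with one that contains the actual characters:

```python
# Python 3 sketch: escape *text* versus the characters it describes.
escaped = '\\uc9c0\\ubc29\\uc790\\uce58\\ub2e8\\uccb4'  # backslash, 'u', 4 hex digits, six times
real = '\uc9c0\ubc29\uc790\uce58\ub2e8\uccb4'  # escapes interpreted by Python: six Hangul syllables

print(len(escaped))         # 36: each backslash-u-XXXX group is 6 characters
print(len(real))            # 6
print(escaped[0] == '\\')   # True: the first character is a real backslash
print(hex(ord(real[0])))    # 0xc9c0, the code point U+C9C0
```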
You can construct a JSON string out of it and use json.loads() to transform it into a unicode string:
Example in Python 2.7:
>>> s1 = u'서용교'
>>> type(s1)
<type 'unicode'>
>>> s1
u'\uc11c\uc6a9\uad50'
>>> print(s1)
서용교
>>>
>>>
>>> s2 = u'\\uc9c0\\ubc29\\uc790\\uce58\\ub2e8\\uccb4'
>>> type(s2)
<type 'unicode'>
>>>
>>> # put that unicode string between double-quotes
>>> # so that json module can interpret it
>>> ts2 = u'"%s"' % s2
>>> ts2
u'"\\uc9c0\\ubc29\\uc790\\uce58\\ub2e8\\uccb4"'
>>>
>>> import json
>>> json.loads(ts2)
u'\uc9c0\ubc29\uc790\uce58\ub2e8\uccb4'
>>> print(json.loads(ts2))
지방자치단체
>>>
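On Python 3 the same trick works with str instead of unicode. Below is a minimal sketch that wraps it in a hypothetical helper named `unescape_via_json`. It assumes the text contains no unescaped double quotes or control characters, because either would make the quoted value invalid JSON:

```python
import json

# Hypothetical helper: wrap the escape text in double quotes so it becomes a
# JSON string literal, then let the json module interpret the escapes.
# Assumes `escaped` has no unescaped double quote or control character.
def unescape_via_json(escaped):
    return json.loads('"' + escaped + '"')

s2 = '\\uc9c0\\ubc29\\uc790\\uce58\\ub2e8\\uccb4'
result = unescape_via_json(s2)
print(result)  # 지방자치단체
```

Characters that are already real (for example Korean text mixed with escapes) pass through unchanged, because JSON string literals may contain them directly.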
Another option is to turn it into a string literal and evaluate it with ast.literal_eval():
>>> import ast
>>>
>>> # construct a string literal, with the 'u' prefix
>>> s2_literal = u'u"%s"' % s2
>>> s2_literal
u'u"\\uc9c0\\ubc29\\uc790\\uce58\\ub2e8\\uccb4"'
>>> print(ast.literal_eval(s2_literal))
지방자치단체
>>>
>>> # also works with single-quotes string literals
>>> s2_literal2 = u"u'%s'" % s2
>>> s2_literal2
u"u'\\uc9c0\\ubc29\\uc790\\uce58\\ub2e8\\uccb4'"
>>>
>>> print(ast.literal_eval(s2_literal2))
지방자치단체
>>>
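The ast.literal_eval() approach also carries over to Python 3. There, a plain str literal already interprets \uXXXX escapes, so the u prefix is optional. The helper name below is hypothetical. As with the JSON version, the sketch assumes the text contains no double quote, newline, or trailing backslash that would break the generated literal:

```python
import ast

# Hypothetical helper: turn the text into the source of a double-quoted Python
# literal and let ast.literal_eval() evaluate it (it accepts only literals,
# not arbitrary expressions).
# Assumes `escaped` has no double quote, newline or trailing backslash.
def unescape_via_literal_eval(escaped):
    return ast.literal_eval('"' + escaped + '"')

s2 = '\\uc9c0\\ubc29\\uc790\\uce58\\ub2e8\\uccb4'
result = unescape_via_literal_eval(s2)
print(result)  # 지방자치단체
```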
Your string is unicode, and if you know its encoding, e.g. utf-8, you can try:
print name[0].decode("utf-8")
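Note that .decode() turns encoded bytes into text. In Python 3 it exists on bytes (and bytearray) but not on str. Here is a minimal Python 3 sketch of that round trip. It assumes UTF-8, although the real encoding depends on the page being crawled:

```python
# Python 3 sketch: decode() goes from bytes to str, encode() from str to bytes.
name = '\uc11c\uc6a9\uad50'         # the text 서용교 as a str
raw = name.encode('utf-8')          # what an HTTP response body might hold, assuming UTF-8
print(raw)                          # b'\xec\x84\x9c\xec\x9a\xa9\xea\xb5\x90'
print(len(raw))                     # 9: each Hangul syllable takes 3 bytes in UTF-8
print(raw.decode('utf-8') == name)  # True
print(hasattr(name, 'decode'))      # False: str has no decode() in Python 3
```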