
Convert escaped Unicode sequence to human-readable format

I've been using this Python code:

pattern = u'丨フ丨ノ一丨ノ丶フノ一ノ丨フ一一ノフフ丶'
result = [u'<span id="z_i_t2_bis" title="\u7ad6\u6298\u7ad6\u6487\u6a2a\u7ad6\u6487\u637a\u6298\u6487\u6a2a\u6487\u7ad6\u6298\u6a2a\u6a2a\u6487\u6298\u6298\u637a">\u4e28\u30d5\u4e28\u30ce\u4e00\u4e28\u30ce\u4e36\u30d5\u30ce\u4e00\u30ce\u4e28\u30d5\u4e00\u4e00\u30ce\u30d5\u30d5\u4e36</span>']

if pattern in result[0]:
    print('found')

But this is cumbersome, and it doesn't really do what I want, which is to turn the escaped gobbledygook back into something readable, like that pattern. Is there a simple Unix tool or command to do this quickly and efficiently?
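If the escape sequences are really present in the text as backslash sequences (for example, the `title` attribute was read from a saved page, so each `\u4e28` is six ASCII characters rather than one character), a minimal Python 3 sketch can decode them with the standard `unicode_escape` codec. The helper name `unescape` is hypothetical, and the sketch assumes ASCII-only input. Note that `unicode_escape` also interprets other backslash escapes such as `\n`.

```python
# Minimal sketch (Python 3): turn literal "\uXXXX" escape sequences back into
# readable characters. Assumption: the input is ASCII text in which each escape
# is spelled out character by character, e.g. text read from a saved HTML file.

def unescape(text: str) -> str:
    """Hypothetical helper: decode backslash escapes such as \\u4e28.

    Encoding to ASCII first makes input that already contains real non-ASCII
    characters fail with UnicodeEncodeError, instead of being silently
    mis-decoded by the 'unicode_escape' codec (which treats other bytes as
    Latin-1).
    """
    return text.encode("ascii").decode("unicode_escape")


# A raw string keeps the backslashes literal, mimicking text read from a file.
escaped = r"\u4e28\u30d5\u4e28\u30ce\u4e00"
readable = unescape(escaped)

print(len(escaped), len(readable))    # prints: 30 5
print(readable == "丨フ丨ノ一")        # prints: True
```

The decoded string can then be compared directly with `pattern`, or printed to a terminal that supports these characters.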

It seemed like decoding the string would work, but when I tried it, it did not:

result = "\u4e28\u30d5\u4e28\u30ce\u4e00\u4e28\u30ce\u4e36\u30d5\u30ce\u4e00\u30ce\u4e28\u30d5\u4e00\u4e00\u30ce\u30d5\u30d5\u4e36"

result.decode('utf-8')

which raised the error: AttributeError: 'str' object has no attribute 'decode'

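If you simply print(result), you get the "gobbledygook", because printing a list shows the repr() of each element, which is meant to be unambiguous; in Python 2, repr() escapes every non-ASCII character as \uXXXX. (Python 3's repr() keeps printable characters such as these, so there the escapes appear only if they are literally in the text.) If you print the string directly, print(result[0]), Python prints the characters as they were intended.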
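To see the difference between the displayed form and the stored characters, here is a small Python 3 sketch (the sample string is a shortened version of the one in the question). The `\u` escapes in an ordinary string literal are decoded when the code is parsed, so the element already holds four characters; Python 3's `repr()`, which a list uses for its elements, keeps printable characters, while the built-in `ascii()` reproduces the escaped form.

```python
# Python 3: "\uXXXX" escapes in an ordinary string literal are decoded at parse
# time, so the stored string contains the characters themselves.
result = ["\u4e28\u30d5\u4e28\u30ce"]
element = result[0]

print(len(element))                   # prints: 4 (characters, not 24 escape chars)
print(str(result) == "['丨フ丨ノ']")   # prints: True (list display uses repr())
print(ascii(element))                 # prints: '\u4e28\u30d5\u4e28\u30ce'
```

Printing `element` itself would show 丨フ丨ノ, assuming the terminal can display those characters.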

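If you want to convert the characters to UTF-8 yourself, use encode rather than decode: encode converts a Unicode string to bytes, and decode produces a Unicode string from bytes. In Python 3, str has only encode, which is why result.decode('utf-8') raised the AttributeError.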
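A short Python 3 round trip illustrates the direction of each method; the sample string is just the first two characters from the question.

```python
text = "\u4e28\u30d5"                  # str: two characters, 丨フ
data = text.encode("utf-8")           # str -> bytes
print(data)                           # prints: b'\xe4\xb8\xa8\xe3\x83\x95'
print(data.decode("utf-8") == text)   # prints: True (bytes -> str)

# Python 3 str objects have no decode method.
print(hasattr(text, "decode"))        # prints: False
```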

Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM