Convert escaped Unicode sequence to human-readable format

Question

I've been using this python code:

pattern = u'丨フ丨ノ一丨ノ丶フノ一ノ丨フ一一ノフフ丶'
result = [u'<span id="z_i_t2_bis" title="\u7ad6\u6298\u7ad6\u6487\u6a2a\u7ad6\u6487\u637a\u6298\u6487\u6a2a\u6487\u7ad6\u6298\u6a2a\u6a2a\u6487\u6298\u6298\u637a">\u4e28\u30d5\u4e28\u30ce\u4e00\u4e28\u30ce\u4e36\u30d5\u30ce\u4e00\u30ce\u4e28\u30d5\u4e00\u4e00\u30ce\u30d5\u30d5\u4e36</span>']

if pattern in result[0]:
    print('found')

But this is cumbersome and, moreover, doesn't really do what I want, which is to get the escaped gobbledygook back into something comprehensible, as in that pattern. Is there some simple Unix tool or command to perform this task quickly and efficiently?

It seems that decoding the string should work, but I tried it and it did not, i.e.:

result = "\u4e28\u30d5\u4e28\u30ce\u4e00\u4e28\u30ce\u4e36\u30d5\u30ce\u4e00\u30ce\u4e28\u30d5\u4e00\u4e00\u30ce\u30d5\u30d5\u4e36"

result.decode('utf-8')

which generated the error: AttributeError: 'str' object has no attribute 'decode'

Answer 1

If you simply print(result), you'll get the "gobbledygook", because printing a list or tuple shows each element in its unambiguous repr() form, which may escape non-ASCII characters. But if you print the string directly with print(result[0]), Python will try to print the natural characters as they were intended.
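As a rough illustration of the difference, here is a minimal Python 3 sketch. The shortened sample string is my own, and ascii() is used only to mimic an escaped repr; whether print(result) actually shows escapes depends on the Python version and terminal.

```python
# A shortened, hypothetical sample (first three characters of the pattern).
result = ['\u4e28\u30d5\u4e28']

# str() of the element is what print(result[0]) writes: the real characters.
element_text = str(result[0])

# ascii() gives a repr in which every non-ASCII character is escaped,
# which is the kind of "gobbledygook" seen when a whole list is displayed.
escaped = ascii(result[0])

print(element_text)
print(escaped)
```

The escaped form and the readable form are the same string; only the way it is displayed differs.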

If you want to convert the characters to UTF-8 yourself, use encode rather than decode. encode converts a Unicode string to bytes; decode produces a Unicode string from bytes.
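The encode/decode direction can be sketched as follows in Python 3. The last part covers a case the answer does not mention and is an assumption on my part: text that contains literal backslash-u sequences (for example, read from a file), which the standard unicode_escape codec can turn into characters when the input is pure ASCII.

```python
# A Unicode string (str in Python 3); the escapes are resolved by the parser.
text = '\u4e28\u30d5'

# encode: str -> bytes
utf8_bytes = text.encode('utf-8')

# decode: bytes -> str
round_trip = utf8_bytes.decode('utf-8')

# Assumed scenario: the backslashes are literally present in the data.
# A raw string keeps them as 12 separate ASCII characters.
raw = r'\u4e28\u30d5'

# Go through ASCII bytes, then let unicode_escape interpret the escapes.
readable = raw.encode('ascii').decode('unicode_escape')

print(round_trip)
print(readable)
```

This also explains the AttributeError in the question: in Python 3 a str has encode but no decode, since it is already a Unicode string.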

1 answer

solution1
1 ACCEPTED 2015-02-10 05:22:35
