
How to convert string into unicode in Python3?

I have tried many ways to convert a string like b'\xef\xbb\xbf\xe5\x9b\xbd\xe9\x99\x85\xe5\x8f\x8b\xe8\xb0\x8a' into Chinese characters, but all of them failed.

Strangely, when I just run

print(b'\xef\xbb\xbf\xe5\x9b\xbd\xe9\x99\x85\xe5\x8f\x8b\xe8\xb0\x8a')

it shows the decoded Chinese characters.

But when I get the string by reading it from my CSV file, this does not work. No matter how I try to decode the string, it only shows me b'\xef\xbb\xbf\xe5\x9b\xbd\xe9\x99\x85\xe5\x8f\x8b\xe8\xb0\x8a'

Here is my script:

import csv

with open('need_convert.csv', 'r+') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        new_row = ''.join(row)
        print('new_row:')
        print(type(new_row))
        print(new_row)

        print('convert:')
        print(new_row.decode('utf-8'))

Here is my data (CSV file):

b'\xef\xbb\xbf\xe5\x9b\xbd\xe9\x99\x85\xe5\x8f\x8b\xe8\xb0\x8a'
b'\xef\xbb\xbf\xe9\xba\x92\xe9\xba\x9f\xe6\x9d\xaf'
b'\xef\xbb\xbf\xe5\x9b\xbd\xe9\x99\x85\xe5\x8f\x8b\xe8\xb0\x8a'

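Both the row contents and new_row are str objects, not bytes: each cell holds the text of a bytes literal, and str has no decode() method in Python 3. The script below uses exec('s=' + row[0]) to evaluate that text into a real bytes object, which is only acceptable if the input is trusted.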

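import csv

# Read-only access is enough here; newline='' is recommended for csv files.
with open('need_convert.csv', 'r', newline='') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        # row[0] is a str containing the text of a bytes literal.
        print(type(row[0]), row[0])
        # Evaluate that text as Python source to get a real bytes object.
        # exec runs arbitrary code, so only use it on trusted input.
        exec('s=' + row[0])
        print(type(s), s)
        print(s.decode('utf-8'))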

Output:

<class 'str'> b'\xef\xbb\xbf\xe5\x9b\xbd\xe9\x99\x85\xe5\x8f\x8b\xe8\xb0\x8a'
<class 'bytes'> b'\xef\xbb\xbf\xe5\x9b\xbd\xe9\x99\x85\xe5\x8f\x8b\xe8\xb0\x8a'
国际友谊
<class 'str'> b'\xef\xbb\xbf\xe9\xba\x92\xe9\xba\x9f\xe6\x9d\xaf'
<class 'bytes'> b'\xef\xbb\xbf\xe9\xba\x92\xe9\xba\x9f\xe6\x9d\xaf'
麒麟杯
<class 'str'> b'\xef\xbb\xbf\xe5\x9b\xbd\xe9\x99\x85\xe5\x8f\x8b\xe8\xb0\x8a'
<class 'bytes'> b'\xef\xbb\xbf\xe5\x9b\xbd\xe9\x99\x85\xe5\x8f\x8b\xe8\xb0\x8a'
国际友谊
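
If the input cannot be fully trusted, a possible alternative to exec is ast.literal_eval, which only accepts Python literals. The sketch below is not part of the original answer: the helper name decode_bytes_literal is hypothetical, and an in-memory io.StringIO stands in for need_convert.csv so the example runs on its own. It also decodes with 'utf-8-sig' rather than 'utf-8', which drops the leading \xef\xbb\xbf byte order mark instead of leaving an invisible U+FEFF character at the start of each string.

```python
import ast
import csv
import io


def decode_bytes_literal(text):
    """Turn the text of a bytes literal, e.g. "b'\\xe5\\x9b\\xbd'", into a str."""
    # literal_eval only accepts Python literals, so it cannot run arbitrary code.
    value = ast.literal_eval(text)
    if not isinstance(value, bytes):
        raise ValueError('expected a bytes literal, got %r' % type(value))
    # 'utf-8-sig' decodes UTF-8 and drops a leading BOM (\xef\xbb\xbf) if present.
    return value.decode('utf-8-sig')


# Stand-in for need_convert.csv (an assumption, so the sketch needs no file).
sample = io.StringIO(
    "b'\\xef\\xbb\\xbf\\xe5\\x9b\\xbd\\xe9\\x99\\x85\\xe5\\x8f\\x8b\\xe8\\xb0\\x8a'\n"
    "b'\\xef\\xbb\\xbf\\xe9\\xba\\x92\\xe9\\xba\\x9f\\xe6\\x9d\\xaf'\n"
)
decoded = [decode_bytes_literal(row[0]) for row in csv.reader(sample)]
print(decoded)
```

To use it on the real file, the same list comprehension can be run over csv.reader(csvfile) inside the with block, assuming every row has at least one cell.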
