
Remove non-UTF-8 characters from a string in Python 3.4

I am trying to retrieve some data from MySQL, but I have problems reading it. The column datatype is varchar with collation utf8_general_ci. I tried decoding the value, but that doesn't work. I don't need the non-UTF-8 characters, so I want to remove them.

#This is the line causing the problem:
line: ((123, 'Classical Musicï¼\x8c', 69),)

conn = db.cursor()
conn.execute(sql) 
data = conn.fetchall()
for line in data:
    for x in line:
        print(x)

Error received:

UnicodeEncodeError: 'charmap' codec can't encode character '\x8c' in position 17
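
The 'charmap' wording suggests the failure happens inside print(), when Python encodes the string for a console that uses a legacy code page, rather than when the row is fetched. A minimal sketch that reproduces it, assuming the console encoding is cp1252 (the post does not say which one is in use); `can_encode` is a hypothetical helper:

```python
value = 'Classical Musicï¼\x8c'  # the string from the row shown above

def can_encode(text, encoding):
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

# Assumption: the console uses cp1252, which has no mapping for U+008C.
print(can_encode(value, 'cp1252'))
print(can_encode(value, 'utf-8'))

# ascii() escapes every non-ASCII character, so printing its result
# works on any console and shows exactly which code points are present.
print(ascii(value))
```

Printing `ascii(value)` instead of the raw value is only a debugging aid; it shows the damaged characters without fixing them.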

I have tried decode('utf-8') but I get another error.

conn = db.cursor()
conn.execute(sql) 
data = conn.fetchall()
for line in data:
    for x in line:
        print(x[1].decode('utf-8'))

AttributeError: 'str' object has no attribute 'decode'

Mojibake and double-encoding, plus mangling by Python.
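
A minimal sketch of what that diagnosis means for the value in the question, assuming the text was UTF-8 bytes that were decoded as Latin-1 somewhere along the way. `fix_mojibake` and `strip_non_ascii` are hypothetical helpers, not part of any driver API. In Python 3 a `str` has no `decode`, so the repair has to go through `encode` first:

```python
def fix_mojibake(text):
    """Undo one round of 'UTF-8 bytes read as Latin-1'.

    Returns the text unchanged if it does not fit that pattern.
    """
    try:
        return text.encode('latin-1').decode('utf-8')
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text

def strip_non_ascii(text):
    """Fallback: drop every character outside ASCII."""
    return text.encode('ascii', errors='ignore').decode('ascii')

broken = 'Classical Musicï¼\x8c'
fixed = fix_mojibake(broken)

# The three damaged characters collapse into one U+FF0C (fullwidth comma).
print(ascii(fixed))
# The fallback simply discards them.
print(strip_non_ascii(broken))
```

Both helpers patch the symptom after the fact. If the stored bytes themselves are double-encoded, the data still needs to be fixed at the source.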

Start over. Make everything utf8 -- the text, the connection, the column CHARACTER SET, the HTML header.
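
As a sketch of the connection part of that advice: the post does not show which driver created `db`, so the helper below only builds keyword arguments. `utf8_connect_kwargs` and the credential values are hypothetical. `charset` is a keyword that the common MySQL drivers for Python (PyMySQL, mysqlclient, MySQL Connector/Python) accept in `connect()`, and `utf8mb4` is MySQL's full UTF-8 character set; choosing it over plain `utf8` is an assumption here.

```python
def utf8_connect_kwargs(host, user, password, database):
    """Build connect() keyword arguments that request UTF-8 for the connection."""
    return {
        'host': host,
        'user': user,
        'password': password,
        'database': database,
        'charset': 'utf8mb4',  # assumption: the server supports utf8mb4
    }

# Placeholder credentials, for illustration only.
kwargs = utf8_connect_kwargs('localhost', 'app_user', 'secret', 'music')

# With PyMySQL installed, this would be used roughly as follows
# (not executed here, since the driver in the question is unknown):
#   import pymysql
#   db = pymysql.connect(**kwargs)
print(kwargs['charset'])
```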

If you still have problems, come back; hopefully your code will be close enough to correct for us to prescribe a cure.

Meanwhile, read more of the threads around here; simpler versions of the mess abound.

C3AF C2BC C28C was supposed to be a fancy comma, correct? The utf8 hex should have been EFBC8C (U+FF0C, the fullwidth comma). What process generated that comma?
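
The hex values can be checked from Python. A small sketch, assuming the character was meant to be U+FF0C (FULLWIDTH COMMA) and that the double-encoding went through Latin-1; `utf8_hex` is a hypothetical helper:

```python
import binascii

def utf8_hex(text):
    """Return the UTF-8 bytes of text as an uppercase hex string."""
    return binascii.hexlify(text.encode('utf-8')).decode('ascii').upper()

comma = '\uff0c'  # FULLWIDTH COMMA

# Correct storage: the character encoded to UTF-8 once.
correct = utf8_hex(comma)

# Double-encoding: the UTF-8 bytes are misread as Latin-1 text,
# and that text is then encoded to UTF-8 a second time.
double = utf8_hex(comma.encode('utf-8').decode('latin-1'))

print(correct)  # EFBC8C
print(double)   # C3AFC2BCC28C
```

Comparing these against the output of `SELECT HEX(column)` in MySQL would show whether the stored bytes are correct UTF-8 or already double-encoded.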
