sqlalchemy insert blob raises UnicodeDecodeError

I am trying to reuse working Python code from a Mac on Windows. The code compresses a UTF-8 string with zlib (an earlier gzip variant is commented out) and inserts the result as a blob using SQLAlchemy.

However, I get the following error on insertion:

        UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 15: character maps to <undefined>

The relevant section:

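        from sqlalchemy import *
        import urllib2
        import zlib

        # meta, engine, url_source and tableSQL are not shown in this excerpt
        pcaxis_table = Table('pcaxis_data', meta, autoload=True, autoload_with=engine)

        try:
            # urllib2 has no urlretrieve; urlopen returns an object with .read()
            response = urllib2.urlopen(url_source)
        except Exception:
            print url_source
            raise

        infile = response.read()
        # treat the download as cp1252 text and re-encode it as UTF-8
        px_file = infile.decode('cp1252').encode('utf-8')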

        cmpstr = zlib.compress(px_file)


        #out = StringIO.StringIO()
        #with gzip.GzipFile(fileobj=out, mode="w") as f:
        #    f.write(px_file)

        ins = pcaxis_table.insert(values = {'TableSQL':tableSQL,
                                            'zip_file':cmpstr, #out.getvalue()
                                            })
        ins.execute()

Traceback (it fails when trying to decode the blob as cp1252):

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Anaconda\Lib\site-packages\sqlalchemy\sql\base.py", line 386, in execute
    return e._execute_clauseelement(self, multiparams, params)
  File "C:\Anaconda\Lib\site-packages\sqlalchemy\engine\base.py", line 1758, in _execute_clauseelement
    return connection._execute_clauseelement(elem, multiparams, params)
  File "C:\Anaconda\Lib\site-packages\sqlalchemy\engine\base.py", line 826, in _execute_clauseelement
    compiled_sql, distilled_params
  File "C:\Anaconda\Lib\site-packages\sqlalchemy\engine\base.py", line 958, in _execute_context
    context)
  File "C:\Anaconda\Lib\site-packages\sqlalchemy\engine\base.py", line 1162, in _handle_dbapi_exception
    util.reraise(*exc_info)
  File "C:\Anaconda\Lib\site-packages\sqlalchemy\engine\base.py", line 951, in _execute_context
    context)
  File "C:\Anaconda\Lib\site-packages\sqlalchemy\engine\default.py", line 436, in do_execute
    cursor.execute(statement, parameters)
  File "C:\Anaconda\Lib\site-packages\pymysql\cursors.py", line 100, in execute
    query = query % escaped_args
  File "C:\Anaconda\lib\encodings\cp1252.py", line 15, in decode
    return codecs.charmap_decode(input,errors,decoding_table)
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 6: character maps to <undefined>

and the MySQL table:

create table pcaxis_data(
    id int NOT NULL AUTO_INCREMENT,
    TableSQL varchar(25),
    zip_file BLOB,
    inserttime TIMESTAMP,
    PRIMARY KEY (id)
);

The problem is with .decode('cp1252'). The Windows-1252 code page does not define every byte value; for example, byte 0x8f is unassigned and fails to decode. You can use latin1 instead, which maps all 256 byte values.
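
The difference between the two codecs can be checked with a minimal sketch like the one below. The sample byte string is made up for illustration; only the byte 0x8f comes from the error message.

```python
# Hypothetical sample data; 0x8f is the byte named in the error message.
data = b"abc\x8fdef"

# cp1252 leaves 0x8f unassigned, so strict decoding raises.
try:
    data.decode("cp1252")
    cp1252_ok = True
except UnicodeDecodeError:
    cp1252_ok = False

# latin-1 maps every byte value, so decoding and re-encoding is lossless.
latin1_text = data.decode("latin-1")
roundtrip = latin1_text.encode("latin-1")

print(cp1252_ok, roundtrip == data)
```

Note that latin-1 never fails, which also means it will not tell you whether the input really was text in that encoding.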

Is response actually Windows-1252 text? If it is not, decoding it as such makes no sense.

zlib.compress takes a bytestring, and the data read from response is already a bytestring, so you can compress it directly without re-encoding.
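
As a rough sketch of compressing the raw bytes directly: the `raw` value below is a hypothetical stand-in for whatever `response.read()` returns, not data from the question.

```python
import zlib

# Hypothetical stand-in for the bytes returned by response.read().
raw = u"\xc4rger;12\n".encode("cp1252") * 100

compressed = zlib.compress(raw)         # bytes in, bytes out
restored = zlib.decompress(compressed)  # original bytes, untouched

print(len(raw), len(compressed), restored == raw)
```

Because no decode or encode step is involved, the original encoding of the download does not matter for the compression itself; it only matters later, when someone decompresses the blob and wants to read it as text.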

Just solved the issue by upgrading pymysql from around 0.6.0 to 0.6.3. What was the problem? The pymysql driver tried to escape the binary data by converting it to unicode. In the traceback that conversion goes through cp1252, which has no mapping for byte 0x8f, and that's why the insert failed.
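
After a driver upgrade, one way to confirm that compressed bytes are stored unchanged is a round-trip check. The sketch below uses the standard-library `sqlite3` module purely as a stand-in because it runs without a server; it is not pymysql and says nothing about pymysql's internals. The table layout is a simplified copy of the one in the question, and the payload is invented.

```python
import sqlite3
import zlib

# Invented payload; in the question this would be the downloaded file.
payload = b"example payload " * 50
blob = zlib.compress(payload)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pcaxis_data ("
    "id INTEGER PRIMARY KEY, TableSQL TEXT, zip_file BLOB)"
)

# The blob is passed as a bound parameter, never formatted into the SQL text.
conn.execute(
    "INSERT INTO pcaxis_data (TableSQL, zip_file) VALUES (?, ?)",
    ("demo", blob),
)

stored = bytes(conn.execute("SELECT zip_file FROM pcaxis_data").fetchone()[0])
conn.close()

print(stored == blob, zlib.decompress(stored) == payload)
```

The same select-and-compare idea should carry over to a MySQL connection, with the driver's own placeholder style in place of `?`.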
