
Python MySQLdb upload UnicodeEncodeError

I can upload CSV files to MySQL, but partway through I get an encoding error. Could someone please review my code and tell me what is wrong? I'm new to encoding.

The following snippet shows how I write the CSV files that will be uploaded. The data is extracted from an MDB file using the MDB Tools (mdb-export):

    tableIndex  = 1
    for tName in tableNames:
        fileName = os.path.join(csvPath, os.path.basename(mdb).split('.')[0] + '_' + tName + '.csv')

        try:
            p = subprocess.Popen(["mdb-export", "-H", mdb, tName], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
            tableContent, error = p.communicate()

            if(p.returncode != 0):
                _logger.error('[%3d] Export Subprocess %d %s' % (tID, p.returncode, tableContent))
                SendMdbError(tID, mdb, _logger, 'ALERT: Export Subprocess')
                return(['', False])
            if(error):
                _logger.error('[%3d] Export Communicate %d %s' % (tID, p.returncode, error.strip()))
                SendMdbError(tID, mdb, _logger, 'ALERT: Export Communicate')
                return(['', False])

        except Exception as ex:
            _logger.exception('[%3d] Export Error' % tID)
            SendMdbError(tID, mdb, _logger, 'ALERT: Export Exception')
            return(['', False])
        except:
            _logger.exception('[%3d] Export Unexpected' % tID)
            SendMdbError(tID, mdb, _logger, 'ALERT: Export Unexpected')
            return(['', False])

        # If no data, no need for corresponding SQL
        if(len(tableContent) == 0):
            emptyTables.append(tName)

        # If data exists, dump data
        else:
            # Add the 'DriveTest' to the data to upload
            tableContent = tableContent.split('\n')

            tableContent = [dt + ',' + line for line in tableContent if(line)]
            tableContent = '\n'.join(tableContent)

            try:
                with open(fileName, 'wb') as f:
                    f.write(tableContent)

                    if(_VERBOSITY):
                        _logger.debug('[%3d] %3d - Write CSV SIZE[%8d] FILE: %s' %(tID, tableIndex, len(tableContent.split('\n')), fileName))
                        tableIndex += 1

            except IOError as err:
                _logger.exception('[%3d] Write IOError: %s' % (tID, str(err)))
                SendMdbError(tID, mdb, _logger, 'ALERT: Write IOError')
                return(['', False])
            except Exception as ex:
                _logger.exception('[%3d] Write Exception' % tID)
                SendMdbError(tID, mdb, _logger, 'ALERT: Write Exception')
                return(['', False])
            except:
                _logger.exception('[%3d] Write Unexpected' % tID)
                SendMdbError(tID, mdb, _logger, 'ALERT: Write Unexpected')
                return(['', False])

The following is where I upload the CSV file, and this is where I get the error:

    # Upload the data
    tableIndex = 0
    for table in tableDDL:
        try:

            with warnings.catch_warnings(record=True) as war:

                _logger.info('[%3d] %3d Going up... %s' %(tID, tableIndex+1, os.path.basename(mdb).split('.')[0] + '_' + table))

                _sqlLock[tableIndex].acquire()
                #self.cursor.execute(tableDDL[table])
                self.cursor.execute(tableULD[table])
                self.conn.commit()
                _sqlLock[tableIndex].release()

                if(war):
                    #if(_VERBOSITY): print('[%3d] %3d WARNINGS[%3d] %s' % (tID, tableIndex+1, len(war), os.path.basename(mdb).split('.')[0] + '_' + table))
                    _logger.warning('[%3d] %3d WARNINGS[%3d] %s' % (tID, tableIndex+1, len(war), os.path.basename(mdb).split('.')[0] + '_' + table))
                    for w in war:
                        _logger.warning('[%3d] %s' % (tID, w.message))

                #if(_VERBOSITY): print('[%3d] %3d Uploaded %s' % (tID, tableIndex+1, os.path.basename(mdb).split('.')[0] + '_' + table))
                _logger.info('[%3d] %3d Uploaded %s' % (tID, tableIndex+1, os.path.basename(mdb).split('.')[0] + '_' + table))
                tableIndex += 1

                # Remove the uploaded CSV file
                try:
                    os.remove(csvFiles[table]+'.csv')
                    _logger.info('[%3d] Removed CSV file: %s' % (tID, csvFiles[table]+'.csv'))
                except OSError:
                    pass

        except (MySQLdb.InternalError, MySQLdb.NotSupportedError) as err:
            _logger.error('[%3d] %3d Internal: %s %s' % (tID, tableIndex+1, err, sys.exc_info()[0]))
            self.conn.rollback()
            self.Disconnect(tID, _logger, _VERBOSITY, _DEBUG)
            return(False)
        except MySQLdb.OperationalError as err:
            _logger.error('[%3d] %3d OperationalError: %s' % (tID, tableIndex+1, sys.exc_info()[0]))
            _logger.error(err)
            self.conn.rollback()
            self.Disconnect(tID, _logger, _VERBOSITY, _DEBUG)
            return(False)
        except MySQLdb.ProgrammingError as err:
            _logger.error('[%3d] %3d ProgrammingError: %s' % (tID, tableIndex+1, sys.exc_info()[0]))
            _logger.error(err)
            self.conn.rollback()
            self.Disconnect(tID, _logger, _VERBOSITY, _DEBUG)
            return(False)
        except MySQLdb.Error as err:
            _logger.error('[%3d] %3d QUERY: %s %s' % (tID, tableIndex+1, err, sys.exc_info()[0]))
            self.conn.rollback()
            self.Disconnect(tID, _logger, _VERBOSITY, _DEBUG)
            return(False)
        except Exception as err:
            _logger.error('[%3d] %3d Exception: %s %s' % (tID, tableIndex+1, err, sys.exc_info()[0]))
            #self.conn.rollback()
            #self.Disconnect(tID, _logger, _VERBOSITY, _DEBUG)
            #return(False)
            pass
        except:
            _logger.error('[%3d] %3d Other: %s' % (tID, tableIndex+1, sys.exc_info()[0]))
            self.conn.rollback()
            self.Disconnect(tID, _logger, _VERBOSITY, _DEBUG)
            return(False)

The error I get is the following:

2015-06-13 19:42:21,743 __main__ -    ERROR - [  1]   1 Exception: 'ascii' codec can't encode character u'\xb4' in position 40: ordinal not in range(128) <type 'exceptions.UnicodeEncodeError'>
2015-06-13 19:42:30,962 __main__ -    ERROR - [  1]   1 Exception: 'ascii' codec can't encode character u'\xb4' in position 27: ordinal not in range(128) <type 'exceptions.UnicodeEncodeError'>

I noticed that the data does get uploaded, but I'm not sure whether all rows are uploaded.

Thanks!

Before putting the CSV data into the DB, try s.decode('UTF-8'), and after getting it out of the DB, s.encode('UTF-8').

I did it for SQLite and it worked OK.

Getting this to work should not be too difficult, but you have to understand what you're doing. Don't just try all possible combinations of s.encode("UTF-8").decode("UTF-8") and the like.

First, understand the difference between a string and bytes. See https://docs.python.org/3/howto/unicode.html. You can encode a string to bytes: data = text.encode("UTF-8"), and you can decode bytes to a string: text = data.decode("UTF-8").
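As a minimal sketch of that distinction (assuming Python 3, and using the acute accent U+00B4 from the error message as sample data; the variable names are illustrative only):

```python
# A text string containing ACUTE ACCENT (U+00B4), the character from the log.
text = u"it\u00b4s"

# Encoding turns text into bytes; decoding turns bytes back into text.
data = text.encode("utf-8")
round_trip = data.decode("utf-8")

# ASCII only covers code points 0-127, so encoding this text as ASCII fails
# with the same kind of UnicodeEncodeError shown in the question.
try:
    text.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True
```

Here the single character U+00B4 becomes two bytes in UTF-8, which is why text length and byte length can differ.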

Second, since a CSV file is a text file, you should open it in text mode: open(fileName, 'w', encoding="utf-8"). There's no need to encode or decode text in your code when writing the file.
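A rough sketch of writing the CSV in text mode might look like the following. The helper name write_csv_text, the sample rows, and the temporary path are made up for illustration. It uses io.open, which accepts the encoding argument on both Python 2 and Python 3. Note that output captured from a subprocess arrives as bytes, so it would first need decoding with whatever encoding mdb-export actually emits (something to verify, not assumed here).

```python
import io
import os
import tempfile


def write_csv_text(path, lines):
    """Write already-decoded text lines to path as UTF-8."""
    # newline="" disables newline translation, so "\n" is written unchanged.
    with io.open(path, "w", encoding="utf-8", newline="") as f:
        f.write(u"\n".join(lines))


# Hypothetical sample rows; the second field contains U+00B4.
rows = [u"DriveTest,it\u00b4s,1", u"DriveTest,plain,2"]
csv_path = os.path.join(tempfile.mkdtemp(), "example.csv")
write_csv_text(csv_path, rows)

# Read the file back in binary mode to inspect the bytes on disk.
with io.open(csv_path, "rb") as f:
    raw = f.read()
```

The file layer does the encoding, so the rest of the code only ever handles text.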

Third, it is perfectly OK to write Unicode text to a TEXT field. No need for BINARYs or BLOBs. But make sure your database has a collation setting that can deal with it; usually that would be one of the utf-8 collations. Then, to put Unicode in your database, use Python strings and don't encode them to bytes.
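One way this could look with MySQLdb is sketched below. The charset and use_unicode arguments are real MySQLdb.connect parameters, but the helper names, the notes table, and its body column are hypothetical, and the connection details (host, user, password, database) are left for the caller to supply.

```python
# Connection options asking the driver to exchange text as UTF-8.
# "utf8mb4" may be preferable on servers that support it, since MySQL's
# "utf8" only covers characters of up to three bytes.
UTF8_KWARGS = {"charset": "utf8", "use_unicode": True}


def connect_utf8(**params):
    """Open a MySQLdb connection with the UTF-8 options applied."""
    import MySQLdb  # imported lazily so the sketch loads without the driver

    options = dict(UTF8_KWARGS)
    options.update(params)  # caller supplies host, user, passwd, db, ...
    return MySQLdb.connect(**options)


def insert_text(conn, value):
    """Insert a Python text string; the driver handles the encoding."""
    cur = conn.cursor()
    # Hypothetical table and column; the value is passed as a parameter
    # rather than being formatted into the SQL string.
    cur.execute("INSERT INTO notes (body) VALUES (%s)", (value,))
    conn.commit()
```

Passing the value as a query parameter keeps the string as text until the driver encodes it with the connection's character set.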

The error message implies that the column definition in MySQL is CHARACTER SET ascii; is that correct?

B4 sounds like the latin1 (not utf8) encoding for ´, which could be coming from a Microsoft Word document in a context such as it´s.

So, even changing the column to be CHARACTER SET utf8 won't fix the problem.
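The latin1 versus utf8 point can be checked with a short sketch (assuming Python 3; the sample bytes are invented to mirror the it´s case):

```python
# The single byte 0xB4 as it would appear in latin1-encoded data.
raw = b"it\xb4s"

# Interpreted as latin1, 0xB4 is the acute accent U+00B4.
as_latin1 = raw.decode("latin-1")

# Interpreted as UTF-8, a lone 0xB4 is a stray continuation byte.
try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

# Re-encoding the decoded text gives the two-byte UTF-8 form.
utf8_bytes = as_latin1.encode("utf-8")
```

If the source data really is latin1, it has to be decoded as latin1 (or the connection told so) before a utf8 column can accept it.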

BINARY and BLOB are essentially the same type of field: any byte is allowed. VARCHAR and TEXT validate the bytes during INSERT to make sure they match the CHARACTER SET.
