
Python MySQL UTF-8 encoding differs depending on order of execution

I recently inherited a Python project, and I've got some behavior I'm struggling to account for.

The code has two sections: it can import a file into the database, or it can dump the database to an output file. The import looks something like this:

def importStuff(self):
    mysqlimport_args = ['mysqlimport', '--host='+self.host, '--user='+self.username, '--password='+self.password, '--fields-terminated-by=|', '--lines-terminated-by=\n', '--replace', '--local', self.database, filename, '-v']
    output = check_output(mysqlimport_args)

The dump looks like this:

def getStuff(self):
    db = MySQLdb.connect(self.host, self.username, self.password, self.database)
    cursor = db.cursor()
    sql = 'SELECT somestuff'
    cursor.execute(sql)
    records = cursor.fetchall()
    cursor.close()
    db.close()
    return records

def toCsv(self, records, csvfile):
    f = open(csvfile, 'wb')
    writer = csv.writer(f, quoting=csv.QUOTE_ALL)
    writer.writerow(['StuffId'])
    count = 1
    for record in records:
        writer.writerow([record[0]])

    f.close()

Okay, not the prettiest Python you'll ever see (style comments welcome, as I'd love to learn more), but it seems reasonable.

But I got a complaint from a consumer that my output wasn't in UTF-8 (the MySQL table is using utf8 encoding, by the way). Here's where I get lost. If the program executes like this:

importStuff(...)

getStuff(...)

toCsv(...)

Then the output file doesn't appear to be valid UTF-8. When I break the execution into two different steps:

importStuff(...)

then in another file:

getStuff(...)

toCsv(...)

Suddenly my output appears as valid UTF-8. Aside from the fact that I have a workaround, I can't seem to explain this behavior. Can anyone shed some light on what I'm doing wrong here? Or is there more information I can provide that might clarify what's going on?

Thanks.

(Python 2.7, in case that factors in.)

EDIT: More code as requested. I've made some minor tweaks to protect the innocent, such as my company, but it's more or less here:

def main():

    dbutil = DbUtil(config.DB_HOST, config.DB_DATABASE, config.DB_USERNAME, config.DB_PASSWORD)
    if(args.import):
        logger.info('Option: --import')

        try:
            dbutil.mysqlimport(AcConfig.DB_FUND_TABLE)
        except Exception, e:
            logger.warn("Error occured at mysqlimport. Error is %s" % (e.message))

    if(args.db2csv):
        try:
            logger.info('Option: --db2csv')
            records = dbutil.getStuff()
            fileutil.toCsv(records, csvfile)
        except Exception, e:
            logger.warn("Error Occured at db2csv. Message:%s" %(e.message))

main()

And that's about it. It's really short, which makes this much less obvious.

I'm not sure how to faithfully represent the output, but it looks something like this:

"F0NR006F8F"

They all look more or less like ASCII characters to me, so I'm not sure what problem they could be creating. Maybe I'm approaching this from the wrong angle: I'm currently relying on my text editor's best guess for what encoding a file is in. I'm not sure how I could best detect which character is causing it to stop reading my file as UTF-8.
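One way to find the offending character without trusting an editor's guess is to decode the raw bytes strictly and see where decoding fails. The following is only a minimal sketch: `find_invalid_utf8` and `check_file` are hypothetical helper names, not part of the project described here, and the sketch reads the whole file into memory, which is assumed to be acceptable for a CSV of this size.

```python
def find_invalid_utf8(data):
    """Return (offset, offending_bytes) for the first invalid UTF-8
    sequence in a byte string, or None if the data decodes cleanly."""
    try:
        data.decode('utf-8')  # strict decoding raises on the first bad byte
    except UnicodeDecodeError as exc:
        return exc.start, data[exc.start:exc.end]
    return None


def check_file(path):
    """Hypothetical convenience wrapper: run the check on a file's raw bytes."""
    with open(path, 'rb') as f:
        return find_invalid_utf8(f.read())
```

Calling `check_file` on the generated CSV would return `None` for a valid UTF-8 file, or the byte offset of the first sequence that breaks it, which can then be inspected with a hex viewer.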

Dumbest answer of all time: the input data wasn't in UTF-8. Someone had solved this by writing another sproc that is called periodically to convert the non-UTF-8 characters to UTF-8. In the time it took me to break my code into two files and run them separately, that job ran. It just happened to run that way the 4-5 times I tried it, which led me to a false conclusion. I'm now changing the read process to accommodate a non-UTF-8 input source so I don't have a weird race condition hiding in the system. Sorry to have led you all on this wild goose chase.
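For anyone in the same spot, accommodating a non-UTF-8 source could look roughly like the sketch below. It is an illustration under stated assumptions, not the change that was actually made: the function name `to_utf8` is hypothetical, and the fallback encoding `cp1252` is a guess, since the real encoding of the input data is not stated here.

```python
def to_utf8(raw, fallback='cp1252'):
    """Return `raw` (a byte string) as UTF-8 encoded bytes.

    Bytes that are already valid UTF-8 are returned unchanged. Anything
    else is assumed to be in `fallback` (an assumption; substitute the
    real source encoding) and is transcoded to UTF-8.
    """
    try:
        raw.decode('utf-8')
        return raw
    except UnicodeDecodeError:
        # 'replace' keeps the conversion from failing on bytes that are
        # undefined in the fallback encoding.
        return raw.decode(fallback, 'replace').encode('utf-8')
```

Normalizing the bytes at read time like this removes the dependence on whether the periodic conversion job has already run.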
