Encoding UTF8 data within a table with latin1 character set

I have a legacy MySQL table whose character set is latin1, but it stores JSON data encoded as UTF-8. A user interface connected to this table displays the characters correctly. I need to update the table from a Python script, but I can't get out of encoding hell.

In the mysql shell I issue "select words from pip where id_pip=42" and receive:

"ventilationsplåtslageri":{"day":"1000","hour":"200","min":"30"}

But when I fetch it from the database in Python, I can't get the same output, even though I have tried several different encodings.

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import MySQLdb
import json
dbconn = MySQLdb.connect(host="host", port=3306, user="user",
                         passwd="pass", db="db", use_unicode=True, charset="utf8")
cursor1 = dbconn.cursor()
cursor1.execute("select words from pip where id_pip=42")
track = cursor1.fetchall()
print json.dumps(track, encoding="utf8")

I tried many different configurations of this code, e.g. I changed the connection to use_unicode=False, charset="latin1" and printed with print json.dumps(filter_track, encoding="utf8"), but I still get either "ventilationspl\Ã\¥tslageri\\" or "ventilationspl\åtslageri\\" and not what I want, which is "ventilationsplÃ¥tslageri". I can't change the database, and I need to update this field with an SQL UPDATE statement, so I am afraid of messing up the legacy database.

I'm not sure if I understand your question, but...

If the content is being returned in Latin-1 and you want it in UTF-8, I would assume that you'd first need to decode the content from Latin-1 and then encode it to UTF-8:

latin1_content.decode('latin1').encode('utf8')
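
A minimal sketch of that decode-then-encode step, written for Python 3, where bytes and text are separate types (the question's script is Python 2). The names `latin1_to_utf8` and `undo_double_encoding` are hypothetical helpers, not part of MySQLdb, and the sample bytes are just a fragment of the word from the question. The second helper shows the reverse direction, which may be what is needed if the driver has already converted the stored UTF-8 bytes as though they were Latin-1 text.

```python
def latin1_to_utf8(raw):
    """Interpret raw bytes as Latin-1 text, then serialize that text as UTF-8."""
    return raw.decode("latin1").encode("utf8")


def undo_double_encoding(text):
    """Recover text whose UTF-8 bytes were mistakenly read as Latin-1.

    Each character is mapped back to the single byte it came from,
    and the resulting byte string is decoded as UTF-8.
    """
    return text.encode("latin1").decode("utf8")


if __name__ == "__main__":
    # 0xE5 is the Latin-1 byte for U+00E5; its UTF-8 form is the pair 0xC3 0xA5.
    print(latin1_to_utf8(b"pl\xe5t"))

    # U+00C3 U+00A5 is how the UTF-8 pair 0xC3 0xA5 looks when read as Latin-1.
    repaired = undo_double_encoding("pl\u00c3\u00a5t")
    print(repaired == "pl\u00e5t")
```

Which direction applies depends on what the connection actually returns, so it is worth inspecting the raw bytes of one row before issuing any UPDATE against the legacy table.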
