Python & MySql: Unicode and Encoding
I am parsing JSON data and trying to store some of it in a MySQL database. I am currently getting a Unicode error. My question is how I should handle this.
Here is my table structure:
CREATE TABLE yahoo_questions (
question_id varchar(40) NOT NULL,
question_subj varbinary(255),
question_content varbinary(255),
question_userId varchar(40) NOT NULL,
question_timestamp varchar(40),
category_id varbinary(20) NOT NULL,
category_name varchar(40) NOT NULL,
choosen_answer varbinary(255),
choosen_userId varchar(40),
choosen_usernick varchar(40),
choosen_ans_timestamp varchar(40),
UNIQUE (question_id)
);
Error when inserting from the Python code:
Traceback (most recent call last):
File "YahooQueryData.py", line 78, in <module>
+"VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)", (row[2], row[5], row[6], quserId, questionTime, categoryId, categoryName, qChosenAnswer, choosenUserId, choosenNickName, choosenTimeStamp))
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/MySQLdb/cursors.py", line 159, in execute
query = query % db.literal(args)
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/MySQLdb/connections.py", line 264, in literal
return self.escape(o, self.encoders)
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/MySQLdb/connections.py", line 202, in unicode_literal
return db.literal(u.encode(unicode_literal.charset))
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 204-230: ordinal not in range(256)
Python code snippet:
#pushing user id to the url to get full json stack
urlobject = urllib.urlopen(base_url.format(row[2]))
qnadatajson = urlobject.read()
data = json.loads(qnadatajson)
cur.execute("INSERT INTO yahoo_questions (question_id, question_subj, question_content, question_userId, question_timestamp,"
+"category_id, category_name, choosen_answer, choosen_userId, choosen_usernick, choosen_ans_timestamp)"
+"VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)", (row[2], row[5], row[6], quserId, questionTime, categoryId, categoryName, qChosenAnswer, choosenUserId, choosenNickName, choosenTimeStamp))
JSON structure:
questions: [
{
Id: "20111201185322AA5HTDc",
Subject: "what are the new pokemon call?",
Content: "I used to know them I stop at dialga and palkia version and I heard there's new ones what's it call
",
Date: "2011-12-01 18:53:22",
Timestamp: "1322794402",
One more thing I did before running the query: I executed SET character_set_client = utf8 on MySQL.
This is what the MySQL variables look like:
mysql> SHOW variables LIKE '%character_set%';
+--------------------------+--------------------------------------------------------+
| Variable_name | Value |
+--------------------------+--------------------------------------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /usr/local/mysql-5.5.10-osx10.6-x86_64/share/charsets/ |
+--------------------------+--------------------------------------------------------+
8 rows in set (0.00 sec)
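I believe your MySQLdb Python library doesn't know that it is supposed to encode to utf8, and is falling back to latin1 as its default charset. That matches the traceback, where the 'latin-1' codec is the one that fails.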
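The failure can be reproduced without a database, which helps confirm that the codec is the problem rather than the table definition. The snippet below is only a sketch: the sample string and the helper name can_encode are made up for illustration and are not part of the question's code.

```python
# -*- coding: utf-8 -*-
# Sample text (an assumption for this demo): "é" fits in latin-1,
# but the katakana characters do not.
text = u"pok\u00e9mon \u30dd\u30b1\u30e2\u30f3"


def can_encode(value, charset):
    """Return True if `value` can be encoded with `charset`, else False."""
    try:
        value.encode(charset)
        return True
    except UnicodeEncodeError:
        # Same exception type as in the traceback from MySQLdb.
        return False


latin1_ok = can_encode(text, "latin-1")
utf8_ok = can_encode(text, "utf-8")
print(latin1_ok, utf8_ok)
```

If the first check fails and the second succeeds for your data, the driver needs to be told to use a UTF-8 charset for the connection.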
When you connect() to the database, pass the charset='utf8' parameter. This should also make a manual SET NAMES or SET character_set_client unnecessary.
First, make sure the charset and use_unicode parameters are set when establishing the MySQL connection:
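import MySQLdb as mysql  # assumed alias; the original snippet does not show its import

# charset sets the connection encoding; use_unicode makes text columns come back as unicode
conn = mysql.connect(host='127.0.0.1',
                     user='user',
                     passwd='passwd',
                     db='db',
                     charset='utf8',
                     use_unicode=True)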
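Second, use parameterized queries (prepared-statement style) when actually querying the database, so the driver handles escaping and encoding of the values. Below is an example INSERT query with a string that contains a Unicode character.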
cursor.execute('INSERT INTO mytable VALUES (null, %s)',
('Some string that contains unicode: ' + unichr(300),))
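To show the shape of such a call in isolation, here is a minimal sketch that wraps the INSERT in a small function. The names insert_text and RecordingCursor are hypothetical; RecordingCursor is only a stand-in that records what it is given so the example runs without a database. With a real connection you would pass conn.cursor() instead and call conn.commit() afterwards.

```python
# -*- coding: utf-8 -*-
# Placeholders (%s) are filled in by the driver; never build the SQL
# string with Python string formatting. Table name taken from the answer above.
INSERT_SQL = "INSERT INTO mytable VALUES (null, %s)"


def insert_text(cursor, text):
    """Run a parameterized INSERT; the value is passed separately from the SQL."""
    cursor.execute(INSERT_SQL, (text,))


class RecordingCursor(object):
    """Hypothetical stand-in for a DB-API cursor; it only records calls."""

    def __init__(self):
        self.calls = []

    def execute(self, sql, params=None):
        self.calls.append((sql, params))


demo_cursor = RecordingCursor()
# u"\u012c" is the same character as unichr(300) in the example above.
insert_text(demo_cursor, u"Some string that contains unicode: \u012c")
print(demo_cursor.calls)
```

Keeping the values out of the SQL string means the connection charset alone decides how they are encoded.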
If you are still facing the same problem, try downgrading your mysql-connector-python version; this worked for me.
Change mysql-connector-python==8.0.30 to mysql-connector-python==8.0.28.
Run the following:
pip uninstall mysql-connector-python==8.0.30
pip install mysql-connector-python==8.0.28
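Before pinning a version, it may help to confirm which release is actually installed. The sketch below uses the standard library's importlib.metadata (Python 3.8 or later); the helper name installed_version is made up for this example.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


connector_version = installed_version("mysql-connector-python")
print(connector_version)
```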