
Handling Accented Characters Python/MySQL

I'm building a system that inserts rows into a MySQL database through POST requests (the API is built in Flask/Python). Some of the rows contain accented characters; in particular, one row contains the name Péter. When I SELECT that row from my code, the output is P\xc3\xa9ter, so I've had to do some work on character encoding. When I make a GET request, I pull the data and try to return it as a JSON response, but I get this error:

UnicodeDecodeError: 'utf8' codec can't decode byte 0xe9 in position 1: invalid continuation byte

Other GET requests work fine on rows without accents, so I've isolated the problem to the accented characters.
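The two byte sequences mentioned above can be reproduced without a database. This is only a minimal sketch in Python 3 (the helper name `try_utf8` is made up for illustration, and it says nothing about where in the real stack the bytes get mixed up): é is the single byte 0xe9 in latin-1 and the two bytes 0xc3 0xa9 in UTF-8, and decoding the latin-1 form as UTF-8 fails in the way the traceback describes.

```python
# The same name encoded two ways.
name = "P\u00e9ter"                      # "Péter"
latin1_bytes = name.encode("latin-1")    # é becomes the single byte 0xe9
utf8_bytes = name.encode("utf-8")        # é becomes the two bytes 0xc3 0xa9


def try_utf8(raw: bytes) -> str:
    """Decode raw as UTF-8, or describe where decoding failed."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        return f"error at byte {exc.start}: {exc.reason}"


print(utf8_bytes, "->", try_utf8(utf8_bytes))
print(latin1_bytes, "->", try_utf8(latin1_bytes))
```

If that is what is happening here, the bytes reaching the JSON step are latin-1 encoded even though the server variables report utf8.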

I am using an Amazon RDS instance as my MySQL database. By default, RDS instances are latin-1 encoded. I've updated my parameter groups, and everything now seems to be utf-8 encoded. Here are my character set and collation variables:

+--------------------------+-------------------------------------------+
| Variable_name            | Value                                     |
+--------------------------+-------------------------------------------+
| character_set_client     | utf8                                      |
| character_set_connection | utf8                                      |
| character_set_database   | utf8                                      |
| character_set_filesystem | binary                                    |
| character_set_results    | utf8                                      |
| character_set_server     | utf8                                      |
| character_set_system     | utf8                                      |
| character_sets_dir       | /rdsdbbin/mysql-5.6.27.R1/share/charsets/ |
+--------------------------+-------------------------------------------+
8 rows in set (0.00 sec)

+----------------------+-----------------+
| Variable_name        | Value           |
+----------------------+-----------------+
| collation_connection | utf8_general_ci |
| collation_database   | utf8_unicode_ci |
| collation_server     | utf8_unicode_ci |
+----------------------+-----------------+
3 rows in set (0.00 sec)

I rebooted the instance and even reloaded the entire database. As further clarification, when I run this API locally against my local MySQL database, it works fine (which again makes me think it's an encoding issue, because the entire database was imported directly from my localhost version).

I'm not entirely sure what my next troubleshooting step should be. Is it possible that the value is being saved incorrectly in the DB? I don't do any encoding before I insert it. The character does show up as an é when I run a SELECT on it from the command line (should it be encoded somehow in the DB?).

Thanks for your help!

For anyone else having this issue: I just had to set charset = 'utf8' in my connection string (that is, set the charset explicitly). I had tried encoding strings in the code and so on, but this did the trick immediately.
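The post doesn't say which MySQL driver is in use, so the following is only a sketch that assumes PyMySQL; the host, user, password, and database names are placeholders, and `connection_kwargs` and `connect` are hypothetical helpers. The relevant part is the explicit `charset` argument.

```python
def connection_kwargs(charset: str = "utf8") -> dict:
    """Build connection settings with the character set given explicitly."""
    return {
        "host": "mydb.example.rds.amazonaws.com",  # placeholder host
        "user": "api_user",                        # placeholder user
        "password": "secret",                      # placeholder password
        "database": "mydb",                        # placeholder schema
        "charset": charset,                        # the setting that fixed it
    }


def connect():
    """Open a connection; assumes the PyMySQL driver is installed."""
    import pymysql  # imported lazily so the settings can be inspected without it

    return pymysql.connect(**connection_kwargs())


print(connection_kwargs()["charset"])
```

Other drivers generally expose an equivalent option, but the exact keyword or URL parameter may differ, so check the documentation for the one you use.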
