How to encode international strings with emoticons and special characters for storing in database

I want to use an API from a game and store the player and clan names in a local database. The names can contain all sorts of characters and emoticons. Here are just a few examples I found:

  • ⭐💎
  • яαℓαηι
  • نکل
  • 窝猫
  • 鐵擊道遊隊
  • ❤✖❤♠️♦️♣️✖

I use Python to read the API and write the results into a MySQL database. After that, I want to use the names in a Node.js web application.

What is the best way to encode those characters, and how can I safely store them in the database so that I can display them correctly afterwards?

I tried encoding the strings in Python as UTF-8:

>>> sample = '蛙喜鄉民CLUB'
>>> sample
'蛙喜鄉民CLUB'
>>> sample = sample.encode('UTF-8')
>>> sample
b'\xe8\x9b\x99\xe5\x96\x9c\xe9\x84\x89\xe6\xb0\x91CLUB'

and storing the encoded string in a MySQL database that uses the utf8mb4_unicode_ci collation (utf8mb4 character set).

When I store the string from above and select it in MySQL Workbench, it is displayed like this:

蛙喜鄉民CLUB

When I read this string back from the database in Python (and store it in db_str), I get:

>>> db_str
èåéæ°CLUB
>>> db_str.encode('UTF-8')
b'\xc3\xa8\xc2\x9b\xc2\x99\xc3\xa5\xc2\x96\xc2\x9c\xc3\xa9\xc2\x84\xc2\x89\xc3\xa6\xc2\xb0\xc2\x91CLUB'

The first output is total gibberish. The second one, encoded as UTF-8, looks mostly like the encoded string from above, but with an extra \xc2 or \xc3 inserted before each non-ASCII byte.

How can I save such strings into MySQL so that I can read them back and display them correctly in a Python script?

Is my database collation utf8mb4_unicode_ci not suitable for such content? Or do I have to use another encoding?

As described by @abarnert in a comment on the question, the problem was that the library used for writing the Unicode strings didn't know that UTF-8 should be used and therefore encoded the strings incorrectly.
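The effect can be reproduced without a database. The sketch below assumes the connection treated the UTF-8 bytes as Latin-1 text, which is not stated in the post but is consistent with the bytes shown in the question; the helper name `simulate_wrong_charset` is made up for illustration.

```python
def simulate_wrong_charset(text: str, assumed: str = "latin-1") -> str:
    """Mimic a connection that reads UTF-8 bytes as `assumed`-encoded text.

    Assumption: the default connection charset behaved like Latin-1.
    """
    utf8_bytes = text.encode("utf-8")   # what the question's script produced
    return utf8_bytes.decode(assumed)   # how the bytes were misinterpreted


sample = "蛙喜鄉民CLUB"
garbled = simulate_wrong_charset(sample)

# Re-encoding the misread text as UTF-8 gives the doubled byte sequence
# from the question: every non-ASCII byte gains a \xc2 or \xc3 prefix.
print(garbled.encode("utf-8"))
```

Each original byte at or above 0x80 becomes its own two-byte UTF-8 sequence, which is where the extra \xc2 and \xc3 bytes come from.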

After adding charset='utf8mb4' as a parameter to the MySQL connection, the strings are written correctly in the intended encoding.

All I had to change was

conn = MySQLdb.connect(host, user, passwd, db, port)

to

conn = MySQLdb.connect(host, user, passwd, db, port, charset='utf8mb4')

and after that, the approach described in my question worked flawlessly.
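Rows written before the change stay double-encoded. A possible way to repair such values on the Python side is sketched below, assuming the damage came from a Latin-1 misinterpretation as in the question; `repair_double_encoded` is a hypothetical helper, not part of mysqlclient.

```python
def repair_double_encoded(text: str) -> str:
    """Undo one round of UTF-8-read-as-Latin-1 damage, if it applies.

    Strings that cannot be the result of that damage are returned unchanged.
    """
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the text has characters outside Latin-1 (already correct)
        # or its bytes are not valid UTF-8 (not double-encoded).
        return text


# Rebuild the garbled value from the question, then repair it.
garbled = "蛙喜鄉民CLUB".encode("utf-8").decode("latin-1")
print(repair_double_encoded(garbled))
```

This is a heuristic: a correctly stored name that happens to consist only of Latin-1 characters forming valid UTF-8 bytes would be altered too, so it is safest to apply it only to rows known to predate the fix.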

Edit: after passing charset='utf8mb4' to the connection, it is no longer necessary to encode the strings manually, because the mysqlclient library now does that itself.
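In practice this means the names can be handed to the cursor as ordinary str values. A minimal sketch, assuming a hypothetical table `players` with a `name` column and any DB-API cursor that uses the `%s` placeholder style (as MySQLdb does):

```python
def save_name(cursor, name: str) -> None:
    """Insert a player name as a plain str through a parameterized query.

    `players` and `name` are hypothetical table and column names.
    """
    if isinstance(name, (bytes, bytearray)):
        # Pre-encoded bytes are what led to the double encoding in the question.
        raise TypeError("pass str, not bytes; the driver handles the encoding")
    cursor.execute("INSERT INTO players (name) VALUES (%s)", (name,))


# Usage with a connection opened with charset='utf8mb4' (not run here):
# cur = conn.cursor()
# save_name(cur, "⭐💎")
# conn.commit()
```

The parameterized form also lets the driver handle quoting, so names containing quotes or backslashes need no special treatment.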

