sqlite remove non utf-8 characters
I have an sqlite db that has some crazy ascii characters in it and I would like to remove them, but I have no idea how to go about doing it. I googled around and found some people saying to use REGEXP with mysql, but that threw an error saying REGEXP wasn't recognized.
Here is the error I get:
sqlalchemy.exc.OperationalError: (OperationalError) Could not decode to UTF-8 column 'table_name' with text ...
Thanks for the help.
Well, if you really want to shoehorn a rich unicode string into a plain ascii string (and don't mind some goofs), you could use this:
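    import unicodedata as ud

    def shoehorn_unicode_into_ascii(s):
        # NFKD splits accented letters into base letter + combining mark;
        # encoding with 'ignore' then drops everything that is not ascii.
        # This removes accents, but also other things, like ß‘’“”
        # Note: on Python 3 the result is a bytes object, not a str.
        return ud.normalize('NFKD', s).encode('ascii', 'ignore')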
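To see what the "goofs" look like in practice, here is a small self-contained sketch of the same NFKD-then-ascii approach. The name `to_ascii` is made up for this illustration, and unlike the function above it decodes the result back to a `str` so that it is easier to inspect on Python 3.

```python
import unicodedata


def to_ascii(text):
    """Hypothetical helper: same technique as above, but returns a str."""
    # Decompose characters where a decomposition exists (é -> e + combining accent)
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop every code point that has no ascii representation
    return decomposed.encode("ascii", "ignore").decode("ascii")


if __name__ == "__main__":
    for sample in ["café", "Straße", "“quoted”"]:
        print(repr(sample), "->", repr(to_ascii(sample)))
```

Accented letters keep their base letter, but characters with no decomposition (ß, curly quotes) simply vanish, so "Straße" loses a letter rather than becoming "Strasse".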
For a more complete solution (with somewhat fewer goofs, but requiring a third-party module, unidecode), see this answer.
Really, though, the best solution is to work with unicode data throughout your code as much as possible, and drop to an encoding only when necessary.
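For the original problem (SQLite text that cannot be decoded as UTF-8), one way to apply that advice is to decode leniently at the database boundary and then write the cleaned text back. The sketch below uses the standard library `sqlite3` module directly rather than SQLAlchemy, and assumes Python 3; `open_tolerant`, `scrub_column`, and the `notes`/`body` names are hypothetical and only there for illustration.

```python
import sqlite3


def open_tolerant(path):
    """Open a SQLite database whose TEXT values may contain invalid UTF-8."""
    conn = sqlite3.connect(path)
    # text_factory receives the raw bytes of each TEXT value;
    # errors="ignore" silently drops bytes that are not valid UTF-8.
    conn.text_factory = lambda raw: raw.decode("utf-8", errors="ignore")
    return conn


def scrub_column(conn, table, column):
    """Rewrite every value in table.column with its leniently decoded form."""
    # Assumption: table and column are trusted identifiers (they are
    # interpolated into the SQL), and the table has an ordinary rowid.
    rows = conn.execute(f"SELECT rowid, {column} FROM {table}").fetchall()
    conn.executemany(
        f"UPDATE {table} SET {column} = ? WHERE rowid = ?",
        [(text, rowid) for rowid, text in rows],
    )
    conn.commit()


def demo():
    conn = open_tolerant(":memory:")
    conn.execute("CREATE TABLE notes (body TEXT)")
    # CAST stores the blob's bytes as TEXT without validating them,
    # which reproduces a row holding a stray Latin-1 byte (0xE9).
    conn.execute("INSERT INTO notes VALUES (CAST(? AS TEXT))", (b"caf\xe9 ok",))
    scrub_column(conn, "notes", "body")
    # Switch back to the strict default decoder; the cleaned row now reads fine.
    conn.text_factory = str
    return conn.execute("SELECT body FROM notes").fetchone()[0]


if __name__ == "__main__":
    print(repr(demo()))
```

Using `errors="replace"` instead of `"ignore"` would keep a visible U+FFFD marker where bytes were dropped, which can be safer if you want to find the damaged rows later. If the bad bytes are really text in another encoding (for example Latin-1), decoding with that encoding would preserve the characters instead of discarding them.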
django.utils.encoding has a robust set of unicode encoding and decoding functions.