
sqlite remove non utf-8 characters

I have a SQLite db that has some crazy ASCII characters in it and I would like to remove them, but I have no idea how to go about doing it. I googled around and found some people saying to use REGEXP, as with MySQL, but that threw an error saying REGEXP wasn't recognized.

Here is the error I get:

sqlalchemy.exc.OperationalError: (OperationalError) Could not decode to UTF-8 column 'table_name' with text ...

Thanks for the help.

Well, if you really want to shoehorn a rich Unicode string into a plain ASCII string (and don't mind some goofs), you could use this:

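import unicodedata as ud

def shoehorn_unicode_into_ascii(s):
    # NFKD splits accented letters into base letter + combining mark; 'ignore'
    # then drops everything non-ASCII, including ß‘’“” (which simply vanish).
    # The final decode returns str rather than bytes on Python 3.
    return ud.normalize('NFKD', s).encode('ascii', 'ignore').decode('ascii')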
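
To make the "goofs" concrete, here is a small self-contained sketch of the same NFKD-then-ignore idea run on a few sample strings. The helper name to_ascii and the sample strings are only illustrative, and it assumes Python 3:

```python
import unicodedata

def to_ascii(text: str) -> str:
    # Hypothetical helper name: decompose with NFKD, drop every non-ASCII
    # code point, and return a str (not bytes).
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")

for sample in ["café", "Straße", "“quoted”"]:
    print(repr(sample), "->", repr(to_ascii(sample)))
# Notes on the three results:
#   'café'     becomes 'cafe'   (accent stripped, base letter kept)
#   'Straße'   becomes 'Strae'  (ß has no decomposition, so it disappears)
#   '“quoted”' becomes 'quoted' (curly quotes are dropped, not straightened)
```

So accented Latin letters come through reasonably well, while anything without a compatibility decomposition is silently lost.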

For a more complete solution (with somewhat fewer goofs, but requiring a third-party module, unidecode), see this answer.

Really, though, the best solution is to work with Unicode data throughout your code as much as possible, and drop to an encoding only when necessary.
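
Applied to the database in the question, one way to do this is to read the stored text as raw bytes, decode it leniently, and write the cleaned value back. The following is a rough sketch using the standard-library sqlite3 module rather than SQLAlchemy. The function name strip_invalid_utf8 and the table and column names are made up for illustration, and it assumes an ordinary rowid table, TEXT values meant to be UTF-8, and trusted identifier names:

```python
import sqlite3

def strip_invalid_utf8(conn, table, column):
    """Drop bytes that are not valid UTF-8 from TEXT values in table.column.

    Hypothetical helper: `table` and `column` are interpolated into the SQL,
    so they must come from trusted code, not user input.
    Returns the number of rows that were rewritten.
    """
    old_factory = conn.text_factory
    conn.text_factory = bytes  # hand back TEXT as raw bytes, no decoding
    changed = 0
    try:
        select = (f"SELECT rowid, {column} FROM {table} "
                  f"WHERE typeof({column}) = 'text'")
        for rowid, raw in conn.execute(select).fetchall():
            cleaned = raw.decode("utf-8", errors="ignore")
            if cleaned.encode("utf-8") != raw:  # something was dropped
                conn.execute(
                    f"UPDATE {table} SET {column} = ? WHERE rowid = ?",
                    (cleaned, rowid),
                )
                changed += 1
        conn.commit()
    finally:
        conn.text_factory = old_factory  # restore normal str decoding
    return changed

# Demo on an in-memory database with one deliberately broken value.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (body TEXT)")
# CAST(blob AS TEXT) stores the bytes as text without validating them.
conn.execute("INSERT INTO notes VALUES (CAST(? AS TEXT))", (b"caf\xe9 ok",))
print(strip_invalid_utf8(conn, "notes", "body"))
print(conn.execute("SELECT body FROM notes").fetchall())
```

Using errors="replace" instead of "ignore" would keep a visible U+FFFD marker where bytes were dropped. If the bad bytes are really text in another encoding such as Latin-1, decoding with that encoding would preserve the characters instead of deleting them.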

django.utils.encoding has a robust set of Unicode encoding and decoding functions.
