
From a string that contains a symbol whose encoding is unknown, how can I display the complete string without errors?

I have many strings retrieved from a database that include some characters I need to display, for example € (I am using Python 2.7). The problem is that the following error appears:

UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 33: invalid start byte 

The string in this case is something like st = ' the price in €', but it could contain a different symbol (for now the error only appears in that case, but in the future another character could cause the same problem).

I worked around the error using:

st = st.decode('utf8', errors='ignore')

The problem with that solution is that it removes the € symbol, and I want to display it. I tried repr(st) to find out what the encoding is, and it gave me '\x80'.

I want a way to print the € character without searching specifically for that symbol (because it could be another one) and without getting the error.

I don't know if there is another way to look at the problem. My approach was to find the encoding of that character and convert it into a normal string, but the error also appeared when I tried encoding to 'latin1', 'utf-8' or 'ascii'. Maybe my problem is that I have no experience with encodings; I'm just a noob.
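Some context on the errors described above: the byte 0x80 cannot start a UTF-8 sequence, but in Windows-1252 (cp1252) it is the euro sign. Also, in Python 2.7, calling .encode() on a byte string first decodes it implicitly as ASCII, which is probably why the encode attempts raised the same kind of error. The following is a minimal sketch of decoding with a list of candidate encodings. The function name and the candidate list are hypothetical, and cp1252 is only an assumption about how the database stored the data.

```python
# -*- coding: utf-8 -*-
# Sketch only: the candidate list is an assumption, not confirmed by the question.
CANDIDATE_ENCODINGS = ('utf-8', 'cp1252')  # hypothetical; adjust to the real data source


def decode_with_fallback(raw, encodings=CANDIDATE_ENCODINGS):
    """Return the raw bytes decoded with the first encoding that succeeds."""
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Last resort: keep the text and mark undecodable bytes with U+FFFD
    return raw.decode('utf-8', 'replace')


st = b' the price in \x80'
text = decode_with_fallback(st)
print(repr(text))  # repr avoids console encoding problems when printing
```

Unlike errors='ignore', this keeps the symbol: UTF-8 fails on the byte, so the cp1252 attempt is used. Order matters, because a permissive single-byte encoding placed first would accept almost any input and hide real UTF-8 data.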

Try the chardet library.

This library can detect the encoding of strings. It cannot guarantee 100% accuracy, because that is impossible, at least for now. You can read its documentation for a detailed explanation. Hopefully this solves your problem.
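As a rough illustration of that suggestion, here is a minimal sketch using chardet.detect, which returns a dictionary with 'encoding' and 'confidence' keys. The helper name decode_detected and the cp1252 default are assumptions made for this example, not part of the library or the original answer. The import is guarded so the sketch still runs if chardet is not installed.

```python
try:
    import chardet  # third-party: pip install chardet
except ImportError:  # keep the sketch usable without the library
    chardet = None


def decode_detected(raw, default='cp1252'):
    """Decode raw bytes using chardet's guess, else an assumed default encoding."""
    guess = chardet.detect(raw) if chardet else {}
    # chardet may report None when it cannot decide
    encoding = guess.get('encoding') or default
    try:
        # 'replace' marks undecodable bytes instead of raising or dropping them
        return raw.decode(encoding, 'replace')
    except LookupError:
        # chardet named an encoding this Python build does not know
        return raw.decode(default, 'replace')


result = decode_detected(b' the price in \x80')
print(repr(result))
```

For very short strings the guess tends to have low confidence, so it may be worth running detection on a larger sample from the same database column and checking the reported 'confidence' before trusting the result.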

