
Given a string that contains a symbol whose encoding I don't know, how can I show the complete string without getting errors?

I have many strings retrieved from a database that include characters I need to show, for example € (I am using Python 2.7), but the following error appears:

UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 33: invalid start byte 

In this case the string is something like st = ' the price in €', but it could contain a different symbol (for now the error only appears with €, but in the future another character could cause the same problem).
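The byte 0x80 is never a valid first byte of a UTF-8 sequence, which is why the utf8 codec rejects it. In the Windows-1252 (cp1252) code page, however, 0x80 is the euro sign, so a plausible (unconfirmed) explanation is that the database returns cp1252-encoded bytes. A minimal sketch in Python 3, where the raw value is a `bytes` object (the counterpart of a Python 2.7 `str`); the sample value `RAW` and the helper `try_utf8` are hypothetical:

```python
RAW = b" the price in \x80"  # hypothetical bytes value as returned by the database


def try_utf8(raw: bytes):
    """Return (text, None) on success, or (None, reason) on failure."""
    try:
        return raw.decode("utf-8"), None
    except UnicodeDecodeError as exc:
        # exc.reason describes why the byte at exc.start was rejected.
        return None, exc.reason


utf8_text, utf8_error = try_utf8(RAW)  # fails: 0x80 is not a UTF-8 start byte
cp1252_text = RAW.decode("cp1252")     # succeeds: 0x80 is U+20AC EURO SIGN in cp1252
latin1_text = RAW.decode("latin-1")    # no error, but 0x80 becomes control char U+0080
```

Note that latin-1 "works" without raising, yet it does not produce €; picking the right codec matters, not just avoiding the exception.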

I worked around that error using:

st = st.decode('utf8', errors='ignore')

The problem with that solution is that it removes the € symbol, and I want to show it. I tried using repr(st) to find out what the encoding is, and it gave me '\\x80'.
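`errors='ignore'` silently drops every byte the codec cannot decode, which is exactly why the € disappears. The other standard error handlers keep a visible trace instead, but none of them recovers the €. A small Python 3 comparison on a hypothetical sample value (`backslashreplace` works for decoding only from Python 3.5; in 2.7 it is limited to encoding):

```python
RAW = b" the price in \x80"  # hypothetical sample value

# Compare how the built-in error handlers treat the undecodable byte 0x80.
results = {
    handler: RAW.decode("utf-8", errors=handler)
    for handler in ("ignore", "replace", "backslashreplace")
}
# "ignore"           -> " the price in "        (byte silently dropped)
# "replace"          -> " the price in \ufffd"  (U+FFFD REPLACEMENT CHARACTER)
# "backslashreplace" -> " the price in \\x80"   (byte kept as a visible escape)
```

These handlers are useful for logging or debugging, but showing the real € requires decoding with the encoding the data was actually written in.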

I want a way to print that € character without searching specifically for that symbol (because it could be a different one), and without getting that error.

I don't know if there is another way to look at the problem. My approach was to find the encoding of that character and convert it into a normal string, but the error also appeared when I tried to encode into 'latin1', 'utf-8' or 'ascii'. Maybe the problem is that I don't have any experience with encodings; I'm just a noob.
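Part of the confusion is that in Python 2.7, calling `.encode(...)` on a byte `str` first decodes it implicitly with the ASCII codec, so it fails on 0x80 no matter which target encoding is passed. The useful direction is to *decode* the bytes into text with the right codec. When the encoding is unknown, one common pattern is to try candidate encodings in order. The sketch below (Python 3; the function name `decode_best_effort` and the candidate list are assumptions, not part of the question) tries strict UTF-8 first, since non-UTF-8 text rarely happens to form valid UTF-8, then cp1252, then latin-1 as a last resort because it maps every byte and never fails:

```python
from typing import Sequence, Tuple

# Hypothetical candidate order: strict UTF-8, then Windows-1252, then latin-1.
DEFAULT_CANDIDATES = ("utf-8", "cp1252", "latin-1")


def decode_best_effort(
    raw: bytes, candidates: Sequence[str] = DEFAULT_CANDIDATES
) -> Tuple[str, str]:
    """Decode raw bytes with the first candidate encoding that succeeds.

    Returns (text, encoding_used).
    """
    for encoding in candidates:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue  # these bytes are invalid in this encoding; try the next one
    # Only reached when no candidate succeeds (never, while latin-1 is listed).
    return raw.decode("utf-8", errors="replace"), "utf-8 (lossy)"
```

For example, `decode_best_effort(b" the price in \x80")` returns the text with € and reports `"cp1252"`. The order is a heuristic: bytes that are valid UTF-8 but were really written as cp1252 would still be misread.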

Try the chardet library.

This library can guess the encoding of a byte string. It cannot guarantee 100% accuracy, because that is impossible, at least for now: the same bytes are often valid in several encodings. You can read its docs for a detailed explanation. Hopefully this solves your problem.
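A minimal sketch of using chardet's guess, assuming chardet is installed (`pip install chardet`). `chardet.detect()` returns a dict that includes `encoding` and `confidence` keys; the function name `decode_with_chardet`, the `min_confidence` threshold and the cp1252 fallback are illustrative assumptions, not chardet settings. The import is guarded so the sketch still runs (using only the fallback) when chardet is missing:

```python
from typing import Optional

try:
    import chardet  # third-party package
except ImportError:  # keep the sketch runnable without chardet installed
    chardet = None


def decode_with_chardet(
    raw: bytes, min_confidence: float = 0.5, fallback: str = "cp1252"
) -> str:
    """Decode bytes using chardet's guess when it is confident enough."""
    encoding: Optional[str] = None
    if chardet is not None:
        guess = chardet.detect(raw)
        if guess["encoding"] and guess["confidence"] >= min_confidence:
            encoding = guess["encoding"]
    try:
        # errors="replace" avoids an exception if the guess is wrong.
        return raw.decode(encoding or fallback, errors="replace")
    except LookupError:  # chardet named a codec this Python does not provide
        return raw.decode(fallback, errors="replace")
```

On very short inputs like `b" the price in \x80"` the guess is unreliable (it may well be ISO-8859-1 rather than Windows-1252, which yields U+0080 instead of €), so for database values it is usually better to find out the encoding the database or its connection actually uses.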
