Given a string that contains a symbol whose encoding I don't know, how can I display the complete string without errors?

Question

I have many strings retrieved from a database that include characters I need to display, for example € (I am using Python 2.7). However, the following error appears:

UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 33: invalid start byte

The string in this case is something like st = ' the price in €', but it could contain a different symbol (for now the error only appears in this case, but in the future another character could cause the same problem).

I worked around the error using:

st = st.decode('utf8', errors='ignore')

The problem with that solution is that it removes the € symbol, but I want to show it. I tried using repr(st) to find out what the encoding is, and it gave me '\\x80'.

I want a way to print the € character without searching for that symbol specifically (because it could be another one) and without getting that error.

I don't know if there is another way to look at the problem. My approach was to find the encoding of that character and convert it to a normal string, but the error also appeared when I tried to encode into 'latin1', 'utf-8' or 'ascii'. Maybe my problem is that I have no experience with encodings; I'm just a noob.

Answer 1

Try the chardet library.

This library guesses the encoding of a byte string. It cannot guarantee 100% accuracy, because detection is a statistical guess and the same bytes can be valid in several encodings; short strings are especially hard to identify. You can read its docs for a detailed explanation. Hopefully this solves your problem.
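If you would rather not depend on detection alone, here is a minimal sketch of a fallback chain using only the standard library. It assumes the data is either UTF-8 or Windows-1252 (cp1252), which is a guess based on your error: 0x80 is the € sign in cp1252, while in UTF-8 it can only be a continuation byte, hence "invalid start byte". The names to_text and candidates are made up for this example; a detector's guess, such as the one chardet reports, could be placed at the front of candidates.

```python
# -*- coding: utf-8 -*-
# Sketch only: to_text and its candidates parameter are hypothetical names,
# and the default candidate list is an assumption about the database contents.

def to_text(raw, candidates=("utf-8", "cp1252")):
    """Decode a byte string to text without raising or silently dropping bytes."""
    for encoding in candidates:
        try:
            # Strict decoding: raises when the bytes do not fit this encoding.
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Last resort: undecodable bytes become U+FFFD markers instead of vanishing.
    return raw.decode("utf-8", "replace")


raw = b" the price in \x80"   # the bytes from the question
text = to_text(raw)           # u" the price in \u20ac", i.e. " the price in €"
```

Note that the call is decode, not encode. In Python 2.7, calling .encode() on a byte string first decodes it implicitly as ASCII, which is probably why your attempts with 'latin1', 'utf-8' and 'ascii' all raised a similar error. Also, the order of candidates matters: strict UTF-8 goes first because it rarely accepts text that was written in another encoding.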

1 answers

solution1
0 2019-06-05 10:09:57
