How to decode a text in python3?

Question

I have the text Aur\xc3\xa9lien and want to decode it with Python 3.8.

I tried the following:

import codecs
s = "Aur\xc3\xa9lien"
codecs.decode(s, "utf-8")
codecs.decode(bytes(s), "utf-8")
codecs.decode(bytes(s, "utf-8"), "utf-8")

but none of them gives the correct result, Aurélien.

How to do it correctly?

Also, is there a basic, authoritative page that describes how all these encodings work in Python?

Answer 1

First find the encoding of the data, then decode it. Encoding detection works on bytes rather than on str, so write the data as a byte string by adding the letter 'b' in front of the original string literal.

Try this:

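import chardet

# byte string: note the b prefix on the literal
bs = b"Aur\xc3\xa9lien"

# chardet guesses the encoding from the raw bytes (the result is a guess, not a guarantee)
encoding = chardet.detect(bs)["encoding"]

# decode the bytes with the detected encoding
text = bs.decode(encoding)

print(text)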

If you are reading the text from a file, you can detect the encoding using the magic library; see here: https://stackoverflow.com/a/16203777/1544937

Answer 2

Your string is UTF-8 data that was decoded as latin-1, so the solution is to encode it as latin-1 and then decode it as UTF-8.

s = "Aur\xc3\xa9lien"
s.encode('latin-1').decode('utf-8')
print(s.encode('latin-1').decode('utf-8'))

Output
Aurélien
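The same round trip can be wrapped in a small helper. This is only a sketch: the name repair_mojibake is hypothetical, and it assumes the text really is UTF-8 that was decoded as latin-1. Anything that does not fit that pattern is returned unchanged.

```python
def repair_mojibake(text: str) -> str:
    """Undo UTF-8 bytes that were mistakenly decoded as latin-1.

    Hypothetical helper for illustration; returns the input unchanged
    when the latin-1 -> UTF-8 round trip is not possible.
    """
    try:
        # latin-1 turns each character back into the byte it came from,
        # then UTF-8 interprets those bytes as intended.
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the text has characters outside latin-1, or the
        # recovered bytes are not valid UTF-8: leave it alone.
        return text


print(repair_mojibake("Aur\xc3\xa9lien"))  # Aurélien
print(repair_mojibake("Aurélien"))         # already correct, stays Aurélien
```

Falling back to the original text keeps the helper safe to call on strings that were decoded correctly in the first place.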

Answer 3

The escapes in your string are UTF-8 bytes, not Unicode characters, so you should prefix the literal with b to make it a bytes object:

b = b"Aur\xc3\xa9lien"  # bytes literal holding the UTF-8 encoded text
b.decode('utf-8')

This gives the expected result: 'Aurélien'.

If you want to start from the str s, encode it with latin-1 first. Latin-1 maps the code points U+0000 to U+00FF one-to-one onto the bytes 0x00 to 0xFF, so it recovers the original bytes exactly. Other 8-bit codecs such as mbcs (Windows only) or mac_roman are not guaranteed to give back the same bytes. Once you have the bytes object, decode it as UTF-8 as in the first part of this answer.
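How the choice of 8-bit codec affects the recovered bytes can be checked directly. The following is a minimal sketch using only built-in codecs; the variable names are illustrative.

```python
all_bytes = bytes(range(256))

# latin-1 maps byte n to code point n, so the round trip is lossless.
as_text = all_bytes.decode("latin-1")
assert [ord(c) for c in as_text] == list(range(256))
assert as_text.encode("latin-1") == all_bytes

s = "Aur\xc3\xa9lien"
raw = s.encode("latin-1")   # recover the original bytes
print(raw)                  # b'Aur\xc3\xa9lien'
print(raw.decode("utf-8"))  # Aurélien

# A different 8-bit codec places these characters at other byte values,
# so it does not give back the same bytes.
print(s.encode("mac_roman") == raw)  # False
```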
