Getting a Â character before the degree symbol

Question

I am trying to concatenate the degree symbol to a string so I can write it to a Word document. I have tried to do it like this:

degreeChar = u'\N{DEGREE SIGN}'
print degreeChar.encode('UTF-8')

The output I get from this is Â° and I am not sure why Â is showing up. What am I doing wrong? Very frustrated!

Thanks.

Answer 1

When you do this:

>>> degreeChar = u'\N{DEGREE SIGN}'

degreeChar is a one-character Unicode string, specifically u'°':

>>> len(degreeChar)
1
>>> ord(degreeChar)
176

When you encode it to UTF-8, you get a 2-byte UTF-8 byte string:

>>> dc = degreeChar.encode('UTF-8')
>>> len(dc)
2
>>> ord(dc[0]), ord(dc[1])
(194, 176)

As UTF-8, that pair of bytes means u'°'. But as, say, Latin-1 or cp1252, the exact same pair of bytes means u'Â°'. That's the whole point of different encodings: the same byte sequence means different things in different encodings. To see the details:

>>> dc2 = dc.decode('latin-1')
>>> len(dc2)
2
>>> ord(dc2[0]), ord(dc2[1])
(194, 176)

So, what happens if you try to print the UTF-8 string to a cp1252 terminal? Or save it to a binary file that you then open as a cp1252 text file? Well, you get Â° of course.
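The mismatch described above can be reproduced without a terminal. This is a minimal sketch in Python 3 syntax (the session in this answer is Python 2); the variable names are illustrative only. It encodes the degree sign as UTF-8, misreads those bytes as cp1252, and then reverses the misreading:

```python
# Python 3 sketch; names here are illustrative, not from the original answer.
degree = "\N{DEGREE SIGN}"

# Encoding to UTF-8 yields two bytes: 0xC2 0xB0.
utf8_bytes = degree.encode("utf-8")

# Reading those same two bytes as cp1252 yields two characters:
# U+00C2 (A with circumflex) followed by U+00B0 (degree sign).
misread = utf8_bytes.decode("cp1252")

# The damage is reversible as long as every byte survived the round trip.
repaired = misread.encode("cp1252").decode("utf-8")

print(list(utf8_bytes))    # [194, 176]
print(len(misread))        # 2
print(repaired == degree)  # True
```

The stray character is never added by the encode step; it only appears when the two UTF-8 bytes are interpreted under a single-byte encoding.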

So, how do you solve this?

Well, just don't try to print UTF-8-encoded bytes to a cp1252 terminal! If Python has successfully guessed your terminal's encoding, just print it as a Unicode string in the first place:

>>> print u'°'
°

If not, you either need to fix your configuration so Python does guess your terminal's encoding correctly (easy on most *nix systems, not so much on Windows…), or specify it manually, or just encode to the right encoding instead of the wrong one:

>>> print u'°'.encode('cp1252')
°

Answer 2

degreeChar = u'\N{DEGREE SIGN}'
print degreeChar

It should be fine as Unicode. At least on Windows 7, this command works as expected.

Answer 3

The document that contains the ° is encoded as UTF-8, but the interpreter assumes a different encoding.

In my case I just added a UTF-8 BOM to that document, so the interpreter became aware of the content's encoding.
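As a rough illustration of what adding a BOM means at the byte level, here is a sketch in Python 3 syntax. It assumes the document is a plain text file; the file name and sample text are made up, and the answer does not say which program reads the file. The 'utf-8-sig' codec writes the three-byte UTF-8 signature on output and strips it on input:

```python
import codecs
import os
import tempfile

# Hypothetical sample content and file name, for illustration only.
text = "25 \N{DEGREE SIGN}C"
path = os.path.join(tempfile.mkdtemp(), "reading.txt")

# 'utf-8-sig' prepends the UTF-8 BOM (EF BB BF) when writing.
with open(path, "w", encoding="utf-8-sig") as f:
    f.write(text)

# Inspect the raw bytes to confirm the signature is present.
with open(path, "rb") as f:
    raw = f.read()
has_bom = raw.startswith(codecs.BOM_UTF8)

# Reading with 'utf-8-sig' drops the BOM again.
with open(path, encoding="utf-8-sig") as f:
    round_trip = f.read()

print(has_bom)             # True
print(round_trip == text)  # True
```

Whether a given reader honors the BOM depends on that program, so this only helps with tools that look for it.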

3 answers

solution1
6 2013-08-05 19:21:26

solution2
0 2013-08-05 18:57:10

solution3
0 2020-02-04 09:30:21
