I am trying to concatenate the degree symbol to a string so I can write it to a word document. I have tried to do it like this.
degreeChar = u'\N{DEGREE SIGN}'
print degreeChar.encode('UTF-8')
The output I get from this is Â° and I am not sure why the Â is showing up. What am I doing wrong? Very frustrated!
Thanks.
When you do this:
>>> degreeChar = u'\N{DEGREE SIGN}'
degreeChar is a one-character Unicode string, specifically u'°':
>>> len(degreeChar)
1
>>> ord(degreeChar)
176
When you encode it to UTF-8, you get a 2-byte UTF-8 byte string:
>>> dc = degreeChar.encode('UTF-8')
>>> len(dc)
2
>>> ord(dc[0]), ord(dc[1])
(194, 176)
As UTF-8, that pair of bytes means u'°'. But as, say, Latin-1 or cp1252, the exact same pair of bytes means u'Â°'. That's the whole point of different encodings: the same byte sequence means different things in different encodings. To see the details:
>>> dc2 = dc.decode('latin-1')
>>> len(dc2)
2
>>> ord(dc2[0]), ord(dc2[1])
(194, 176)
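To see all three interpretations side by side, here is a minimal sketch that should run on both Python 2.7 and Python 3. The variable names are mine, not from the question:

```python
from __future__ import print_function

degree = u'\N{DEGREE SIGN}'
raw = degree.encode('utf-8')          # the two bytes 0xC2 0xB0

# Decode the very same bytes under three different encodings.
as_utf8 = raw.decode('utf-8')         # one character: the degree sign
as_latin1 = raw.decode('latin-1')     # two characters: A-circumflex, degree sign
as_cp1252 = raw.decode('cp1252')      # same two characters as Latin-1 here

print(len(as_utf8), [ord(c) for c in as_utf8])      # 1 [176]
print(len(as_cp1252), [ord(c) for c in as_cp1252])  # 2 [194, 176]
```

Only the UTF-8 decode gives back the original one-character string; the other two produce the extra Â in front of it.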
So, what happens if you try to print the UTF-8 string to a cp1252 terminal? Or save it to a binary file that you then open as a cp1252 text file? Well, you get Â°, of course.
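The file case can be reproduced with a short sketch. It assumes a throwaway temporary file and uses io.open so that it behaves the same on Python 2.7 and Python 3; the names wrong and right are only for illustration:

```python
import io
import os
import tempfile

degree = u'\N{DEGREE SIGN}'

# Create a scratch file and close the OS-level handle right away.
fd, path = tempfile.mkstemp()
os.close(fd)
try:
    # Write the UTF-8 bytes in binary mode.
    with io.open(path, 'wb') as f:
        f.write(degree.encode('utf-8'))

    # Read them back as text with the wrong encoding: mojibake.
    with io.open(path, 'r', encoding='cp1252') as f:
        wrong = f.read()

    # Read them back with the matching encoding: the original text.
    with io.open(path, 'r', encoding='utf-8') as f:
        right = f.read()
finally:
    os.remove(path)
```

Here wrong comes back as two characters (Â followed by °), while right is the single degree sign that was written.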
So, how do you solve this?
Well, just don't try to print UTF-8-encoded bytes to a cp1252 terminal! If Python has successfully guessed your terminal's encoding, just print it as a Unicode string in the first place:
>>> print u'°'
°
If not, you either need to fix your configuration so Python does guess your terminal's encoding correctly (easy on most *nix systems, not so much on Windows…), or specify it manually, or just encode to the right encoding instead of the wrong one:
>>> print u'°'.encode('cp1252')
°
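If you want to check what Python guessed and fall back gracefully, something like the following could work. This is only a sketch: encode_for_terminal is a hypothetical helper of my own, and the fallback to UTF-8 when no encoding is reported is an assumption, not something Python does for you:

```python
import sys

def encode_for_terminal(text, encoding=None):
    """Encode text for the terminal (hypothetical helper).

    Uses the given encoding, else whatever Python detected for stdout,
    else UTF-8 as an assumed default. Characters the target encoding
    cannot represent are replaced with '?' instead of raising.
    """
    enc = encoding or getattr(sys.stdout, 'encoding', None) or 'utf-8'
    return text.encode(enc, 'replace')

degree = u'\N{DEGREE SIGN}'
cp1252_bytes = encode_for_terminal(degree, 'cp1252')  # one byte, 0xB0
utf8_bytes = encode_for_terminal(degree, 'utf-8')     # two bytes, 0xC2 0xB0
```

On Python 2 you would print the returned byte string. On Python 3 it is usually simpler to print the text itself and let the interpreter encode it.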
degreeChar = u'\N{DEGREE SIGN}'
print degreeChar
It should be fine as Unicode; at least on Windows 7 this command works as expected.
The document where ° is located is encoded as UTF-8, but the interpreter assumes a different encoding.
In my case I just added a UTF-8 BOM to that document, so the interpreter became aware of the content encoding.
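For reference, the BOM is just three bytes placed in front of the UTF-8 data, and Python's utf-8-sig codec adds or strips it for you. A small sketch, with variable names of my own choosing:

```python
import codecs

text = u'\N{DEGREE SIGN}'

# 'utf-8-sig' prepends the UTF-8 BOM (EF BB BF) when encoding...
with_bom = text.encode('utf-8-sig')
has_bom = with_bom.startswith(codecs.BOM_UTF8)

# ...and strips it again when decoding.
round_trip = with_bom.decode('utf-8-sig')
```

If the document in question is a Python source file, an encoding declaration such as `# -*- coding: utf-8 -*-` on the first or second line is another way to tell the interpreter how the file is encoded.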