
Getting a Â character before the degree symbol

I am trying to concatenate the degree symbol to a string so I can write it to a Word document. I have tried to do it like this:

degreeChar = u'\N{DEGREE SIGN}'
print degreeChar.encode('UTF-8')

The output I get from this is Â° and I am not sure why Â is showing up. What am I doing wrong? Very frustrated!

Thanks.

When you do this:

>>> degreeChar = u'\N{DEGREE SIGN}'

degreeChar is a one-character Unicode string, namely u'°':

>>> len(degreeChar)
1
>>> ord(degreeChar)
176

When you encode it to UTF-8, you get a 2-byte UTF-8 byte string:

>>> dc = degreeChar.encode('UTF-8')
>>> len(dc)
2
>>> ord(dc[0]), ord(dc[1])
(194, 176)
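Here is a small Python 3 sketch showing where 194 and 176 come from. The thread itself uses Python 2 syntax. UTF-8 stores code points from U+0080 to U+07FF in two bytes. The first byte has the form 110xxxxx and carries the top 5 bits. The second has the form 10xxxxxx and carries the low 6 bits. The helper utf8_two_byte is a hypothetical name made up for this illustration, not a library function.

```python
def utf8_two_byte(code_point: int) -> bytes:
    """Encode a code point in the range U+0080..U+07FF as two UTF-8 bytes."""
    if not 0x80 <= code_point <= 0x7FF:
        raise ValueError("this code point needs a different UTF-8 length")
    lead = 0b11000000 | (code_point >> 6)          # 110xxxxx: top 5 bits
    trail = 0b10000000 | (code_point & 0b111111)   # 10xxxxxx: low 6 bits
    return bytes([lead, trail])

degree = "\N{DEGREE SIGN}"                # code point 176 (U+00B0)
manual = utf8_two_byte(ord(degree))
print(list(manual))                       # [194, 176]
print(manual == degree.encode("utf-8"))   # True
```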

As UTF-8, that pair of bytes means u'°'. But as, say, Latin-1 or cp1252, the exact same pair of bytes means u'Â°'. That's the whole point of different encodings: the same byte sequence means different things in different encodings. To see the details:

>>> dc2 = dc.decode('latin-1')
>>> len(dc2)
2
>>> ord(dc2[0]), ord(dc2[1])
(194, 176)
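The same experiment works in Python 3. This illustrative sketch (not code from the thread) decodes one pair of bytes under three codecs. unicodedata.name spells out which characters come back.

```python
import unicodedata

data = "\N{DEGREE SIGN}".encode("utf-8")   # b'\xc2\xb0', i.e. bytes 194 and 176

decoded = {}
for codec in ("utf-8", "latin-1", "cp1252"):
    decoded[codec] = data.decode(codec)
    print(codec, [unicodedata.name(ch) for ch in decoded[codec]])
# utf-8 ['DEGREE SIGN']
# latin-1 ['LATIN CAPITAL LETTER A WITH CIRCUMFLEX', 'DEGREE SIGN']
# cp1252 ['LATIN CAPITAL LETTER A WITH CIRCUMFLEX', 'DEGREE SIGN']
```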

So, what happens if you try to print the UTF-8 string to a cp1252 terminal? Or save it to a binary file that you then open as a cp1252 text file? Well, you get Â° of course.
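The file case can be reproduced in a few lines of Python 3. The file here is a throwaway temporary file, not anything from the original thread. The sketch writes the UTF-8 bytes in binary mode, then reads them back in text mode, once as cp1252 and once as UTF-8.

```python
import os
import tempfile

degree = "\N{DEGREE SIGN}"

# Save the UTF-8 bytes through a binary-mode file...
with tempfile.NamedTemporaryFile(suffix=".txt", delete=False) as f:
    f.write(degree.encode("utf-8"))
    path = f.name

try:
    # ...then open that file as text, once as cp1252 and once as UTF-8.
    with open(path, encoding="cp1252") as f:
        wrong = f.read()
    with open(path, encoding="utf-8") as f:
        right = f.read()
finally:
    os.remove(path)

print(ascii(wrong))   # '\xc2\xb0'  (Â followed by °)
print(ascii(right))   # '\xb0'      (just °)
```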


So, how do you solve this?

Well, just don't try to print UTF-8-encoded bytes to a cp1252 terminal! If Python has successfully guessed your terminal's encoding, just print it as a Unicode string in the first place:

>>> print u'°'
°

If not, you either need to fix your configuration so Python does guess your terminal's encoding correctly (easy on most *nix systems, not so much on Windows…), or specify it manually, or just encode to the right encoding instead of the wrong one:

>>> print u'°'.encode('cp1252')
°
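The same rule carries over to Python 3: encode with whatever encoding the output really uses. In this sketch, write_encoded is a hypothetical helper, and an in-memory io.BytesIO stands in for the terminal. With a real console you would presumably pass sys.stdout.buffer and sys.stdout.encoding instead.

```python
import io

def write_encoded(stream, text, encoding):
    """Encode text with the output's real encoding, then write the raw bytes."""
    # errors="replace" writes '?' instead of raising when the encoding
    # has no byte sequence for a character.
    stream.write(text.encode(encoding, errors="replace"))

degree = "\N{DEGREE SIGN}"
outputs = {}
for encoding in ("cp1252", "utf-8", "ascii"):
    fake_terminal = io.BytesIO()
    write_encoded(fake_terminal, degree, encoding)
    outputs[encoding] = fake_terminal.getvalue()
    print(encoding, outputs[encoding])
# cp1252 b'\xb0'
# utf-8 b'\xc2\xb0'
# ascii b'?'
```

Only the cp1252 result is the single byte a cp1252 terminal shows as °. The UTF-8 pair is exactly what produced the stray Â.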

degreeChar = u'\N{DEGREE SIGN}'
print degreeChar

It should be fine as a Unicode string; at least on Windows 7, this code works as expected.
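In Python 3, every str is Unicode, so this approach is simply the default. Below is a sketch of the original goal: concatenating the symbol onto a string. The temperature value and the label format are made up for illustration.

```python
DEGREE = "\N{DEGREE SIGN}"

# The named escape, the hex escape and chr() all produce the same one-character str.
print(DEGREE == "\u00b0" == chr(176))   # True

temperature = 21.5                       # hypothetical reading
label = f"{temperature}{DEGREE}C"        # plain str concatenation; no .encode() needed
print(label)        # 21.5°C
print(len(label))   # 6
```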

The file containing the ° is saved as UTF-8, but the interpreter assumes a different encoding.

In my case, I just added a UTF-8 BOM (byte order mark) to that file, so the interpreter became aware of its encoding.
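Python 3 reads source files as UTF-8 by default (PEP 3120), so this mostly affects older Python 2 code. The standard library's tokenize.detect_encoding shows the rule this answer relies on: a leading UTF-8 BOM marks the source as UTF-8, while a PEP 263 coding comment can declare some other encoding. The three sample sources below are made up.

```python
import codecs
import io
import tokenize

line = 'degree = "\N{DEGREE SIGN}"\n'

sources = {
    "plain": line.encode("utf-8"),                                       # no marker at all
    "bom": codecs.BOM_UTF8 + line.encode("utf-8"),                       # UTF-8 BOM in front
    "cookie": b"# -*- coding: latin-1 -*-\n" + line.encode("latin-1"),   # coding comment
}

detected = {}
for name, source in sources.items():
    # detect_encoding only needs a readline callable that returns bytes
    encoding, _ = tokenize.detect_encoding(io.BytesIO(source).readline)
    detected[name] = encoding
    print(name, encoding)
# plain utf-8
# bom utf-8-sig
# cookie iso-8859-1

# Decoding with the detected codec recovers the same text in every case.
print(all(src.decode(detected[n]).endswith(line) for n, src in sources.items()))  # True
```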
