简体   繁体   中英

python 3.0, how to make print() output unicode?

I'm working in WinXP 5.1.2600, writing a Python application involving Chinese pinyin, which has involved me in endless Unicode problems. Switching to Python 3.0 has solved many of them. But the print() function for console output is not Unicode-aware for some odd reason. Here's a teeny program.

print('sys.stdout encoding is "' + sys.stdout.encoding + '"')
str1 = 'lüelā'
print(str1)

Output is (changing angle brackets to square brackets for readability):

sys.stdout encoding is "cp1252"
    Traceback (most recent call last):
      File "TestPrintEncoding.py", line 22, in [module]
        print(str1)
      File "C:\Python30\lib\io.py", line 1491, in write
        b = encoder.encode(s)
      File "C:\Python30\lib\encodings\cp1252.py", line 19, in encode
        return codecs.charmap_encode(input,self.errors,encoding_table)[0]
    UnicodeEncodeError: 'charmap' codec can't encode character '\u0101' 
    in position 4: character maps to [undefined]

Note that ü = \\xfc = 252 gives no problem since it's upper ASCII. But ā = \ā is beyond 8-bits.

Anyone have an idea how to change the encoding of sys.stdout to 'utf-8'? Bear in mind that Python 3.0 no longer uses the codecs module, if I understand the documentation right.


Apologies, I gave you the program without the preamble. Before the 3 lines given, it starts like this:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import sys

Unfortunately, the coding specified by the "coding:" line is the coding of the source code , not of the console output. But thank you for your thoughts!

The Windows command prompt (cmd.exe) cannot display the Unicode characters you are using, even though Python is handling it in a correct manner internally. You need to use IDLE, Cygwin, or another program that can display Unicode correctly.

See this thread for a full explanation: http://www.nabble.com/unable-to-print-Unicode-characters-in-Python-3-td21670662.html

You may want to try changing the environment variable "PYTHONIOENCODING" to "utf_8." I have written a page on my ordeal with this problem .

Check out the question and answer here , I think they have some valuable clues. Specifically, note the setdefaultencoding in the sys module, but also the fact that you probably shouldn't use it.

The problem of displaying Unicode charaters in Python in Windows is known. There is no official solution yet. The right thing to do is to use winapi function WriteConsoleW. It is nontrivial to build a working solution as there are other related issues. However, I have developed a package which tries to fix Python regarding this issue. See https://github.com/Drekin/win-unicode-console . You can also read there a deeper explanation of the problem. The package is also on pypi ( https://pypi.python.org/pypi/win_unicode_console ) and can be installed using pip.

Here's a dirty hack:

# works
import os
os.system("chcp 65001 &")
print("юникод")

However everything breaks it:

  • simple muting first line already breaks it:

     # doesn't work import os os.system("chcp 65001 >nul &") print("юникод") 
  • checking for OS type breaks it:

     # doesn't work import os if os.name == "nt": os.system("chcp 65001 &") print("юникод") 
  • it doesn't even works under if block:

     # doesn't work import os if os.name == "nt": os.system("chcp 65001 &") print("юникод") 

But one can print with cmd's echo:

# works
import os
os.system("chcp 65001 & echo {0}".format("юникод"))

and here's a simple way to make this cross-platform:

# works

import os

def simple_cross_platrofm_print(obj):
    if os.name == "nt":
        os.system("chcp 65001 >nul & echo {0}".format(obj))
    else:
        print(obj)

simple_cross_platrofm_print("юникод")

but the window's echo trailing empty line can't be suppressed.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM