Redirect stdout to a file with unicode encoding while keeping Windows EOL in Python 2

Question

I am hitting a wall here. I need to redirect all output to a file, and that file must be encoded in utf-8. The problem shows up when using codecs.open:

# errLog = io.open(os.path.join(os.getcwdu(),u'BashBugDump.log'), 'w',
#                  encoding='utf-8')
errLog = codecs.open(os.path.join(os.getcwdu(), u'BashBugDump.log'),
                     'w', encoding='utf-8')
sys.stdout = errLog
sys.stderr = errLog

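codecs opens the file in binary mode, which results in \n line terminators. I tried io.open, but it does not play well with the print statement used throughout the codebase (see "Python 2.7: print doesn't speak unicode to the io module?" or "python: TypeError: can't write str to text stream").

I am not the only one with this problem (see here, for example), but the solution adopted there is specific to the logging module, which we do not use.

See also this won't-fix bug in Python: https://bugs.python.org/issue2131

So what is the correct way to do this in Python 2?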

Answer 1

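Option 1

Redirection is a shell operation. You don't have to change the Python code at all, but you do have to tell Python which encoding to use when its output is redirected. That is done with an environment variable. The following redirects both stdout and stderr to a UTF-8-encoded file:

test.bat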

set PYTHONIOENCODING=utf8
python test.py >out.txt 2>&1

test.py

#coding:utf8
import sys
print u"我不喜欢你女朋友！"
print >>sys.stderr, u"你需要一个新的。"

out.txt (encoded in UTF-8)

我不喜欢你女朋友！
你需要一个新的。

Hex dump of out.txt

0000: E6 88 91 E4 B8 8D E5 96 9C E6 AC A2 E4 BD A0 E5
0010: A5 B3 E6 9C 8B E5 8F 8B EF BC 81 0D 0A E4 BD A0 
0020: E9 9C 80 E8 A6 81 E4 B8 80 E4 B8 AA E6 96 B0 E7
0030: 9A 84 E3 80 82 0D 0A

Note: you do need to print Unicode strings for this to work. Print byte strings and you'll get exactly the bytes you printed.
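For anyone who wants to drive the same redirection from Python rather than a batch file, here is a minimal sketch. It assumes a child interpreter started through subprocess; the names `CHILD`, `run_redirected`, and `out_path` are made up for illustration. `stderr=subprocess.STDOUT` plays the role of `2>&1`.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical child script, kept ASCII-only (escapes) so it can be passed via -c.
CHILD = 'import sys\nprint(u"caf\\u00e9")\nsys.stderr.write(u"\\u2122\\n")\n'

def run_redirected(script_text, out_path):
    """Run a child Python with PYTHONIOENCODING=utf8 and send
    stdout and stderr to out_path; return the raw bytes written."""
    env = dict(os.environ, PYTHONIOENCODING="utf8")
    with open(out_path, "wb") as out:
        # stderr=subprocess.STDOUT is the equivalent of 2>&1 in the shell.
        subprocess.check_call([sys.executable, "-c", script_text],
                              stdout=out, stderr=subprocess.STDOUT, env=env)
    with open(out_path, "rb") as f:
        return f.read()

out_path = os.path.join(tempfile.mkdtemp(), "out.txt")
captured = run_redirected(CHILD, out_path)
# The relative order of the stdout and stderr lines depends on buffering.
print(repr(captured))
```

The line terminators in the file are whatever the child's platform produces for redirected text output, which is why the batch-file version yields 0D 0A on Windows.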

Option 2

codecs.open may force binary mode, but codecs.getwriter doesn't. Give it a file opened in text mode:

#coding:utf8
import sys
import codecs
sys.stdout = sys.stderr = codecs.getwriter('utf8')(open('out.txt','w'))
print u"我不喜欢你女朋友！"
print >>sys.stderr, u"你需要一个新的。"

(Same output and hex dump as above.)
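To see why this works: the StreamWriter returned by codecs.getwriter only encodes, then hands the bytes to whatever stream it wraps, so newline handling is left to that stream. A small sketch, using an in-memory io.BytesIO as a stand-in for the file (my assumption, purely for illustration):

```python
import codecs
import io

raw = io.BytesIO()                      # stand-in for a real file object
writer = codecs.getwriter('utf8')(raw)  # StreamWriter: encodes, then calls raw.write()
writer.write(u'This is a \u2122 test\n')
encoded = raw.getvalue()
print(repr(encoded))
```

The newline stays a bare \n here because BytesIO does no translation. In Option 2 the wrapped stream is a Python 2 text-mode file, and on Windows that file is what turns \n into \r\n.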

Answer 2

It seems the Python 2 version of io doesn't play well with the print statement, but it works fine if you use the print function.

Demo:

from __future__ import print_function
import sys
import io

errLog = io.open('test.log', mode='wt', buffering=1, encoding='utf-8', newline='\r\n')
sys.stdout = errLog

print(u'This is a ™ test')
print(u'Another © line')

Contents of "test.log"

This is a ™ test
Another © line

Hex dump of "test.log"

00000000  54 68 69 73 20 69 73 20  61 20 e2 84 a2 20 74 65  |This is a ... te|
00000010  73 74 0d 0a 41 6e 6f 74  68 65 72 20 c2 a9 20 6c  |st..Another .. l|
00000020  69 6e 65 0d 0a                                    |ine..|
00000025

I ran this code on Python 2.6 on Linux (YMMV).
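The reason this gives CRLF even on Linux is the newline argument: with newline='\r\n', io.open translates every \n written to \r\n regardless of platform, while newline='' writes it untranslated. A rough sketch that compares the two (the helper name `bytes_written` is invented for this example):

```python
import io
import os
import tempfile

def bytes_written(text, newline):
    """Write text through io.open with the given newline setting
    and return the raw bytes that ended up in the file."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with io.open(path, mode='wt', encoding='utf-8', newline=newline) as f:
            f.write(text)
        with io.open(path, mode='rb') as f:
            return f.read()
    finally:
        os.remove(path)

crlf = bytes_written(u'Another \u00a9 line\n', '\r\n')   # always CRLF
untranslated = bytes_written(u'Another \u00a9 line\n', '')  # \n left as-is
print(repr(crlf))
print(repr(untranslated))
```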

If you don't want to use the print function, you can implement your own file-like encoding class.

import sys

class Encoder(object):
    def __init__(self, fname):
        self.file = open(fname, 'wb')

    def write(self, s):
        self.file.write(s.replace('\n', '\r\n').encode('utf-8'))

errlog = Encoder('test.log')
sys.stdout = errlog
sys.stderr = errlog

print 'hello\nthere'
print >>sys.stderr, u'This is a ™ test'
print u'Another © line'
print >>sys.stderr, 1, 2, 3, 4
print 5, 6, 7, 8

Contents of "test.log"

hello
there
This is a ™ test
Another © line
1 2 3 4
5 6 7 8

Hex dump of "test.log"

00000000  68 65 6c 6c 6f 0d 0a 74  68 65 72 65 0d 0a 54 68  |hello..there..Th|
00000010  69 73 20 69 73 20 61 20  e2 84 a2 20 74 65 73 74  |is is a ... test|
00000020  0d 0a 41 6e 6f 74 68 65  72 20 c2 a9 20 6c 69 6e  |..Another .. lin|
00000030  65 0d 0a 31 20 32 20 33  20 34 0d 0a 35 20 36 20  |e..1 2 3 4..5 6 |
00000040  37 20 38 0d 0a                                    |7 8..|
00000045

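Bear in mind that this is just a quick demo. You may want a more sophisticated way to handle newlines; for example, you probably don't want to replace a \n that is already preceded by \r. OTOH, with normal Python text handling that shouldn't be an issue...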
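One possible way to handle that case is a negative lookbehind, so that only a bare \n is expanded. This is a sketch, not part of the demo above, and `to_crlf` is a made-up name:

```python
import re

# Matches a line feed that is NOT already preceded by a carriage return.
_BARE_LF = re.compile(r'(?<!\r)\n')

def to_crlf(s):
    """Expand bare \\n to \\r\\n and leave existing \\r\\n pairs untouched."""
    return _BARE_LF.sub('\r\n', s)

print(repr(to_crlf('a\nb\r\nc\n')))
```

Because it works on one string at a time, it would still double up if a \r arrived at the end of one write() call and its \n at the start of the next; a real implementation would have to remember the last character written.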

Here is another version that combines the two previous strategies. I don't know whether it's any faster than the second version.

import sys
import io

class Encoder(object):
    def __init__(self, fname):
        self.file = io.open(fname, mode='wt', encoding='utf-8', newline='\r\n')

    def write(self, s):
        self.file.write(unicode(s))

errlog = Encoder('test.log')
sys.stdout = errlog
sys.stderr = errlog

print 'hello\nthere'
print >>sys.stderr, u'This is a ™ test'
print u'Another © line'
print >>sys.stderr, 1, 2, 3, 4
print 5, 6, 7, 8

This produces the same output as the previous version.
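Both Encoder classes define only write(), so any code that calls sys.stdout.flush() would fail with an AttributeError, and neither restores the original streams. A hedged sketch of a slightly fuller wrapper follows; the names `CRLFLog` and `log_to` are invented here, and decoding byte strings as UTF-8 is an assumption (the version above relies on unicode(s) instead).

```python
import io
import os
import sys
import tempfile

class CRLFLog(object):
    """Hypothetical file-like wrapper: UTF-8 output with CRLF line endings."""
    def __init__(self, fname):
        self.file = io.open(fname, mode='wt', encoding='utf-8', newline='\r\n')

    def write(self, s):
        if isinstance(s, bytes):
            s = s.decode('utf-8')  # assumption: byte strings are UTF-8
        self.file.write(s)

    def flush(self):
        self.file.flush()

    def close(self):
        self.file.close()

def log_to(path, lines):
    """Redirect stdout/stderr to path, write lines, then restore the streams."""
    log = CRLFLog(path)
    saved = sys.stdout, sys.stderr
    sys.stdout = sys.stderr = log
    try:
        for line in lines:
            sys.stdout.write(line + u'\n')
        sys.stdout.flush()
    finally:
        sys.stdout, sys.stderr = saved
        log.close()
    with io.open(path, 'rb') as f:
        return f.read()

log_path = os.path.join(tempfile.mkdtemp(), 'test.log')
log_bytes = log_to(log_path, [u'This is a \u2122 test', u'Another \u00a9 line'])
print(repr(log_bytes))
```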

Answer 1: score 4, accepted, 2016-12-06 05:01:39

Answer 2: score 1, 2016-12-05 08:20:01
