Python unicode issue with subprocess.call
My parser function uses lxml and provides me with a list of unicode strings (book_list).
The strings are joined together into a file name, cleaned up and then passed via subprocess.call to another binary which continues the work.
My problem is that the unicode objects (e.g. title_name = u'Wunderlicher Traum von einem gro\xdfen Narrennest') are encoded in ISO-8859-2 (at least that's what 'chardet' tells me) and I need to convert them to a format that is displayed properly at the file system level. The current code results in the file name u'Wunderlicher Traum von einem gro\xc3\x9fen Narrennest'.
Does anyone have an idea what I'm doing wrong?
Some info:
sys.getdefaultencoding() returns ascii, which confuses me, since that theoretically shouldn't allow any special characters like äöü etc.

from subprocess import call

# WORK_DIR and WKHTMLTOPDF_BIN are assumed to be defined elsewhere in the module.
def convert_books(book_list, output_dir):
    for book in book_list:
        author_name = book[0][0]
        title_name = book[0][1]
        #print chardet.detect(title_name)
        #print type(title_name)
        #print title_name.decode('iso-8859-2')
        year_name = "1337"
        output_file = u"%s - %s (%s).pdf" % (author_name, title_name, year_name)
        # Keep only alphanumerics plus a few safe characters.
        keep_characters = (' ', '.', '_')
        output_file = u"".join(c for c in output_file if c.isalnum() or c in keep_characters).rstrip()
        path_to_out = "%s%s" % (output_dir, output_file)
        target_file = WORK_DIR + book[1].replace(".xml", ".html")
        engine_parameter = [
            WKHTMLTOPDF_BIN,
            # GENERAL
            "-l",  # lower quality
            "-L", "25mm",
            "-R", "25mm",
            "-T", "25mm",
            "-B", "35mm",
            "--user-style-sheet", "media/style.css",
            target_file,
            path_to_out,
        ]
        print "+ Creating PDF \"%s\"" % (output_file)
        call(engine_parameter)
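On the sys.getdefaultencoding() point: in Python 2 that setting only governs implicit conversions between str and unicode. It does not limit which characters a unicode object can hold. A minimal sketch to see the difference (the helper name can_encode is made up for illustration and is not part of the code above):

```python
import sys


def can_encode(text, encoding):
    """Return True if `text` can be encoded with `encoding` without errors."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


title = u"gro\xdfen"  # contains U+00DF (sharp s)

print(sys.getdefaultencoding())     # codec used for implicit str/unicode conversions
print(sys.getfilesystemencoding())  # codec Python uses for file names
print(can_encode(title, "ascii"))
print(can_encode(title, "latin-1"))
print(can_encode(title, "utf-8"))
```

The unicode object itself is fine. An error (or a wrong byte sequence) only appears at the point where it is turned into bytes with a codec that does not match what the consumer expects.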
After writing down the question, the cause of the issue became clear :)
\xdf is the Unicode code point U+00DF (ß), which is also its single-byte ISO-8859-1 (latin-1) encoding.
\xc3\x9f is the UTF-8 encoding of the same character.
All I had to do was encode the unicode objects as latin-1 byte strings and then pass the arguments to subprocess.call.
out_enc = 'latin-1'
engine_parameter = [arg.encode(out_enc) if isinstance(arg, unicode) else arg for arg in engine_parameter]
call(engine_parameter)
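To check which byte sequence belongs to which encoding, something like the following can help. It is only a sketch: encode_args and TEXT_TYPE are illustrative names, and the right target encoding depends on what the called binary and the file system expect (latin-1 happened to be right here).

```python
TEXT_TYPE = type(u"")  # `unicode` on Python 2, `str` on Python 3


def encode_args(args, encoding):
    """Encode text arguments to bytes; leave every other argument untouched."""
    return [a.encode(encoding) if isinstance(a, TEXT_TYPE) else a for a in args]


sharp_s = u"\xdf"  # U+00DF, LATIN SMALL LETTER SHARP S

latin1_bytes = sharp_s.encode("latin-1")  # one byte: 0xDF
utf8_bytes = sharp_s.encode("utf-8")      # two bytes: 0xC3 0x9F

print(repr(latin1_bytes))
print(repr(utf8_bytes))
print(encode_args([u"gro\xdfen", b"-l"], "latin-1"))
```

Passing the encoding as a parameter makes it easy to switch to 'utf-8' on a system whose file names are UTF-8.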
Hope this will save someone else the headache!