Python: UnicodeEncodeError with codecs.open

I'm trying to use orgnode.py ( from here ) to parse org files. The files contain English and Persian text, and file -i reports them as UTF-8 encoded. But I receive this error when I call the makelist function (which itself uses codecs.open with utf-8):

>>> Orgnode.makelist("toread.org")
[**  [[http://www.apa.org/helpcenter/sexual-orientation.aspx][Sexual orientation, homosexuality and bisexuality]]            :ToRead:



Added:[2013-11-06 Wed]
, **  [[http://stackoverflow.com/questions/11384516/how-to-make-all-org-files-under-a-folder-added-in-agenda-list-automatically][emacs - How to make all org-files under a folder added in agenda-list automatically? - Stack Overflow]] 

(setq org-agenda-text-search-extra-files '(agenda-archives "~/org/subdir/textfile1.txt" "~/org/subdir/textfile1.txt"))
Added:[2013-07-23 Tue] 
, Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 63-66: ordinal not in range(128)

The function returns a list of org headings, but instead of the last item (which is written in Persian) it shows the error. Any suggestions on how I can deal with this error?

As the traceback tells you, the exception is raised by the statement you typed at the Python console itself ( Orgnode.makelist("toread.org") ), not in one of the functions called while that statement was being evaluated.

This is typical of the encoding errors that occur when the interpreter automatically converts the return value of a statement in order to display it on the console. The text displayed is the result of applying the repr() builtin to the return value.

Here the repr() of the result of makelist is a unicode object, which the interpreter tries to convert to str using the "ascii" codec by default.
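The failure can be reproduced without orgnode at all. The following is only a minimal sketch: the sample text and the helper name to_ascii_bytes are made up for illustration, and the explicit encode() call stands in for the implicit conversion the console performs.

```python
# Persian sample text, written with escapes so the snippet itself stays ASCII.
persian = u"\u0633\u0644\u0627\u0645"


def to_ascii_bytes(text):
    """Mimic the implicit text-to-bytes conversion done with the "ascii" codec."""
    return text.encode("ascii")


try:
    to_ascii_bytes(persian)
    error_message = None
except UnicodeEncodeError as exc:
    # Same exception type and wording as in the traceback from the question.
    error_message = str(exc)

print(error_message)
```

Plain ASCII text passes through to_ascii_bytes unchanged, which is why the English headings in the list were displayed before the error appeared.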

The culprit is the Orgnode.__repr__ method ( https://github.com/albins/orgnode/blob/master/Orgnode.py#L592 ), which returns a unicode object (because the node content has been decoded automatically by codecs.open ), although __repr__ methods are usually expected to return strings containing only safe (ASCII) characters.

Here is the smallest change you can make to Orgnode as a workaround for your problem:

--- a/Orgnode.py
+++ b/Orgnode.py
@@ -612,4 +612,4 @@ class Orgnode(object):
 # following will output the text used to construct the object
         n = n + "\n" + self.body

-        return n
+        return n.encode('utf-8')

If you want a version that returns only ASCII characters, you can use 'string-escape' as the codec instead of 'utf-8' .
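To see the difference between the two kinds of codec, here is a rough sketch. It uses the 'unicode_escape' codec as a stand-in, because that one accepts text directly and also exists on Python 3; that substitution, the sample node_text and the helper is_ascii_only are assumptions of this example, not part of orgnode.

```python
# Sample node text: a Persian headline followed by an ASCII body line.
node_text = u"** \u0633\u0644\u0627\u0645\nAdded:[2013-11-06 Wed]"

# UTF-8 keeps the Persian characters, as multi-byte sequences.
as_utf8 = node_text.encode("utf-8")

# An escaping codec replaces every non-ASCII character with a \uXXXX escape.
as_escaped = node_text.encode("unicode_escape")


def is_ascii_only(data):
    """Return True when every byte of data is below 128."""
    return all(byte < 128 for byte in bytearray(data))


print(is_ascii_only(as_utf8))
print(is_ascii_only(as_escaped))
```

The UTF-8 form is readable on a UTF-8 terminal, while the escaped form is safe everywhere but shows escapes instead of the Persian text.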

This is only a quick and dirty fix. The right solution would be to write a proper __repr__ method, and also to add the __str__ and __unicode__ methods that this class lacks. (I might even fix this myself if I find the time, as I am quite interested in using Python code to manipulate my Org-mode files.)
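A possible shape for such methods is sketched below. Heading is a hypothetical stand-in class, not the real Orgnode, and its attributes and output format are assumptions made for illustration. The version check is there so the same code behaves sensibly on Python 2 and Python 3.

```python
import sys

PY2 = sys.version_info[0] == 2


class Heading(object):
    """Hypothetical stand-in for Orgnode; not the real class."""

    def __init__(self, headline, body=u""):
        self.headline = headline
        self.body = body

    def __unicode__(self):
        # Full text of the node; Python 2's unicode() calls this.
        return self.headline + u"\n" + self.body

    def __str__(self):
        text = self.__unicode__()
        # Python 2 expects bytes from __str__, Python 3 expects text.
        return text.encode("utf-8") if PY2 else text

    def __repr__(self):
        # Escape non-ASCII characters so the result is safe on any console.
        escaped = self.headline.encode("unicode_escape")
        if not isinstance(escaped, str):  # bytes on Python 3
            escaped = escaped.decode("ascii")
        return "<Heading %s>" % escaped


node = Heading(u"\u0633\u0644\u0627\u0645", u"Added:[2013-11-06 Wed]")
print(repr([node]))
```

With a __repr__ like this, displaying a list of nodes at the console no longer depends on the console's encoding, and the readable text is still available through str() or unicode().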
