Converting unicode input to a string for comparison

Question

I am writing code that parses the tables in a Word document and compares the cell text against keywords that are ASCII strings.

tyring = unicode((ListTables[0].Rows[x])).encode('utf-8')
tryingstring = tyring.encode('ascii')
print 'trying string' ,tryingstring

The error is as follows:

tyring = unicode((ListTables[0].Rows[x])).encode('utf-8','ignore')
File "C:\Python27\lib\site-packages\win32com\client\dynamic.py", line 201, in __str__
    return str(self.__call__())
File "C:\Python27\lib\site-packages\win32com\client\dynamic.py", line 201, in __str__
    return str(self.__call__())
UnicodeEncodeError: 'ascii' codec can't encode character u'\uf07a' in position 0: ordinal not in range(128)

tryingstring should be an ASCII string by this point, so why won't it print?

Answer 1

Going back to the original post:

if tr1_find.search(str(ListTables[0].Cell(x,y))):
    print 'Found'
    value  = ListTables[0].Cell(x,y+1)

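ListTables[0].Cell(x,y) returns a Cell instance from the Word document. Calling str() on it retrieves its Unicode value and tries to encode that value to a byte string using the ascii codec. Because the value contains non-ASCII characters, the call fails with a UnicodeEncodeError.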
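
The failure can be reproduced without Word. The following is a minimal sketch, assuming the cell text begins with the character named in the traceback (u'\uf07a'); the sample string and the names cell_text, encode_error and utf8_bytes are made up for illustration.

```python
# -*- coding: utf-8 -*-
# Hypothetical stand-in for the text of a Word cell; U+F07A is the
# character reported in the traceback.
cell_text = u'\uf07aKeyword'

encode_error = None
try:
    # Roughly what str() on a Unicode value does in Python 2:
    # encode it with the default 'ascii' codec.
    cell_text.encode('ascii')
except UnicodeEncodeError as exc:
    encode_error = exc

print(encode_error is not None)  # prints True

# Encoding with a codec that covers the character succeeds.
utf8_bytes = cell_text.encode('utf-8')
```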

In the later edit:

tyring = unicode((ListTables[0].Rows[x])).encode('utf-8')
tryingstring = tyring.encode('ascii')
print 'trying string' ,tryingstring

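unicode retrieves the Unicode value, which is then encoded to a UTF-8 byte string and stored in tyring. The next line tries to encode that byte string again, this time as 'ascii'. That is not valid, because only Unicode strings can be encoded, so Python first tries to convert the byte string back to a Unicode string using the default 'ascii' codec. This results in a UnicodeDecodeError (not an encode error).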
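
The hidden decode step can be written out explicitly. This is only a sketch: the sample text is hypothetical, and the explicit decode('ascii') call stands in for what Python 2 does implicitly when encode() is called on a byte string. Written this way it runs on Python 2.7 or 3.

```python
# -*- coding: utf-8 -*-
# Hypothetical sample value standing in for the cell text.
text = u'One\u4e00'

# Step 1: Unicode -> UTF-8 bytes. This succeeds.
utf8_bytes = text.encode('utf-8')

# Step 2: in Python 2, utf8_bytes.encode('ascii') first decodes the bytes
# with the default 'ascii' codec. Spelled out, that hidden step is:
decode_error = None
try:
    utf8_bytes.decode('ascii')
except UnicodeDecodeError as exc:
    decode_error = exc

# Decoding with the codec that produced the bytes round-trips cleanly.
round_tripped = utf8_bytes.decode('utf-8')
```

The practical consequence is to keep the value as Unicode and encode at most once, at the point where bytes are actually required.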

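Best practice is to do all string processing in Unicode. What you are missing is the Range() method, which returns the value of the cell. Here is an example of accessing a table in a Word document: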

PythonWin 2.7.1 (r271:86832, Nov 27 2010, 18:30:46) [MSC v.1500 32 bit (Intel)] on win32.
Portions Copyright 1994-2008 Mark Hammond - see 'Help/About PythonWin' for further copyright information.
>>> import win32com.client
>>> word=win32com.client.gencache.EnsureDispatch('Word.Application')
>>> word.ActiveDocument.Tables[0].Cell(1,1).Range()
u'One\u4e00\r\x07'

Note that it is a Unicode string. Word also appears to use \r\x07 as the end-of-cell terminator.

Now you can test the value:

>>> value = word.ActiveDocument.Tables[0].Cell(1,1).Range()
>>> value == 'One'   # NOTE: Python converts byte strings to Unicode via the default codec ('ascii' in Python 2.X)
False
>>> value == u'One'
False
>>> value == u'One马\r\x07'
False
>>> value == u'One一\r\x07'
True
>>> value == u'One\u4e00\r\x07'
True
>>> value == 'One\x92' # non-ASCII byte string fails to convert
__main__:1: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
False
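
Since the trailing \r\x07 makes a plain comparison against u'One' fail, one option is to strip it before comparing. The helper below is a hypothetical sketch (clean_cell_text and CELL_TERMINATOR are not part of win32com or Word) and uses the sample value from the session above instead of a live document.

```python
# -*- coding: utf-8 -*-
# End-of-cell marker observed in the session above.
CELL_TERMINATOR = u'\r\x07'


def clean_cell_text(raw):
    """Return the cell text without the trailing end-of-cell marker."""
    if raw.endswith(CELL_TERMINATOR):
        return raw[:-len(CELL_TERMINATOR)]
    return raw


# Sample value copied from the session above.
value = u'One\u4e00\r\x07'
cleaned = clean_cell_text(value)

# Compare Unicode with Unicode; ASCII keywords work as u'' literals.
matches_full = cleaned == u'One\u4e00'
starts_with_keyword = cleaned.startswith(u'One')
```

With a real document, the argument would be the result of Cell(x, y).Range() rather than a literal.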

Answer 2

Convert to a Unicode string first, then call encode() on the result:

if tr1_find.search(unicode(ListTables[0].Cell(x,y)).encode('utf-8')):

Answer 3

Try this; I wonder whether it helps:

if tr1_find.search(unicode(ListTables[0].Cell(x,y)).encode('utf-8','ignore')):

You may also find this page from the Python documentation useful: http://docs.python.org/howto/unicode.html

It covers exactly this kind of problem.
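
For reference, the second argument to encode() is the error handler. The sketch below uses a hypothetical sample string to show what 'ignore' and 'replace' do. The handler only comes into play when the target codec cannot represent a character, which does not happen with UTF-8 for ordinary text like this sample.

```python
# -*- coding: utf-8 -*-
# Hypothetical sample value standing in for the cell text.
text = u'One\u4e00'

# UTF-8 can represent every character here, so nothing is dropped.
utf8_ignore = text.encode('utf-8', 'ignore')

# ASCII cannot represent U+4E00: 'ignore' drops it silently,
# 'replace' substitutes a question mark.
ascii_ignore = text.encode('ascii', 'ignore')
ascii_replace = text.encode('ascii', 'replace')
```

Note that the traceback in the question was produced with 'ignore' already in place, which suggests the failure happens earlier, inside the unicode(...) call, before encode() ever runs.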

Answer 4

Did you open the file with codecs.open()? You can specify the file encoding in that function.

http://docs.python.org/library/codecs.html
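
A minimal sketch of that approach, assuming the text lives in a plain UTF-8 file (the temporary file and the sample text are made up for illustration; this does not apply to a .doc file read through COM):

```python
# -*- coding: utf-8 -*-
import codecs
import os
import tempfile

# Hypothetical sample text.
text = u'One\u4e00'

# Create a scratch file so the example is self-contained.
fd, path = tempfile.mkstemp(suffix='.txt')
os.close(fd)
try:
    # Writing: Unicode in, UTF-8 bytes on disk.
    with codecs.open(path, 'w', encoding='utf-8') as handle:
        handle.write(text)
    # Reading: UTF-8 bytes on disk, Unicode out.
    with codecs.open(path, 'r', encoding='utf-8') as handle:
        read_back = handle.read()
finally:
    os.remove(path)
```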
