Converting Unicode input to a string for comparison

Question

I am writing code that parses the tables in a Word document and compares the cell text against keywords that are ASCII strings.

tyring = unicode((ListTables[0].Rows[x])).encode('utf-8')
tryingstring = tyring.encode('ascii')
print 'trying string' ,tryingstring

The error is as follows:

tyring = unicode((ListTables[0].Rows[x])).encode('utf-8','ignore')
File "C:\Python27\lib\site-packages\win32com\client\dynamic.py", line 201, in __str__
    return str(self.__call__())
File "C:\Python27\lib\site-packages\win32com\client\dynamic.py", line 201, in __str__
    return str(self.__call__())
UnicodeEncodeError: 'ascii' codec can't encode character u'\uf07a' in position 0: ordinal not in range(128)

tryingstring should now be an ASCII string, so why won't it print?

Answer 1

Going back to the original post:

if tr1_find.search(str(ListTables[0].Cell(x,y))):
    print 'Found'
    value  = ListTables[0].Cell(x,y+1)

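ListTables[0].Cell(x,y) returns a Cell instance from the Word document. Calling str() on it retrieves its Unicode value and tries to encode that value to a byte string with the ascii codec. Because the value contains non-ASCII characters, the call fails with a UnicodeEncodeError.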
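The failure can be reproduced without Word. This is a minimal sketch, assuming the cell text starts with the character u'\uf07a' from the traceback; the rest of the sample string is made up for illustration. The explicit encode('ascii') stands in for the step that str() performs implicitly on a Unicode value in Python 2.

```python
# Hypothetical cell text; only u'\uf07a' comes from the traceback above.
cell_text = u'\uf07a keyword'

try:
    cell_text.encode('ascii')   # the step str() performs implicitly in Python 2
    error_name = None
except UnicodeEncodeError as exc:
    error_name = type(exc).__name__
    print(exc)

# Encoding to UTF-8 succeeds because UTF-8 can represent the character.
utf8_bytes = cell_text.encode('utf-8')
```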

In the later edit:

tyring = unicode((ListTables[0].Rows[x])).encode('utf-8')
tryingstring = tyring.encode('ascii')
print 'trying string' ,tryingstring

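unicode() retrieves the Unicode value, encode('utf-8') converts it to a UTF-8 byte string, and the result is stored in tyring. The next line tries to encode that byte string again, this time to 'ascii'. That is not valid, because only Unicode strings can be encoded, so Python 2 first tries to convert the byte string back to a Unicode string using the default 'ascii' codec. That causes a UnicodeDecodeError (decode, not encode).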
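The hidden decode step can be made explicit. This is a sketch with an illustrative sample value, not the asker's data: it performs by hand the ascii decode that Python 2 runs implicitly when encode() is called on a byte string.

```python
text = u'One\u4e00'                 # sample Unicode value (illustrative)
utf8_bytes = text.encode('utf-8')   # first encode: fine

try:
    # Python 2 runs this decode implicitly before encoding a byte string.
    utf8_bytes.decode('ascii')
    error_name = None
except UnicodeDecodeError as exc:
    error_name = type(exc).__name__

# Decoding with the codec that produced the bytes round-trips cleanly.
round_trip = utf8_bytes.decode('utf-8')
```

On Python 3 the original second call fails differently, because bytes objects have no encode() method at all.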

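Best practice is to use Unicode for all string processing. What you are missing is the Range() method to get the value of the cell. Here is an example of accessing a table in a Word document: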

PythonWin 2.7.1 (r271:86832, Nov 27 2010, 18:30:46) [MSC v.1500 32 bit (Intel)] on win32.
Portions Copyright 1994-2008 Mark Hammond - see 'Help/About PythonWin' for further copyright information.
>>> import win32com.client
>>> word=win32com.client.gencache.EnsureDispatch('Word.Application')
>>> word.ActiveDocument.Tables[0].Cell(1,1).Range()
u'One\u4e00\r\x07'

Note that it is a Unicode string. Word also appears to use \r\x07 as the cell terminator.
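Since every cell value seems to carry that terminator, it may be convenient to strip it before comparing. This is a sketch only: cell_text and CELL_END are hypothetical names, and the assumption that the terminator is always exactly u'\r\x07' rests on the single value shown above.

```python
CELL_END = u'\r\x07'   # terminator observed in the session above

def cell_text(raw):
    """Return the cell value without the trailing end-of-cell marker."""
    if raw.endswith(CELL_END):
        return raw[:-len(CELL_END)]
    return raw

value = cell_text(u'One\u4e00\r\x07')
```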

Now you can test the value:

>>> value = word.ActiveDocument.Tables[0].Cell(1,1).Range()
>>> value == 'One'   # NOTE: Python converts byte strings to Unicode via the default codec ('ascii' in Python 2.X)
False
>>> value == u'One'
False
>>> value == u'One馬\r\x07'
False
>>> value == u'One一\r\x07'
True
>>> value == u'One\u4e00\r\x07'
True
>>> value == 'One\x92' # non-ASCII byte string fails to convert
__main__:1: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
False
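The same idea carries over to the regular-expression search from the original post: compile the pattern from a Unicode string and search the Unicode value directly, with no str() call. In this sketch keyword_find is a hypothetical stand-in, since the original tr1_find pattern is not shown.

```python
import re

# Hypothetical keyword pattern; the original tr1_find pattern is not shown.
keyword_find = re.compile(u'One', re.UNICODE)

raw = u'One\u4e00\r\x07'   # value as returned in the session above

# Search the Unicode text directly instead of passing it through str().
found = keyword_find.search(raw) is not None

# For an exact match, drop the trailing terminator characters first.
exact = raw.rstrip(u'\r\x07') == u'One\u4e00'
```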

Answer 2

Convert to a Unicode string, then call encode() on the result:

if tr1_find.search(unicode(ListTables[0].Cell(x,y)).encode('utf-8')):

Answer 3

Try this and see whether it helps:

if tr1_find.search(unicode(ListTables[0].Cell(x,y)).encode('utf-8','ignore')):

You may also find this page from the Python documentation useful: http://docs.python.org/howto/unicode.html

It covers exactly this kind of problem.
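To see what the 'ignore' argument does, the sketch below encodes an illustrative value with and without it. With the ascii codec it silently drops characters that cannot be encoded. With utf-8 it changes nothing for this value, because UTF-8 can already represent every character in it. Note also that the traceback in the question shows the exception raised inside __str__, so the handler may never be reached in that call.

```python
text = u'One\u4e00'   # illustrative value, not the asker's data

ascii_lossy = text.encode('ascii', 'ignore')   # drops u'\u4e00'
utf8_plain = text.encode('utf-8')
utf8_ignore = text.encode('utf-8', 'ignore')   # same bytes as utf8_plain
```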

Answer 4

Did you open the file with codecs.open()? That function lets you specify the file encoding.

http://docs.python.org/library/codecs.html
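This applies when the text comes from a file on disk, not from Word through COM. A small sketch, assuming a UTF-8 encoded file; the temporary path and its contents are made up for the example:

```python
import codecs
import os
import tempfile

# Write a small UTF-8 file to read back (hypothetical sample data).
path = os.path.join(tempfile.mkdtemp(), 'sample.txt')
with codecs.open(path, 'w', encoding='utf-8') as handle:
    handle.write(u'One\u4e00\n')

# codecs.open() decodes while reading, so lines arrive as Unicode strings.
with codecs.open(path, 'r', encoding='utf-8') as handle:
    first_line = handle.readline()
```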

Answer summary (4 answers)

Answer 1: score 2, accepted, 2011-06-24 14:21:50
Answer 2: score 0, 2011-06-23 17:48:14
Answer 3: score 0, 2011-06-23 19:24:08
Answer 4: score 0, 2011-06-24 13:08:24
