Python regex and Excel

Question

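I'm trying to use openpyxl to read data from Excel into a Python list and run a regex match on it, but the data comes back as unicode and the match always returns None. The data is in Hebrew. I want to convert the strings from Excel into strings that I can match with a regular expression. What can be done?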

import re
from openpyxl import load_workbook

file_name = 'excel.xlsx'
wb = load_workbook(file_name)
ws = wb[u'beta']
li = []
li2 = []
# Read the cells from Excel into a list
for i in range(1,1500):
    li2.append(ws["A"+str(i)].value)

for i in li2:
    if i is not None:
        li.append(i)
# Delete the unneeded list to free memory
del li2

r = re.match("א",li[1])
r == None
>>> True

The desired result is r.string = "something..." rather than r == None.

Python 2.7.9 (default, Dec 10 2014, 12:24:55) [MSC v.1500 32 bit (Intel)] on win32
Type "copyright", "credits" or "license()" for more information.
 >>> ================================ RESTART ================================
 >>> 
 >>> li[1]
 u"\u05d0\u05d1\u05d5 \u05d2'\u05d5\u05d5\u05d9\u05d9\u05e2\u05d3 (\u05e9\u05d1\u05d8)"
 >>> print li[1]
 אבו ג'ווייעד (שבט)
 >>> r = re.match(u'א',li[1])
 >>> r ==None
 True
 >>> r = re.match(ur'א',li[1])
 >>> r = re.match(u'',li[1])
 >>> r.string
 u"\u05d0\u05d1\u05d5 \u05d2'\u05d5\u05d5\u05d9\u05d9\u05e2\u05d3 (\u05e9\u05d1\u05d8)"
 >>> unicode('א')

 Traceback (most recent call last):
   File "<pyshell#7>", line 1, in <module>
   unicode('א')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe0 in position 0: ordinal not in range(128)
 >>> u'א'
 u'\xe0'
 >>> u'א'.encode("utf8")
 '\xc3\xa0'
 >>> u"א"
 u'\xe0' 
 >>>

Answer 1

I put the Hebrew letter from your code into several cells and then ran the following code:

# -*- coding: utf-8 -*-
import re
from openpyxl import load_workbook

file_name = 'worksheet.xlsx'
wb = load_workbook(file_name)
ws = wb[u'beta']
li = []
li2 = []
# Read the cells from Excel into a list
for i in range(1,1500):
    li2.append(ws["A"+str(i)].value)

for i in li2:
    if i is not None:
        li.append(i)
# Delete the unneeded list to free memory
del li2

print "Non-empty cells: "
print li

r = re.search(u"א", li[1])

print "Match in: " 
print r.string.encode('utf-8')
print "Position: " 
print r.span()

Output:

Non-empty cells:
[u'Hebrew letter test 1 \u05d0', u'Hebrew letter test 2 \u05d0', u'Hebrew letter test 3 \u05d0', u'Hebrew letter test 4 \u05d0']
Match in:
Hebrew letter test 2 ÎÉ
Position:
(21, 22)

Let me know if this is what you need.
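
A likely reason the question's `re.match` returned `None`: in the posted shell session, `u'א'` evaluates to `u'\xe0'` rather than `u'\u05d0'`, so the pattern was never alef in the first place. Byte 0xE0 is alef in the Windows-1255 Hebrew code page, so the shell appears to have passed the typed character through as a raw byte (the code page is an assumption; the session does not state it). Writing the pattern with a `\u` escape avoids depending on the console or source encoding. A minimal sketch using the cell value copied from the question; the names `cell`, `ALEF`, and `mis_decoded` are illustrative:

```python
# -*- coding: utf-8 -*-
import re

# Cell value as shown in the question's session (what openpyxl returned).
cell = u"\u05d0\u05d1\u05d5 \u05d2'\u05d5\u05d5\u05d9\u05d9\u05e2\u05d3 (\u05e9\u05d1\u05d8)"

# HEBREW LETTER ALEF written as an escape, independent of any encoding.
ALEF = u"\u05d0"

# What the question's shell produced for the typed alef (U+00E0, not U+05D0).
mis_decoded = u"\xe0"

good = re.match(ALEF, cell)        # the pattern really is alef
bad = re.match(mis_decoded, cell)  # the pattern is a different character

print(good is not None)  # True: the cell starts with alef
print(bad is None)       # True: U+00E0 does not occur in the cell

# Assumption: the shell used Windows-1255, where byte 0xE0 decodes to alef.
print(b"\xe0".decode("cp1255") == ALEF)
```

In a saved script, a `# -*- coding: utf-8 -*-` line like the one in the code above, with the file actually saved as UTF-8, should also make a literal `u"א"` behave as expected.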

Answer 2

The answer is:

import re
from openpyxl import load_workbook
file_name = "excel.xlsx"
wb = load_workbook(file_name) 
ws = wb[wb.get_sheet_names()[0]]

# Search the cell value for the first digit
match = re.search(r"\d", ws["A2"].value)
print match.group(0)

:)
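
One caveat with passing `ws["A2"].value` straight to `re.search`: openpyxl returns `None` for empty cells and numbers for numeric cells, and `re.search` raises `TypeError` on either. A minimal sketch of a guard, assuming Python 3 (where every `str` is Unicode); `first_match` is a hypothetical helper name, and the plain values below stand in for cell contents:

```python
import re


def first_match(pattern, value):
    """Return the first match of pattern in value, or None if there is none.

    Empty cells (None) yield None; non-text values are converted with str().
    """
    if value is None:
        return None
    text = value if isinstance(value, str) else str(value)
    return re.search(pattern, text)


# Stand-ins for cell values: an empty cell, a number, and two strings.
values = [None, 42, "abc", "Hebrew letter test 2 \u05d0"]

for v in values:
    m = first_match(r"\d", v)
    # Print the matched digit, or None when nothing matched.
    print(repr(v), "->", m.group(0) if m else None)
```

Checking the result before calling `.group(0)` also avoids the `AttributeError` that the snippet above would raise when the cell contains no digit.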

Solution 1
0 2015-11-24 09:14:42

Solution 2
-1 2016-10-18 13:34:20
