Python: Simulating CSV.DictReader with OpenPyXL
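I have an Excel (.xlsx) file that I want to parse row by row. The first row is a header containing a set of column headings such as School, First Name, Last Name, Email, and so on.
As I loop through each row, I'd like to be able to write something like: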
row['School']
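and get back the value of the cell in the current row under the column headed "School".
I've looked through the OpenPyXL documentation but couldn't find anything particularly helpful.
Any suggestions?
I'm not very familiar with OpenPyXL, but as far as I can tell it doesn't have any kind of dict reader/iterator helper. However, it's fairly easy to iterate over the worksheet rows and build a dict from two lists of values.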
def iter_worksheet(worksheet):
    # It's necessary to get a reference to the generator, as
    # `worksheet.rows` returns a new iterator on each access.
    rows = worksheet.rows
    # Get the header values as keys and move the iterator to the next item
    keys = [c.value for c in next(rows)]
    for row in rows:
        values = [c.value for c in row]
        yield dict(zip(keys, values))
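A possible variation on the generator above, assuming an openpyxl version whose `iter_rows()` accepts `values_only=True` (it then yields plain tuples of cell values rather than `Cell` objects). The helper name `dict_rows` and the sample data are made up for this sketch; the function accepts any iterable of row tuples, so the list below stands in for a real worksheet.

```python
def dict_rows(rows):
    """Yield one dict per data row, keyed by the values of the first row.

    `rows` is any iterable of tuples, for example the result of
    worksheet.iter_rows(values_only=True) in openpyxl (assumed here).
    """
    rows = iter(rows)
    try:
        keys = next(rows)  # the first row holds the column headings
    except StopIteration:
        return  # empty sheet: nothing to yield
    for row in rows:
        yield dict(zip(keys, row))


# Made-up sample standing in for worksheet.iter_rows(values_only=True)
sample = [
    ("School", "First Name", "Last Name"),
    ("Example High", "Jane", "Doe"),
    ("Sample College", "John", "Roe"),
]
records = list(dict_rows(sample))
school = records[0]["School"]
```

With a real workbook the call would presumably look like `dict_rows(ws.iter_rows(values_only=True))`, which avoids touching `.value` on every cell.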
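Excel worksheets are far more flexible than CSV files, so having something like DictReader makes little sense.
Just build a helper dictionary from the relevant column headings.
If you have the headings "School", "First Name", "Last Name", "EMail",
you can create the dictionary like this.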
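# Read the heading values from the first row of the worksheet
values = [cell.value for cell in ws[1]]
keys = dict((value, idx) for (idx, value) in enumerate(values))
# Start at row 2 to skip the header; ws.rows is a generator and cannot be sliced
for row in ws.iter_rows(min_row=2):
    school = row[keys['School']].value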
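Because a worksheet header is free-form, it can contain empty cells or repeated headings, which a plain `dict(...)` over the header would silently accept. The following is only a sketch of one way to guard against that; `build_header_index` is a hypothetical helper, not part of openpyxl, and the header and row tuples are made-up samples.

```python
def build_header_index(header_values):
    """Map each heading to its 0-based column index.

    Blank (None) headings are skipped, and a repeated heading raises
    ValueError so a lookup never silently hits the wrong column.
    """
    index = {}
    for idx, value in enumerate(header_values):
        if value is None:
            continue
        if value in index:
            raise ValueError(f"duplicate column heading: {value!r}")
        index[value] = idx
    return index


# Made-up header with an empty cell in the third column
header = ("School", "First Name", None, "EMail")
keys = build_header_index(header)

# Made-up data row, as plain values
row = ("Example High", "Jane", "ignored", "jane@example.com")
email = row[keys["EMail"]]
```

Raising on duplicates is a design choice for this sketch; keeping the first or last occurrence would be just as easy if that suits the data better.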
I wrote a DictReader based on openpyxl. Save the second listing to a file named 'excel.py' and use it the same way as csv.DictReader. See the first listing for a usage example.
from excel import DictReader

with open('example01.xlsx', 'rb') as source_data:
    for row in DictReader(source_data, sheet_index=0):
        print(row)
excel.py:
__all__ = ['DictReader']

from openpyxl import load_workbook
from openpyxl.cell import Cell

# Change the default value for the Cell from None to `` the same way as in csv.DictReader
Cell.__init__.__defaults__ = (None, None, '', None)


class DictReader(object):
    def __init__(self, f, sheet_index,
                 fieldnames=None, restkey=None, restval=None):
        self._fieldnames = fieldnames   # list of keys for the dict
        self.restkey = restkey          # key to catch long rows
        self.restval = restval          # default value for short rows
        self.reader = load_workbook(f, data_only=True).worksheets[sheet_index].iter_rows(values_only=True)
        self.line_num = 0

    def __iter__(self):
        return self

    @property
    def fieldnames(self):
        if self._fieldnames is None:
            try:
                self._fieldnames = next(self.reader)
                self.line_num += 1
            except StopIteration:
                pass
        return self._fieldnames

    @fieldnames.setter
    def fieldnames(self, value):
        self._fieldnames = value

    def __next__(self):
        if self.line_num == 0:
            # Used only for its side effect.
            self.fieldnames
        row = next(self.reader)
        self.line_num += 1

        # unlike the basic reader, we prefer not to return blanks,
        # because we will typically wind up with a dict full of None
        # values
        while row == ():
            row = next(self.reader)
        d = dict(zip(self.fieldnames, row))
        lf = len(self.fieldnames)
        lr = len(row)
        if lf < lr:
            d[self.restkey] = row[lf:]
        elif lf > lr:
            for key in self.fieldnames[lr:]:
                d[key] = self.restval
        return d
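To make the `restkey` / `restval` handling in `__next__` easier to follow, here is the same logic pulled out into a standalone function. This is an illustrative sketch only: `row_to_dict` is a hypothetical name, and the field names and rows are made-up samples.

```python
def row_to_dict(fieldnames, row, restkey=None, restval=None):
    """Pair field names with row values, mirroring the class above.

    Extra values in a long row are collected under `restkey`;
    missing values in a short row are filled with `restval`.
    """
    d = dict(zip(fieldnames, row))
    lf = len(fieldnames)
    lr = len(row)
    if lf < lr:
        d[restkey] = row[lf:]
    elif lf > lr:
        for key in fieldnames[lr:]:
            d[key] = restval
    return d


fields = ("School", "First Name")

# Row longer than the header: the surplus lands under the rest key
long_row = row_to_dict(fields, ("Example High", "Jane", "extra"), restkey="_rest")

# Row shorter than the header: the missing field gets the rest value
short_row = row_to_dict(fields, ("Example High",), restval="")
```

Here `long_row` ends up with an additional `"_rest"` entry holding the tuple `("extra",)`, and `short_row` has `"First Name"` set to the empty string.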
The following seems to work for me.
header = True
headings = []
for row in ws.rows:
    if header:
        for cell in row:
            headings.append(cell.value)
        header = False
        continue
    rowData = dict(zip(headings, row))
    wantedValue = rowData['myHeading'].value
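I ran into the same problem as described above, so I created a simple extension called openpyxl-dictreader, which can be installed with pip. It is very similar to what @viktor suggested earlier in this thread.
The package is largely based on the source code of Python's native csv.DictReader class. It lets you select items by column name using openpyxl. For example: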
import openpyxl_dictreader

reader = openpyxl_dictreader.DictReader("names.xlsx", "Sheet1")
for row in reader:
    print(row["First Name"], row["Last Name"])