Python: Simulating CSV.DictReader with OpenPyXL

I have an Excel (.xlsx) file that I'm trying to parse, row by row. I have a header (first row) that has a bunch of column titles like School, First Name, Last Name, Email, etc.

When I loop through each row, I want to be able to say something like:

row['School']

and get back the value of the cell in the current row, in the column that has 'School' as its title.

I've looked through the OpenPyXL docs but can't seem to find anything terribly helpful.

Any suggestions?

I'm not incredibly familiar with OpenPyXL, but as far as I can tell it doesn't have any kind of dict reader/iterator helper. However, it's fairly easy to iterate over the worksheet rows, as well as to create a dict from two lists of values.

def iter_worksheet(worksheet):
    # It's necessary to get a reference to the generator, as 
    # `worksheet.rows` returns a new iterator on each access.
    rows = worksheet.rows

    # Get the header values as keys and move the iterator to the next item
    keys = [c.value for c in next(rows)]
    for row in rows:
        values = [c.value for c in row]
        yield dict(zip(keys, values))
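
The same header-plus-zip idea can be tried on plain data without opening a workbook. The sketch below assumes each row is already a tuple of values, which is what `worksheet.iter_rows(values_only=True)` yields in recent openpyxl versions. The name `iter_dict_rows` and the sample rows are made up for illustration.

```python
def iter_dict_rows(rows):
    """Yield one dict per data row, keyed by the values of the first row."""
    rows = iter(rows)
    try:
        keys = next(rows)  # the header row supplies the dict keys
    except StopIteration:
        return  # no rows at all: nothing to yield
    for values in rows:
        yield dict(zip(keys, values))


# Hypothetical sample data standing in for ws.iter_rows(values_only=True).
sample_rows = [
    ("School", "First Name", "Last Name", "Email"),
    ("MIT", "Ada", "Lovelace", "ada@example.com"),
    ("Yale", "Alan", "Turing", "alan@example.com"),
]

records = list(iter_dict_rows(sample_rows))
print(records[0]["School"])
```

Working with values rather than cell objects keeps the dicts simple, at the cost of losing access to cell styles and coordinates.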

Excel sheets are far more flexible than CSV files, so it makes little sense to have something like DictReader.

Just create an auxiliary dictionary from the relevant column titles.

If you have columns like "School", "First Name", "Last Name", "EMail" you can create the dictionary like this.

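# Map each column title in the header (first row) to its column index.
header = [cell.value for cell in ws[1]]
keys = {value: idx for idx, value in enumerate(header)}
for row in ws.iter_rows(min_row=2):
    school = row[keys['School']].value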
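
As a small sketch of this index-lookup approach on plain tuples: the helper name `build_column_index` and the sample data are hypothetical, and the rows are assumed to hold values rather than cell objects (for example, as produced by `iter_rows(values_only=True)`).

```python
def build_column_index(header):
    """Map each non-empty column title to its zero-based position."""
    return {title: idx for idx, title in enumerate(header) if title is not None}


# Hypothetical header and data rows, as tuples of values.
header = ("School", "First Name", "Last Name", "EMail")
data_rows = [
    ("MIT", "Ada", "Lovelace", "ada@example.com"),
    ("Yale", "Alan", "Turing", "alan@example.com"),
]

columns = build_column_index(header)
schools = [row[columns["School"]] for row in data_rows]
print(schools)
```

The mapping is built once and each row stays a tuple, so no dict is created per row. If two columns share a title, the later one wins in this sketch.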

I wrote a DictReader based on openpyxl. Save the second listing to a file named 'excel.py' and use it like csv.DictReader. See the usage example in the first listing.

with open('example01.xlsx', 'rb') as source_data:
    from excel import DictReader

    for row in DictReader(source_data, sheet_index=0):
        print(row)

excel.py:

__all__ = ['DictReader']

from openpyxl import load_workbook
from openpyxl.cell import Cell

# Change the default Cell value from None to '' so that empty cells read
# as empty strings, the same way as in csv.DictReader.
Cell.__init__.__defaults__ = (None, None, '', None)


class DictReader(object):
    def __init__(self, f, sheet_index,
                 fieldnames=None, restkey=None, restval=None):
        self._fieldnames = fieldnames   # list of keys for the dict
        self.restkey  = restkey         # key to catch long rows
        self.restval  = restval         # default value for short rows
        self.reader   = load_workbook(f, data_only=True).worksheets[sheet_index].iter_rows(values_only=True)
        self.line_num = 0

    def __iter__(self):
        return self

    @property
    def fieldnames(self):
        if self._fieldnames is None:
            try:
                self._fieldnames = next(self.reader)
                self.line_num += 1
            except StopIteration:
                pass

        return self._fieldnames

    @fieldnames.setter
    def fieldnames(self, value):
        self._fieldnames = value

    def __next__(self):
        if self.line_num == 0:
            # Used only for its side effect.
            self.fieldnames

        row = next(self.reader)
        self.line_num += 1

        # unlike the basic reader, we prefer not to return blanks,
        # because we will typically wind up with a dict full of None
        # values
        while row == ():
            row = next(self.reader)

        d = dict(zip(self.fieldnames, row))
        lf = len(self.fieldnames)
        lr = len(row)

        if lf < lr:
            d[self.restkey] = row[lf:]
        elif lf > lr:
            for key in self.fieldnames[lr:]:
                d[key] = self.restval

        return d
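
The handling of rows that are longer or shorter than the header can be checked in isolation. The helper `row_to_dict` below is a hypothetical standalone version of the logic at the end of `__next__`, written as a sketch with made-up sample values. It is not part of openpyxl or of the csv module.

```python
def row_to_dict(fieldnames, row, restkey=None, restval=None):
    """Pair field names with row values, handling length mismatches."""
    d = dict(zip(fieldnames, row))
    lf, lr = len(fieldnames), len(row)
    if lf < lr:
        # Row is longer than the header: collect the surplus under restkey.
        d[restkey] = row[lf:]
    elif lf > lr:
        # Row is shorter than the header: fill missing fields with restval.
        for key in fieldnames[lr:]:
            d[key] = restval
    return d


fields = ("School", "First Name", "Last Name")
long_row = row_to_dict(fields, ("MIT", "Ada", "Lovelace", "extra"), restkey="rest")
short_row = row_to_dict(fields, ("MIT",), restval="")
print(long_row)
print(short_row)
```

Because the rows here are tuples, the surplus values stored under `restkey` are a tuple slice rather than a list.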

The following seems to work for me.

    header = True
    headings = []
    for row in ws.rows:
        if header:
            for cell in row:
                headings.append(cell.value)
            header = False
            continue
        rowData = dict(zip(headings, row))
        wantedValue = rowData['myHeading'].value

Notice: the technical posts on this site follow the CC BY-SA 4.0 license; if you repost, please credit this site's URL or the original address.