Python: Emulating csv.DictReader with OpenPyXL

Question

I have an Excel (.xlsx) file that I'm trying to parse, row by row. The header (first row) contains a bunch of column titles like School, First Name, Last Name, Email, etc.

When I loop through each row, I want to be able to say something like:

row['School']

and get back the value of the cell in the current row and the column with 'School' as its title.

I've looked through the OpenPyXL docs but can't seem to find anything terribly helpful.

Any suggestions?

Answer 1

I'm not incredibly familiar with OpenPyXL, but as far as I can tell it doesn't have any kind of dict reader/iterator helper. However, it's fairly easy to iterate over the worksheet rows, as well as to create a dict from two lists of values.

def iter_worksheet(worksheet):
    # It's necessary to get a reference to the generator, as 
    # `worksheet.rows` returns a new iterator on each access.
    rows = worksheet.rows

    # Get the header values as keys and move the iterator to the next item
    keys = [c.value for c in next(rows)]
    for row in rows:
        values = [c.value for c in row]
        yield dict(zip(keys, values))
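
A variation on the function above, as a minimal sketch: it assumes an openpyxl version whose iter_rows() accepts values_only=True (the same parameter the listing in Answer 3 uses), so each row arrives as a plain tuple of values rather than a tuple of Cell objects. The names rows_as_dicts and read_sheet are made up here for illustration.

```python
def rows_as_dicts(rows):
    """Yield one dict per data row, keyed by the values of the first row."""
    rows = iter(rows)
    try:
        header = next(rows)
    except StopIteration:
        return  # no rows at all: nothing to yield
    for values in rows:
        yield dict(zip(header, values))


def read_sheet(path, sheet_name=None):
    """Hypothetical helper: open a workbook and yield its rows as dicts."""
    # Imported here so rows_as_dicts stays usable without openpyxl installed.
    from openpyxl import load_workbook

    workbook = load_workbook(path, data_only=True)
    worksheet = workbook[sheet_name] if sheet_name else workbook.active
    yield from rows_as_dicts(worksheet.iter_rows(values_only=True))
```

With this in place, looping over read_sheet('example01.xlsx') and reading row['School'] should give the value directly, without the extra .value lookup.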

Answer 2

Excel sheets are far more flexible than CSV files so it makes little sense to have something like DictReader.

Just create an auxiliary dictionary from the relevant column titles.

If you have columns like "School", "First Name", "Last Name", "EMail" you can create the dictionary like this.

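# Map each column title in the first row to its column index.
header = [cell.value for cell in ws[1]]
keys = {title: idx for idx, title in enumerate(header)}
for row in ws.iter_rows(min_row=2):
    school = row[keys['School']].value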

Answer 3

I wrote DictReader based on openpyxl. Save the second listing to file 'excel.py' and use it as csv.DictReader. See usage example in the first listing.

with open('example01.xlsx', 'rb') as source_data:
    from excel import DictReader

    for row in DictReader(source_data, sheet_index=0):
        print(row)

excel.py:

__all__ = ['DictReader']

from openpyxl import load_workbook
from openpyxl.cell import Cell

# Change the default value for the Cell from None to '' the same way as in csv.DictReader
Cell.__init__.__defaults__ = (None, None, '', None)


class DictReader(object):
    def __init__(self, f, sheet_index,
                 fieldnames=None, restkey=None, restval=None):
        self._fieldnames = fieldnames   # list of keys for the dict
        self.restkey  = restkey         # key to catch long rows
        self.restval  = restval         # default value for short rows
        self.reader   = load_workbook(f, data_only=True).worksheets[sheet_index].iter_rows(values_only=True)
        self.line_num = 0

    def __iter__(self):
        return self

    @property
    def fieldnames(self):
        if self._fieldnames is None:
            try:
                self._fieldnames = next(self.reader)
                self.line_num += 1
            except StopIteration:
                pass

        return self._fieldnames

    @fieldnames.setter
    def fieldnames(self, value):
        self._fieldnames = value

    def __next__(self):
        if self.line_num == 0:
            # Used only for its side effect.
            self.fieldnames

        row = next(self.reader)
        self.line_num += 1

        # unlike the basic reader, we prefer not to return blanks,
        # because we will typically wind up with a dict full of None
        # values
        while row == ():
            row = next(self.reader)

        d = dict(zip(self.fieldnames, row))
        lf = len(self.fieldnames)
        lr = len(row)

        if lf < lr:
            d[self.restkey] = row[lf:]
        elif lf > lr:
            for key in self.fieldnames[lr:]:
                d[key] = self.restval

        return d
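
The handling of long and short rows at the end of __next__ can be tried in isolation. The sketch below copies that logic into a standalone function so that it runs without a workbook; row_to_dict and the sample values are names chosen here, not part of the listing.

```python
def row_to_dict(fieldnames, row, restkey=None, restval=None):
    """Combine a header and one row the way the listing's __next__ does."""
    d = dict(zip(fieldnames, row))
    lf, lr = len(fieldnames), len(row)
    if lf < lr:
        # More values than field names: the extras are stored under restkey.
        d[restkey] = row[lf:]
    elif lf > lr:
        # Fewer values than field names: the missing keys get restval.
        for key in fieldnames[lr:]:
            d[key] = restval
    return d


header = ('School', 'Email')
long_row = row_to_dict(header, ('MIT', 'a@example.com', 'extra'), restkey='rest')
short_row = row_to_dict(header, ('MIT',), restval='')
print(long_row)   # extra value is collected under 'rest'
print(short_row)  # 'Email' is filled with ''
```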

Answer 4

The following seems to work for me.

    header = True
    headings = []
    for row in ws.rows:
        if header:
            for cell in row:
                headings.append(cell.value)
            header = False
            continue
        rowData = dict(zip(headings, row))
        wantedValue = rowData['myHeading'].value

Answer 5

I was running into the same issue as described above. Therefore I created a simple extension called openpyxl-dictreader that can be installed through pip. It is very similar to the suggestion made by @viktor earlier in this thread.

The package is largely based on source code of Python's native csv.DictReader class. It allows you to select items based on column names using openpyxl. For example:

import openpyxl_dictreader

reader = openpyxl_dictreader.DictReader("names.xlsx", "Sheet1")
for row in reader:
    print(row["First Name"], row["Last Name"])
