
OpenPyXL - ReadOnly: How to skip empty rows without knowing when they occur?

I'm pretty new to programming, so please bear with me if my code isn't nice or the answer is too obvious. :)

I want to parse an Excel file into a dictionary so I can later access the values by key. I won't know how the Excel file is structured before parsing it, so I can't hardcode which rows to skip, since the empty rows will occur at random. I am using Python 3 and openpyxl (in read-only mode). This is my code:

from openpyxl import load_workbook
import pprint


# path to file
c = "test.xlsx"
wb = load_workbook(filename=c, read_only=True, data_only=True)

# nested dictionary: workbook -> worksheet -> cell key -> value
data = {}
# list of worksheet names
wsname = []
# values in rows per worksheet
valuename = []


# chose these odd starting numbers because pprint sorts keys as strings, which orders 1s and 10s strangely
# counter for row
k = 9
# counter for column
i = 10

# strip the .xlsx extension from the file name
workbook = c.split(".")[0]

data[workbook] = {}
for ws in wb.worksheets:
    # takes worksheet name and parses it into the wsname list
    wsname.append(ws.title)
    wsrealname = wsname.pop()
    worksheet = wsrealname
    data[workbook][worksheet] = {}
    for row in ws.rows:
        k += 1
        for cell in row:
            # reads value per row and column
            data[workbook][worksheet]["Row: " + str(k) + " Column: " + str(i)] = cell.value
            i += 1
        i = 10
    k = 9

pprint.pprint(data)

With this, I get the following output:

    {'test': {'Worksheet1': {'Row: 10 Column: 10': None,
                             'Row: 10 Column: 11': None,
                             'Row: 10 Column: 12': None,
                             'Row: 10 Column: 13': None,
                             'Row: 11 Column: 10': None,
                             'Row: 11 Column: 11': 'Test1',
                             'Row: 11 Column: 12': None,
                             'Row: 11 Column: 13': None}}}

This is the output I want, except that in this example I want to skip the whole of row 10, since all its values are None and therefore empty.

As mentioned, I don't know where empty rows will occur, so I can't hardcode a specific row to skip. In read-only mode, if you print(row), an empty row contains only EmptyCell objects, like this:

(<EmptyCell>, <EmptyCell>, <EmptyCell>, <EmptyCell>)

I tried having my program check with set() whether all the values in the row are identical:

if len(set(row)) == 1:
.....

but that doesn't work, because I get this error message:

TypeError: unhashable type: 'ReadOnlyCell'
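The TypeError happens because set() requires hashable items, and the ReadOnlyCell objects in a read-only row are not hashable. Cell values (None, str, int, float, datetime and so on) normally are, so the same idea works on the values instead. A minimal sketch follows; the helper name row_is_empty is hypothetical. It compares against {None} rather than using len(set(...)) == 1, because the latter is also true for a row where every cell holds the same non-empty value.

```python
def row_is_empty(values):
    """Return True if every value in the row is None (hypothetical helper).

    `values` is an iterable of plain cell values, e.g. (cell.value for cell in row)
    or a tuple produced by ws.iter_rows(values_only=True).
    """
    # set() works here because cell values are hashable, unlike ReadOnlyCell objects.
    # A row with no cells gives an empty set, which is also a subset of {None}.
    return set(values) <= {None}


# Plain tuples stand in for rows read from a worksheet.
print(row_is_empty((None, None, None, None)))     # True
print(row_is_empty((None, "Test1", None, None)))  # False
print(row_is_empty(("x", "x", "x", "x")))         # False, although len(set(...)) == 1
```

In the question's loop, this could be used as `if row_is_empty(cell.value for cell in row): continue` before the inner `for cell in row:` loop.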

If I compare cell.value with None and exclude every None, I get this output:

{'test': {'Worksheet1': {'Row: 11 Column: 11': 'Test1'}}}

which is not what I want, since I only want to skip cells when the whole row is empty. The output should look like this:

{'test': {'Worksheet1': {'Row: 11 Column: 10': None,
                         'Row: 11 Column: 11': 'Test1',
                         'Row: 11 Column: 12': None,
                         'Row: 11 Column: 13': None}}}

So, could you please help me figure out how to skip cells only when the complete row (and therefore every cell in it) is empty?

Thanks a lot!

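from openpyxl.cell.read_only import EmptyCell

for row in ws:
    # True if every cell is an EmptyCell (or check: all(cell.value is None for cell in row))
    empty = all(isinstance(cell, EmptyCell) for cell in row)
    if empty:
        continue  # skip rows without any data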

NB: in read-only mode, avoid repeated cell lookups by coordinate, such as ws['A1'], as each one forces the library to parse the worksheet again.
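If you do need random access to cells in read-only mode, one option is to read each sheet once, sequentially, and keep the values in memory. The sketch below assumes the rows come from ws.iter_rows(values_only=True) (or any iterable of value tuples) and numbers them from 1 in the order they are yielded; if you pass min_row or min_col, offset the numbers accordingly. The function name cache_sheet_values is hypothetical, and it trades memory for speed, so it only suits sheets that fit comfortably in RAM.

```python
def cache_sheet_values(rows):
    """Build {(row, column): value} from one sequential pass over the rows.

    `rows` is any iterable of value tuples, e.g. ws.iter_rows(values_only=True).
    Row and column numbers are 1-based, matching Excel's numbering.
    """
    cache = {}
    for row_idx, row in enumerate(rows, start=1):
        for col_idx, value in enumerate(row, start=1):
            cache[(row_idx, col_idx)] = value
    return cache


# Plain tuples stand in for a worksheet read in read-only mode.
values = cache_sheet_values([(None, None), ("a", 1)])
print(values[(2, 1)])  # a, i.e. the cell Excel calls A2
```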

Just create a custom generator that yields only non-empty rows:

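def iter_rows_with_data(worksheet):
    # values_only=True yields tuples of plain values instead of cell objects
    for row in worksheet.iter_rows(values_only=True):
        # test for None explicitly, so rows holding only 0, "" or False still count as data
        if any(value is not None for value in row):
            yield row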
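The generator above yields bare value tuples without row numbers, so if its rows are fed to the question's counter k, the row labels shift whenever an empty row is skipped. A minimal sketch that keeps the original numbering and reproduces the key format from the question is below. It assumes the rows come from ws.iter_rows(values_only=True) and keeps the question's starting value of 10 for rows and columns; the function name sheet_to_dict and its offset parameter are hypothetical.

```python
def sheet_to_dict(rows, offset=10):
    """Map 'Row: r Column: c' keys to values, skipping rows with no data.

    `rows` is an iterable of value tuples, e.g. ws.iter_rows(values_only=True).
    `offset` mirrors the question's counters, which start at 10 to keep
    pprint's string-based key sorting readable.
    """
    result = {}
    # enumerate() numbers every row, including skipped ones, so a row keeps
    # the same label whether or not an empty row precedes it.
    for row_num, row in enumerate(rows, start=offset):
        if all(value is None for value in row):
            continue  # the whole row is empty: skip it
        for col_num, value in enumerate(row, start=offset):
            result["Row: " + str(row_num) + " Column: " + str(col_num)] = value
    return result


# The two rows from the question's example; plain tuples stand in for a worksheet.
rows = [(None, None, None, None), (None, "Test1", None, None)]
print(sheet_to_dict(rows))
```

This prints only the four entries for row 11, matching the desired output in the question. Inside the question's worksheet loop it would be used as `data[workbook][ws.title] = sheet_to_dict(ws.iter_rows(values_only=True))`, which also reads each sheet in a single pass.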
