I'm pretty new to programming, so please bear with me if my code is not nice or the answer is too obvious. :)
I want to parse an Excel file into a dictionary so I can later access the values via key. I won't know how the Excel file is structured before parsing it, so I can't just hardcode a certain empty row to skip, since empty rows will occur at random. For this, I am using Python 3 and openpyxl (read-only mode). This is my code:
from openpyxl import load_workbook
import pprint
# path to file
c = "test.xlsx"
wb = load_workbook(filename=c, read_only=True, data_only=True)
# key for directory
data = {}
# list of worksheet names
wsname = []
# values in rows per worksheet
valuename = []
# took this odd numbers since pprint organizes the numbers weird when 1s and 10s are involved
# counter for row
k = 9
# counter for column
i = 10
# splits name of xlsx - file from .xlsx
workbook = c.split(".")[0]
data[workbook] = {}
for ws in wb.worksheets:
    # takes worksheet name and parses it into the wsname list
    wsname.append(ws.title)
    wsrealname = wsname.pop()
    worksheet = wsrealname
    data[workbook][worksheet] = {}
    for row in ws.rows:
        k += 1
        for cell in row:
            # reads value per row and column
            data[workbook][worksheet]["Row: " + str(k) + " Column: " + str(i)] = cell.value
            i += 1
        i = 10
    k = 9
pprint.pprint(data)
With this I get output like this:
{'test': {'Worksheet1': {'Row: 10 Column: 10': None,
                         'Row: 10 Column: 11': None,
                         'Row: 10 Column: 12': None,
                         'Row: 10 Column: 13': None,
                         'Row: 11 Column: 10': None,
                         'Row: 11 Column: 11': 'Test1',
                         'Row: 11 Column: 12': None,
                         'Row: 11 Column: 13': None}}}
This is the output I want, except that in this example I want to skip the whole of Row 10, since all of its values are None and the row is therefore empty.
As mentioned, I don't know where empty rows will occur, so I can't just hardcode a certain row to be skipped. In read-only mode, if you print(row), an empty row contains only 'EmptyCell' objects, like this:
(<EmptyCell>, <EmptyCell>, <EmptyCell>, <EmptyCell>)
I tried using set() to check whether all the "values" in the row are duplicates of each other:
if len(set(row)) == 1:
    .....
but that doesn't solve the issue, since I get this error message:
TypeError: unhashable type: 'ReadOnlyCell'
If I compare cell.value with None and exclude every None, I get this output:
{'test': {'Worksheet1': {'Row: 11 Column: 11': 'Test1'}}}
which doesn't help, since I only want to skip cells if the whole row is empty. The output should look like this:
{'test': {'Worksheet1': {'Row: 11 Column: 10': None,
                         'Row: 11 Column: 11': 'Test1',
                         'Row: 11 Column: 12': None,
                         'Row: 11 Column: 13': None}}}
So, could you please help me figure out how to skip cells only if the complete row (and therefore every cell in it) is empty?
Thanks a lot!
from openpyxl.cell.read_only import EmptyCell
for row in ws:
    empty = all(isinstance(cell, EmptyCell) for cell in row) # or check if the value is None
NB: in read-only mode, avoid multiple calls like data[workbook][worksheet]['A1'],
as they will force the library to parse the worksheet again and again.
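Putting that check into the loop from the question could look roughly like the sketch below. It is only an illustration: the function name rows_to_dict and its parameters are made up here, and it takes plain tuples of cell values (for example from ws.iter_rows(values_only=True), assuming an openpyxl version that supports values_only), so the worksheet is read in a single pass. Row numbers come from enumerate, so a skipped row still counts and later rows keep their position.

```python
def rows_to_dict(rows, first_row=10, first_col=10):
    """Map "Row: r Column: c" keys to values, skipping rows that are entirely empty.

    rows_to_dict, first_row and first_col are hypothetical names; the
    defaults of 10 only mirror the counters used in the question.
    """
    result = {}
    for r, row in enumerate(rows, start=first_row):
        # Skip the row only if every single value in it is None.
        if all(value is None for value in row):
            continue
        for c, value in enumerate(row, start=first_col):
            result["Row: {} Column: {}".format(r, c)] = value
    return result


# Stand-in for the two rows of the example worksheet.
sample_rows = [
    (None, None, None, None),
    (None, "Test1", None, None),
]
sheet_data = rows_to_dict(sample_rows)
```

With a real workbook, the call would be something like data[workbook][ws.title] = rows_to_dict(ws.iter_rows(values_only=True)).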
Just create a custom generator that yields only non-empty rows:
def iter_rows_with_data(worksheet):
    for row in worksheet.iter_rows(values_only=True):
        if any(row):
            yield row
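One caveat with any(row): it tests truthiness, so a row whose only values are 0, False or an empty string would also be dropped. If such rows should be kept, a stricter variant can compare against None explicitly. The sketch below is an assumption about what "empty" should mean, not part of the answer above; iter_rows_not_none is a hypothetical name, and it accepts any iterable of value tuples so it can be tried without a workbook.

```python
def iter_rows_not_none(rows):
    """Yield only rows that contain at least one value that is not None."""
    for row in rows:
        if any(value is not None for value in row):
            yield row


# Plain tuples standing in for worksheet.iter_rows(values_only=True).
sample_rows = [
    (None, None, None),     # completely empty: skipped
    (None, 0, None),        # holds a real 0: kept here, dropped by any(row)
    ("Test1", None, None),
]
kept = list(iter_rows_not_none(sample_rows))
truthy_only = [row for row in sample_rows if any(row)]
```

Here kept retains the row containing 0, while truthy_only (the any(row) behaviour) does not.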