How to enumerate all cells in an Excel sheet with a (non-empty) value using openpyxl?

Question

I need to analyze the contents of Excel sheets (.xslx) programatically using openpyxl .

My problem is that some excel sheets have a very large number of empty rows, eg the 3. Access Control worksheet below has 1048534 rows and 16384 columns, although only the first 100 rows are non-empty. This results in billions of cells, which is simply not feasible to enumerate over, eg:

for i in range(ws.max_row + 1):
   for j in range(ws.max_column + 1):
      value = ws.cell(i, j)
      ...

I want to avoid spilling information, eg skip a non-empty row exists at index 10000. Thus a solution to only analyze the first N rows simply doesn't work for my use case.

Does openpyxl support a way to filter all these superfluous rows, and only return a set of rows containing at least one non-empty cell?

>>> for ws in wb.worksheets:
...     print(ws.title, ws.max_row, ws.max_column)
...
0. Intro 16 7
1. Solution Profile 63 8
2. Solution Diagram 6 4
1.BE 18 29
3. Access Control 1048534 16384.    <-- PROBLEM
4. Systems Development 46 21
5. Operational Security 43 14
6. Data Communication 11 14
7. Other 27 16
3-7.BE 6 3
8. Risk Assessment 15 29
8.BE 20 16
9. Conclusions 36 8
EBA Risk Statement 51 13
Vulnerabilities 86 5
Threats 44 4
Sample IT Risks 167 6
9.BE 5 1
Risk matrix 19 11
9. Document change log 28 6
>>> (1048534 * 16384) / 1000000
17179.181056

Answer 1

It depends what you're doing. You could load each sheet into a Pandas Dataframe which has methods for dealing with this. Otherwise the following should help:

for row in ws.iter_rows(values_only=True):
    if not any(row):
       continue

How to enumerate all cells in an Excel sheet with a (non-empty) value using openpyxl?

Question

1 answers

solution1
0 2020-10-30 11:03:21

How to enumerate all cells in an Excel sheet with a (non-empty) value using openpyxl?

Question

1 answers

solution1 0 2020-10-30 11:03:21

solution1
0 2020-10-30 11:03:21