I'm working with Python on Excel files. Until now I have been using openpyxl. I need to iterate over the rows and delete those that do not meet specific criteria. Let's say I was using something like:
    current_row = 1
    while current_row <= ws.max_row:
        if 'something' in ws[f'L{current_row}'].value:
            ws.delete_rows(current_row)
            continue
        current_row += 1
Everything was fine until I ran into a problem with ws.max_row. In a new Excel file that I received for processing, ws.max_row returned more rows than the file actually contained. After some googling I found out why this happens. Here is a great explanation of the problem, which I found in a comment on Stack Overflow:
"However, ws.max_row will not check if last rows are empty or not. If cell's content at the end of the worksheet is deleted using Del key or by removing duplicates, remaining empty rows at the end of your data will still count as a used row. If you do not want to keep these empty rows, you will have to delete those entire rows by selecting rows number on the left of your spreadsheet and deleting them (right click on selected row number(s) -> Delete)" – V. Brunelle

Thanks, V. Brunelle, for the very good explanation of the cause of the problem.
In my case, this happens because some of the rows were deleted by removing duplicates. For example, there are 400 rows in my file, listed one after another without any gaps, but ws.max_row returns 500.
For now I'm using a quick fix:
    while current_row <= len([row for row in ws.iter_rows(min_row=min_row) if not all([cell.value is None for cell in row])]):
But I know that this is very inefficient, which is why I'm asking this question. I'm looking for a possible solution. From what I've found here on Stack Overflow, I can:
1. Make a copy of the worksheet, iterate over that copy, and call ws.delete_rows on the original worksheet, so I will need to apply my fix only once.
2. Iterate backwards with a for loop, so I won't have to deal with ws.max_row, since for loops work fine in that case (they read the proper file dimensions). This method seems promising to me, but I always have 4 rows at the top of the workbook that I'm not touching at all, and any debugging would have to be done backwards as well, which might not be very enjoyable :D.

I'm stuck now, because I really don't know which route I should choose.
I would appreciate any advice or opinions on the topic and, if possible, I would like to have a small discussion here.
Best regards!
If max_row doesn't report what you expect, you'll need to work around the issue as best you can. One way is to delete the rows manually ("delete those entire rows by selecting rows number on the left of your spreadsheet and deleting them (right click on selected row number(s) -> Delete)"). Another is to determine the last row in your code some other way, and then perhaps programmatically delete all the rows from there to max_row, so that at least it reports correctly on the next code run.
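As a rough illustration of the second option, here is a minimal sketch that scans upward from max_row to find the last row with data and then deletes everything below it. The helper names last_data_row and trim_trailing_rows are made up for this example, and treating both None and an empty string as "empty" is an assumption that may not match your data. The workbook is built in memory only so the snippet runs on its own.

```python
from openpyxl import Workbook


def last_data_row(ws, min_row=1):
    """Return the index of the last row that has at least one non-empty cell.

    Scans upward from ws.max_row, so only the trailing empty rows are visited.
    Returns min_row - 1 if no row from min_row downward contains data.
    Assumption: None and "" both count as empty.
    """
    for row_idx in range(ws.max_row, min_row - 1, -1):
        row = next(ws.iter_rows(min_row=row_idx, max_row=row_idx))
        if any(cell.value not in (None, "") for cell in row):
            return row_idx
    return min_row - 1


def trim_trailing_rows(ws, min_row=1):
    """Delete the empty rows after the last data row and return that row index."""
    last = last_data_row(ws, min_row)
    extra = ws.max_row - last
    if extra > 0:
        ws.delete_rows(last + 1, extra)
    return last


# Demo: 9 rows of data plus a stray empty string in B15
wb = Workbook()
ws = wb.active
for i in range(1, 10):
    ws.append([f"item {i}", i])
ws["B15"] = ""

print(f"max_row before trimming: {ws.max_row}")
print(f"last row with data: {trim_trailing_rows(ws)}")
print(f"max_row after trimming: {ws.max_row}")
```

Because the scan starts at the bottom, it stops as soon as it reaches real data, so the cost depends on the number of trailing empty rows rather than on the size of the sheet.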
You could also incorporate your fix code into your example code for deleting rows that meet specific criteria.
For example, a test sheet has 9 rows of data, but cell B15 contains an empty string, so max_row returns 15 rather than 9.
The example code checks each used cell in the row for a value of None and only processes the 9 rows with data.
    from openpyxl import load_workbook

    filename = "foo.xlsx"
    wb = load_workbook(filename)
    data_ws = wb['Sheet1']

    print(f"Max Rows Reports {data_ws.max_row}")
    for row in data_ws:
        print(f"Checking row {row[0].row}")
        # A row counts as data only if every used cell in it has a value
        if all(cell.value is not None for cell in row):
            if 'something' in data_ws[f'L{row[0].row}'].value:
                data_ws.delete_rows(row[0].row)
        else:
            # First row with an empty cell: treat it as the end of the data
            print(f"Actual Max Rows is {row[0].row}")
            break

    wb.save('out_' + filename)
Output
Max Rows Reports 15
Checking row 1
Checking row 2
Checking row 3
Checking row 4
Checking row 5
Checking row 6
Checking row 7
Checking row 8
Checking row 9
Actual Max Rows is 9
Of course, this is not perfect: if any of the 9 rows with data had a single cell with a value of None, the loop would stop at that point. However, if you know that's not going to be the case, it may be all you need.
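If rows with a None cell are possible, one alternative is the backward loop mentioned in the question. The sketch below is only an illustration under assumptions: delete_matching_rows is a hypothetical helper, the 4 header rows come from the question, and the column index and search text are placeholders. Walking from the bottom up means a deletion never shifts rows that still have to be checked, whereas a forward loop that deletes while iterating may skip the row that moves into the deleted position.

```python
from openpyxl import Workbook

HEADER_ROWS = 4  # assumption: the question mentions 4 untouched rows at the top


def delete_matching_rows(ws, column, needle, first_row=1):
    """Delete every row whose cell in `column` is a string containing `needle`.

    Rows are visited bottom-up, so deleting a row only moves rows that
    have already been checked. Returns the number of deleted rows.
    """
    deleted = 0
    for row_idx in range(ws.max_row, first_row - 1, -1):
        value = ws.cell(row=row_idx, column=column).value
        # Empty trailing rows and None cells are simply skipped
        if isinstance(value, str) and needle in value:
            ws.delete_rows(row_idx)
            deleted += 1
    return deleted


# Demo: 4 header rows, 5 data rows (one with a None cell), stray "" in B15
wb = Workbook()
ws = wb.active
for i in range(1, HEADER_ROWS + 1):
    ws.append([f"header {i}"])
for text in ["keep", "something old", "something new", None, "keep too"]:
    ws.append(["x", text])
ws["B15"] = ""

removed = delete_matching_rows(
    ws, column=2, needle="something", first_row=HEADER_ROWS + 1
)
remaining = [
    ws.cell(row=r, column=2).value
    for r in range(HEADER_ROWS + 1, HEADER_ROWS + 4)
]
print(f"removed {removed} rows, remaining values: {remaining}")
```

Starting the range at max_row is harmless even when max_row is too large: the extra rows hold no matching text, so they are visited but never deleted.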