How can I make this Python program (using openpyxl) run faster?

Here is my code:

import openpyxl
import os

os.chdir('c:\\users\\Desktop')
wb= openpyxl.load_workbook(filename= 'excel.xlsx',data_only = True)
# create_sheet returns the new worksheet, so the deprecated get_sheet_by_name lookup is not needed
sumsheet = wb.create_sheet(index=0, title='Summary')
print('Creating Summary Sheet')
#loop through worksheets
print('Looping Worksheets')
for sheet in wb.worksheets:
    for row in  sheet.iter_rows():
        for cell in row:
                 #find headers of columns needed
                if cell.value=='LowLimit':
                     lowCol=cell.column
                if cell.value=='HighLimit':
                     highCol=cell.column
                if cell.value=='MeasValue':
                     measCol=cell.column

                 #name new columns    
                sheet['O1']='meas-low'
                sheet['P1']='high-meas'
                sheet['Q1']='Minimum'
                sheet['R1']='Margin'

                 #find how many rows of each sheet
                maxrow=sheet.max_row
                i=0

                #subtraction using max row
                for i in range(2,maxrow+1):
                      if  sheet[str(highCol)+str(i)].value=='---':
                          sheet['O'+str(i)]='='+str(measCol)+str(i)+'-'+str(lowCol)+str(i)
                          sheet['P'+str(i)]='=9999'
                          sheet['Q'+str(i)]='=MIN(O'+str(i)+':P'+str(i)+')'
                          sheet['R'+str(i)]='=IF(AND(Q'+str(i)+'<3,Q'+str(i)+'>-3),"Marginal","")'
                      elif sheet[str(lowCol)+str(i)].value=='---':
                          sheet['O'+str(i)]='=9999'
                          sheet['P'+str(i)]='='+str(highCol)+str(i)+'-'+str(measCol)+str(i)
                          sheet['Q'+str(i)]='=MIN(O'+str(i)+':P'+str(i)+')'
                          sheet['R'+str(i)]='=IF(AND(Q'+str(i)+'<3,Q'+str(i)+'>-3),"Marginal","")'
                      else:
                          sheet['O'+str(i)]='='+str(measCol)+str(i)+'-'+str(lowCol)+str(i)
                          sheet['P'+str(i)]='='+str(highCol)+str(i)+'-'+str(measCol)+str(i)
                          sheet['Q'+str(i)]='=MIN(O'+str(i)+':P'+str(i)+')'
                          sheet['R'+str(i)]='=IF(AND(Q'+str(i)+'<3,Q'+str(i)+'>-3),"Marginal","")'

                ++i


print('Saving new wb')
os.chdir('C:\\Users\\hpj683\\Desktop')
wb.save('example.xlsx')

This runs perfectly fine, except that it takes 4 minutes to process one Excel workbook. Is there any way I can optimize my code to make it run faster? My research online suggested switching to read_only or write_only mode, but my code needs to both read from and write to the workbook, so neither of those worked.

The code could benefit from being broken down into separate functions. This will help you identify the slow bits and replace them bit by bit.

The following should not happen inside the loop for every row:

  • finding the headers
  • calling ws.max_row, which is very expensive
  • building coordinate strings such as ws["C" + str(i)]; use ws.cell(row=i, column=3) instead
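As a rough sketch of the points above (not the asker's final code): the headers are located once per sheet, ws.max_row is read once, and cells are addressed with ws.cell(). The function names, the WANTED tuple and the assumption that the headers sit in row 1 are mine; the output columns O to R and the formulas are taken from the question.

```python
from openpyxl import Workbook
from openpyxl.utils import get_column_letter

WANTED = ("LowLimit", "HighLimit", "MeasValue")
OUT_FIRST_COL = 15  # column O, as in the question


def find_header_columns(ws):
    """Scan row 1 once (assumed header row) and map header name -> column index."""
    found = {}
    for row in ws.iter_rows(min_row=1, max_row=1):
        for cell in row:
            if cell.value in WANTED:
                found[cell.value] = cell.col_idx
    return found


def add_margin_formulas(ws):
    """Write the four formula columns; return the number of data rows handled."""
    cols = find_header_columns(ws)
    if any(name not in cols for name in WANTED):
        return 0  # e.g. the Summary sheet: no matching headers, so skip it
    low_idx, high_idx, meas_idx = (cols[name] for name in WANTED)
    low, high, meas = (get_column_letter(i) for i in (low_idx, high_idx, meas_idx))

    for offset, title in enumerate(("meas-low", "high-meas", "Minimum", "Margin")):
        ws.cell(row=1, column=OUT_FIRST_COL + offset, value=title)

    max_row = ws.max_row  # read once, outside the row loop
    for r in range(2, max_row + 1):
        o = f"={meas}{r}-{low}{r}"
        p = f"={high}{r}-{meas}{r}"
        # Same branch order as the question: HighLimit is checked first
        if ws.cell(row=r, column=high_idx).value == "---":
            p = "=9999"
        elif ws.cell(row=r, column=low_idx).value == "---":
            o = "=9999"
        ws.cell(row=r, column=OUT_FIRST_COL, value=o)
        ws.cell(row=r, column=OUT_FIRST_COL + 1, value=p)
        ws.cell(row=r, column=OUT_FIRST_COL + 2, value=f"=MIN(O{r}:P{r})")
        ws.cell(row=r, column=OUT_FIRST_COL + 3,
                value=f'=IF(AND(Q{r}<3,Q{r}>-3),"Marginal","")')
    return max_row - 1


# Small in-memory demo with made-up data
demo_wb = Workbook()
demo_ws = demo_wb.active
demo_ws.append(["LowLimit", "MeasValue", "HighLimit"])
demo_ws.append([1, 5, 10])
demo_ws.append(["---", 5, 10])
demo_ws.append([1, 5, "---"])
rows_done = add_margin_formulas(demo_ws)
print(rows_done, demo_ws["O2"].value, demo_ws["P4"].value)
```

With a real workbook you would call add_margin_formulas(sheet) once per sheet in wb.worksheets, so each row is visited a single time instead of once per cell.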

And if the nested loop is not a formatting error, then why is the per-row work nested inside the loop over every cell?

Also you should look at the profile module to find out what is slow. You might want to watch my talk on profiling openpyxl from last year's PyCon UK.
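A minimal way to try this, using cProfile (the C implementation of the standard library's profile interface). Here process_workbook is a hypothetical stand-in for whatever function wraps your own workbook code, and profile_call is just a helper name I made up:

```python
import cProfile
import io
import pstats


def process_workbook(n_rows=2000):
    """Hypothetical stand-in for the real workbook-processing function."""
    return ["=MIN(O{0}:P{0})".format(i) for i in range(2, n_rows + 2)]


def profile_call(func, *args, top=10, **kwargs):
    """Run func under cProfile; return (result, report sorted by cumulative time)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


result, report = profile_call(process_workbook, 2000)
print(report)
```

The entries at the top of the report are where the time goes. If the code is split into functions as suggested above, each one shows up as its own line, which makes the slow part easier to spot.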

Good luck!
