
Is there a faster way to write data to an excel sheet using openpyxl in python?

I am currently writing a program to update and copy data from one spreadsheet to another. The code works, but it takes far too long to be practical: about an hour in total. The spreadsheets are large: one is 20,000 rows by 30 columns, and the other is 3,000 rows by 30 columns. The code updates specific rows in the larger spreadsheet and then copies data from the smaller spreadsheet into the larger one if that data doesn't already exist there. After analyzing where the time goes, I found that copying and writing the data to the larger spreadsheet takes most of it (~55 min). The write_only option in openpyxl does not support writing to existing files, which is what I need, so I am stuck on how to speed up the writing. I am also new to Python, so any help would be appreciated. Thank you!

This is the code:

# iterate through ticket column of first sheet
for roww in range (2, sheet.max_row+1):
    sheet1_ticket_number = sheet.cell(row=roww, column = 3).value
    # iterate through ticket column of second sheet
    # Ticket number x from sheet 1 compared to all ticket numbers in sheet 2
    for row2 in range(starting_row, (sheet2.max_row+1+sheet.max_row)):
        sheet2_ticket_number = sheet2.cell(row = row2, column = 3).value

        # If ticket number matches, check to see if columns match, if not, update
        if (sheet.cell(row=roww, column = 3).value == sheet2.cell(row = row2, column = 3).value):
            check = 'true'
            for i in range(1, sheet.max_column+1):
                if sheet2.cell(row=row2, column = 3+i).value != sheet.cell(row=roww, column =3+i).value and (3+i != 15) and (3+i != 38) and (3+i != 14):
                    sheet2.cell(row=row2, column = 3+i).value = sheet.cell(row=roww, column =3+i).value
                    #print('updated row# ', row2, 'Column#', 3+i, 'ticket#', sheet2.cell(row=row2, column = 3).value,  'to:', sheet2.cell(row=row2, column = 3+i).value)

                if sheet2.cell(row=row2, column = 1).value is None:
                    sheet2.cell(row=row2, column = 1).value = sheet.cell(row=roww, column =1).value
                if sheet2.cell(row=row2, column = 2).value is None:
                    sheet2.cell(row=row2, column = 2).value = sheet.cell(row=roww, column =2).value
            break


        # if ticket number is not in second file/ empty row, add new ticket row w column entries. 
        if (sheet2.cell(row = row2, column = 3).value is None) and (sheet2.cell(row = row2+1, column = 3).value is None):
            sheet2.cell(row=row2, column =3).value = sheet1_ticket_number
            #print('printed new ticket row# ', sheet2.cell(row=row2, column =3).value)
            for j in range(1, sheet.max_column+1):
                if sheet2.cell(row=row2, column = 3+j).value != sheet.cell(row=roww, column =3+j).value:
                    sheet2.cell(row=row2, column = 2).value = sheet.cell(row=roww, column =2).value
                    sheet2.cell(row=row2, column = 1).value = sheet.cell(row=roww, column =1).value
                    sheet2.cell(row=row2, column = 3+j).value = sheet.cell(row=roww, column =3+j).value
            break       

Fast Excel work starts with arrays

First of all, do not read individual cells. Read whole ranges into arrays, so that the data of both the first and the other sheet is held in memory. If you need lookups, use dictionaries.

Excel is fast; you do not need Python openpyxl (not what you asked, but the best way to do it)

Then loop over and pick items only within the arrays. Excel is very slow when accessed cell by cell. When everything is done, select a range of exactly the size of the array you want to paste, and paste everything in one go instead of cell by cell.
In short (legend: first sheet = 1, other sheet = 2):

  • from range1 to array1 and range2 to array2
  • refresh array2 with array1 data
  • from array2 to range2.
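If you stay in Python, the same three steps could look roughly like the sketch below. It is only an illustration under assumptions taken from your code: the ticket number sits in column 3, row 1 is a header, and columns 14, 15 and 38 must not be overwritten. The names merge_rows and update_workbook are made up for this example. The nested loop over cells is replaced by one dictionary lookup per ticket, and openpyxl is touched only once to read and once to write.

```python
def merge_rows(rows1, rows2, key_col=2, skip_cols=(13, 14, 37)):
    """Refresh rows2 (list of lists) in place with the data from rows1.

    key_col and skip_cols are 0-based, so key_col=2 is spreadsheet column 3
    and skip_cols=(13, 14, 37) are spreadsheet columns 14, 15 and 38.
    """
    # Dictionary from ticket number to its row in rows2: one lookup per ticket.
    index = {row[key_col]: row for row in rows2
             if len(row) > key_col and row[key_col] is not None}
    for src in rows1:
        if len(src) <= key_col or src[key_col] is None:
            continue  # no ticket number, nothing to match
        dst = index.get(src[key_col])
        if dst is None:
            # Unknown ticket: append a copy of the whole row.
            dst = list(src)
            rows2.append(dst)
            index[src[key_col]] = dst
            continue
        if len(dst) < len(src):
            dst.extend([None] * (len(src) - len(dst)))  # pad short rows
        for c, value in enumerate(src):
            if c > key_col and c not in skip_cols:
                dst[c] = value      # columns after the ticket: take sheet 1
            elif c < key_col and dst[c] is None:
                dst[c] = value      # columns before it: fill only if empty
    return rows2


def update_workbook(path1, path2, out_path):
    """range -> array, refresh array, array -> range (first sheet of each file)."""
    from openpyxl import load_workbook  # third-party: pip install openpyxl

    wb1 = load_workbook(path1, read_only=True)
    rows1 = [list(r) for r in wb1.active.iter_rows(min_row=2, values_only=True)]
    wb1.close()

    wb2 = load_workbook(path2)
    ws2 = wb2.active
    rows2 = [list(r) for r in ws2.iter_rows(min_row=2, values_only=True)]

    merge_rows(rows1, rows2)

    # One pass over the array to write it back, starting below the header.
    for r, row in enumerate(rows2, start=2):
        for c, value in enumerate(row, start=1):
            ws2.cell(row=r, column=c, value=value)
    wb2.save(out_path)
```

A call would be something like update_workbook("small.xlsx", "large.xlsx", "large_updated.xlsx"), with file names that are placeholders. New tickets are appended after the last row that was read, so blank trailing rows in the other sheet would end up above them.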

Python openpyxl (not needed, Excel is better, but your question asks for it)

You already mention the write-only mode of openpyxl. According to "Why does writing to a workbook of a few MB with Python's openpyxl module eat Gigabytes of RAM?", it saves a lot of RAM (and time), but as you say, it can only write to a new workbook. Staying with your question (though I think the better approach is to speed up your code with arrays and paste the result into the other sheet at the end): why not accept that limitation? Dump the array meant for the other Excel sheet (see the Excel heading above) into such a new workbook, then open it by hand and paste the data as values into the other Excel sheet (mind: switch off any automatic cell updates while pasting). That is one manual step, but it would save you an hour of cell-by-cell copying. This is only a way to keep openpyxl involved. You do not need it; just use Excel.
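A minimal sketch of that dump, assuming the refreshed data is already available as a list of rows. The function name dump_rows, the sheet title "merged" and the header argument are made up for this example.

```python
def dump_rows(rows, out_path, header=None):
    """Write rows (an iterable of lists or tuples) to a new workbook
    using openpyxl's write-only mode."""
    from openpyxl import Workbook  # third-party: pip install openpyxl

    wb = Workbook(write_only=True)
    ws = wb.create_sheet("merged")  # a write-only workbook starts with no sheet
    if header is not None:
        ws.append(list(header))
    for row in rows:
        ws.append(list(row))        # rows are streamed out one at a time
    wb.save(out_path)
```

You would then open the file written to out_path in Excel and paste its contents as values into the other sheet. Write-only mode can only append whole rows, which is why the array has to be complete before dumping it.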
