
Convert large Excel files to CSV

In my app, which takes data from heavy CSV files and uploads it to a database, I needed to import data from Excel files too. For this, I first used xlrd to convert Excel files into CSV. That worked great for small files but took a lot of time on large ones: when I gave it a file with 6 sheets of 1m rows each, I waited 40 minutes before terminating the process, because that was far too long to wait.

Currently I'm using the openpyxl library to convert Excel files to CSV. It is significantly faster than xlrd, especially in read-only mode, but sadly even a delay of 8-10 minutes to convert heavy files is too much.

Is there a time-efficient solution in Python that can convert large Excel files with multiple sheets without having to wait for minutes?

This is the code I'm currently using:

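import csv
import os
import tempfile

from openpyxl import load_workbook


def convertExcelToCSV(excelFilePath, uploadFilePath):
    lstCSVFilePaths = []

    # read_only=True streams rows from the sheet XML instead of building the whole workbook in memory
    workbook = load_workbook(excelFilePath, read_only=True)

    # make directory for CSV files that will be made from the Excel file
    csvDirPath = os.path.join(uploadFilePath, "CSV Files")
    os.makedirs(csvDirPath, exist_ok=True)

    # sheetnames / workbook[name] replace the deprecated get_sheet_names() / get_sheet_by_name()
    for worksheet_name in workbook.sheetnames:
        worksheet = workbook[worksheet_name]

        # values_only=True yields tuples of plain values instead of Cell objects, which is cheaper
        rows = worksheet.iter_rows(values_only=True)

        # skip sheet if empty (worksheet.rows is a generator, so "worksheet.rows == 0" was never true)
        first_row = next(rows, None)
        if first_row is None:
            continue

        objCSV = {}
        objCSV["fileName"] = worksheet_name + ".csv"
        objCSV["isGZip"] = False

        fd, csvFilePath = tempfile.mkstemp(suffix=worksheet_name + ".csv", dir=csvDirPath)
        objCSV["filePath"] = csvFilePath

        # reuse the descriptor returned by mkstemp instead of leaking it; "with" closes the file
        with os.fdopen(fd, "w", newline="") as your_csv_file:
            wr = csv.writer(your_csv_file, quoting=csv.QUOTE_ALL)
            wr.writerow(first_row)
            wr.writerows(rows)

        lstCSVFilePaths.append(objCSV)

    # read-only workbooks keep the source file open until closed explicitly
    workbook.close()
    return lstCSVFilePaths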

With read-only mode you should be able to read the worksheets and write the CSV files in parallel. Other than that, I don't think there is a lot that can be done: parsing the sheet XML into Python objects is probably the bottleneck.
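The parallel idea can be sketched roughly as follows. This is a minimal sketch, not a tested solution for the 6 x 1m-row file: it assumes openpyxl is installed and that each sheet may be written to its own `<sheet name>.csv` in an output directory (instead of the `mkstemp` names used in the question), and `convert_sheet` / `convert_workbook_parallel` are hypothetical helper names, not part of openpyxl. Each worker opens the workbook itself in read-only mode, so only file paths and sheet names are sent between processes. Processes rather than threads are the default because the parsing is CPU-bound Python code, which threads would run one at a time under the GIL.

```python
import csv
import os
from concurrent.futures import ProcessPoolExecutor

from openpyxl import load_workbook


def convert_sheet(excel_path, sheet_name, out_dir):
    """Worker: open the workbook read-only and stream one sheet into <sheet_name>.csv."""
    workbook = load_workbook(excel_path, read_only=True)
    try:
        worksheet = workbook[sheet_name]
        csv_path = os.path.join(out_dir, sheet_name + ".csv")
        with open(csv_path, "w", newline="") as csv_file:
            writer = csv.writer(csv_file, quoting=csv.QUOTE_ALL)
            # values_only=True yields tuples of plain values, not Cell objects
            writer.writerows(worksheet.iter_rows(values_only=True))
    finally:
        # read-only workbooks hold the source file open until closed
        workbook.close()
    return csv_path


def convert_workbook_parallel(excel_path, out_dir, max_workers=None,
                              executor_cls=ProcessPoolExecutor):
    """Convert every sheet of excel_path to CSV, one task per sheet.

    executor_cls defaults to processes; a ThreadPoolExecutor can be passed
    instead, but it will not parallelise the CPU-bound parsing.
    """
    os.makedirs(out_dir, exist_ok=True)

    # The parent only needs the sheet names; row parsing happens in the workers.
    workbook = load_workbook(excel_path, read_only=True)
    sheet_names = workbook.sheetnames
    workbook.close()

    with executor_cls(max_workers=max_workers) as pool:
        # Only picklable strings are submitted, never a workbook object.
        futures = [pool.submit(convert_sheet, excel_path, name, out_dir)
                   for name in sheet_names]
        # Results come back in sheet order; a worker's exception is re-raised here.
        return [future.result() for future in futures]
```

For example, `convert_workbook_parallel("big.xlsx", "CSV Files")` starts a pool sized to the number of CPUs by default and returns the CSV paths in sheet order. On Windows and macOS, where worker processes are spawned and re-import the main module, the call has to sit under an `if __name__ == "__main__":` guard. At best the wall-clock time drops to that of the slowest single sheet, and probably not quite that far, since every worker re-reads the workbook's shared-strings table and styles when it opens the file. Unlike the question's code, this sketch also writes empty sheets as empty CSV files.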
