In my app, which takes data from heavy CSV files and uploads it to a database, I needed to import data from Excel files too. For this, I first used xlrd to convert Excel files into CSV, which worked great for small files but took a lot of time for large ones. When I gave it a file with 6 sheets of 1M rows each, I terminated the process after waiting 40 minutes, because that was far too long.
Currently I'm using the openpyxl library to convert Excel files to CSV. It is significantly faster than xlrd, especially in read-only mode, but sadly even a delay of 8-10 minutes for converting heavy files is too much.
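To check how much read-only mode actually saves on a given file, a small timing harness can help. This is a sketch: `time_full_read` is a hypothetical helper, and it only measures opening the workbook and iterating over every row, not writing the CSV, so it isolates the reading cost.

```python
import time

from openpyxl import load_workbook


def time_full_read(excel_path, read_only):
    """Hypothetical helper: open excel_path and iterate over every row of every sheet.

    Returns (row_count, elapsed_seconds).
    """
    start = time.perf_counter()
    workbook = load_workbook(excel_path, read_only=read_only)
    try:
        row_count = 0
        for worksheet in workbook.worksheets:
            # values_only=True yields plain tuples instead of Cell objects
            for _row in worksheet.iter_rows(values_only=True):
                row_count += 1
    finally:
        # Releases the underlying zip file (matters mainly in read-only mode)
        workbook.close()
    return row_count, time.perf_counter() - start
```

Usage: call it once with `read_only=False` and once with `read_only=True` on the same file and compare the elapsed times; the numbers depend entirely on the file and the machine.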
Is there any time-efficient solution in Python for converting large Excel files with multiple sheets without having to wait for minutes?
This is the code I'm currently using:
import csv
import os
import tempfile

from openpyxl import load_workbook


def convertExcelToCSV(excelFilePath, uploadFilePath):
    lstCSVFilePaths = []
    # Make the directory for the CSV files that will be made from the Excel file
    csvDirPath = os.path.join(uploadFilePath, "CSV Files")
    os.makedirs(csvDirPath, exist_ok=True)

    # read_only=True streams rows instead of loading the whole workbook into memory
    workbook = load_workbook(excelFilePath, read_only=True)
    try:
        for worksheet_name in workbook.sheetnames:
            worksheet = workbook[worksheet_name]
            rows = worksheet.iter_rows(values_only=True)
            # Skip the sheet if it is empty (worksheet.rows is a generator in
            # read-only mode, so comparing it with 0 never detects an empty sheet)
            first_row = next(rows, None)
            if first_row is None:
                continue
            fd, csvFilePath = tempfile.mkstemp(suffix=worksheet_name + ".csv", dir=csvDirPath)
            # Reuse the descriptor returned by mkstemp instead of leaking it
            with os.fdopen(fd, "w", newline="") as csv_file:
                wr = csv.writer(csv_file, quoting=csv.QUOTE_ALL)
                wr.writerow(first_row)
                wr.writerows(rows)
            lstCSVFilePaths.append({
                "fileName": worksheet_name + ".csv",
                "isGZip": False,
                "filePath": csvFilePath,
            })
    finally:
        # Read-only workbooks keep the file open until closed explicitly
        workbook.close()
    return lstCSVFilePaths
With read-only mode you should be able to read the worksheets and write the CSV files in parallel, for example with one process per sheet. Other than that, I don't think much can be done: converting the XML to Python objects is probably the bottleneck.
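A minimal sketch of the parallel approach, assuming the sheets can be converted independently: every worker process opens its own read-only workbook and streams one sheet to CSV. Processes rather than threads are used because most of the per-cell work is Python code, which threads would serialize on the GIL. `convert_sheet`, `convert_workbook_parallel`, and the `<sheet name>.csv` output naming are hypothetical, not part of openpyxl.

```python
import csv
import os
from concurrent.futures import ProcessPoolExecutor

from openpyxl import load_workbook


def convert_sheet(excel_path, sheet_name, csv_path):
    """Hypothetical worker: convert one worksheet to CSV in its own process."""
    # Each worker opens its own read-only workbook handle
    workbook = load_workbook(excel_path, read_only=True)
    try:
        worksheet = workbook[sheet_name]
        with open(csv_path, "w", newline="") as csv_file:
            writer = csv.writer(csv_file, quoting=csv.QUOTE_ALL)
            writer.writerows(worksheet.iter_rows(values_only=True))
    finally:
        workbook.close()
    return csv_path


def convert_workbook_parallel(excel_path, out_dir, max_workers=None):
    """Hypothetical helper: convert every sheet to out_dir/<sheet name>.csv in parallel."""
    os.makedirs(out_dir, exist_ok=True)
    # Read the sheet names up front; the worksheet XML itself is not parsed here
    workbook = load_workbook(excel_path, read_only=True)
    sheet_names = workbook.sheetnames
    workbook.close()

    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(convert_sheet, excel_path, name, os.path.join(out_dir, name + ".csv"))
            for name in sheet_names
        ]
        # result() re-raises any exception raised inside a worker
        return [future.result() for future in futures]
```

On Windows and macOS, where processes are started with the spawn method, call `convert_workbook_parallel` from inside an `if __name__ == "__main__":` block. The speedup is bounded by the number of sheets and CPU cores, and each worker has to load the workbook's shared strings and styles again, so a workbook with a single huge sheet gains nothing from this.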