How can I make an Excel file created with pandas open faster?

The Excel file I create from Python is extremely slow to open, even though it is only about 50 MB.

I have tried both pandas and openpyxl.

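import pandas as pd
from pandas import ExcelWriter

# path_input, path_output, reportdate, dateformat and adjust_report are assumed to be defined elsewhere.
def to_file(list_report,list_sheet,strip_columns,Name):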
    i = 0
    wb = ExcelWriter(path_output + '\\' + Name + dateformat + '.xlsx')
    while i <= len(list_report)-1:
        try:
            df = pd.DataFrame(pd.read_csv(path_input + '\\' + list_report[i] + reportdate + '.csv'))
            for column in strip_column:
                try:
                    df[column] = df[column].str.strip('=("")')
                except:
                    pass
            df = adjust_report(df,list_report[i])
            df = df.apply(pd.to_numeric, errors ='ignore', downcast = 'integer')
            df.to_excel(wb, sheet_name = list_sheet[i], index = False)
        except:
            print('Missing report: ' + list_report[i])
        i += 1
    wb.close()  # writes the workbook to disk (ExcelWriter.save() no longer exists in pandas 2.x)

Is there any way to speed it up?

idiom

Let us rename list_report to reports. Then your while loop is usually expressed as simply: for i in range(len(reports)):

You access the i-th element several times. The loop could bind that for you, with: for i, report in enumerate(reports):

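But it turns out you never even need i. Its only other use is list_sheet[i], and zip can pair the two lists for you. So most folks would write this as: for report, sheet in zip(reports, list_sheet):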
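Here is a small sketch of the three loop styles side by side. The report and sheet names are made up for illustration, and sheets stands in for list_sheet, which the original function also indexes with i. All three loops visit the same pairs; the zip form just needs no index at all.

```python
reports = ["sales", "returns", "stock"]   # hypothetical report names
sheets = ["Sales", "Returns", "Stock"]    # hypothetical sheet names, parallel to reports

# 1. Index-based loop: the direct translation of the while loop.
by_index = []
for i in range(len(reports)):
    by_index.append((reports[i], sheets[i]))

# 2. enumerate binds the element, but i is still needed for sheets.
by_enumerate = []
for i, report in enumerate(reports):
    by_enumerate.append((report, sheets[i]))

# 3. zip walks both lists together, so no index is needed.
by_zip = []
for report, sheet in zip(reports, sheets):
    by_zip.append((report, sheet))

print(by_index == by_enumerate == by_zip)
```

Note that zip stops at the shorter list, so it silently skips trailing items if the two lists ever differ in length.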

code organization

This bit of code is very nice:

        for column in strip_column:
            try:
                df[column] = df[column].str.strip('=("")')
            except:
                pass

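I recommend you bury it in a helper function, using def strip_punctuation. (The list should be plural, I think: the parameter is declared as strip_columns, so unless a global named strip_column exists, the loop raises a NameError that the outer except swallows and reports as a missing report.) Then you would have a simple sequence of df assignments.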
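One possible shape for that helper, as a sketch rather than a drop-in replacement: the chars parameter and the sample frame are my own additions, and I have narrowed the bare except to the two exceptions I would expect here (a missing column, or a column that does not hold strings).

```python
import pandas as pd


def strip_punctuation(df, strip_columns, chars='=("")'):
    """Strip the given characters from both ends of each listed text column."""
    for column in strip_columns:
        try:
            df[column] = df[column].str.strip(chars)
        except (KeyError, AttributeError):
            # KeyError: the column is absent from this report.
            # AttributeError: the column is not text, so .str is unavailable.
            pass
    return df


# Made-up sample data resembling CSV cells wrapped as ="...".
sample = pd.DataFrame({"code": ['="00123"', '="A-7"'], "qty": [1, 2]})
sample = strip_punctuation(sample, ["code", "qty", "not_there"])
print(sample["code"].tolist())
```

With this in place the loop body becomes a plain sequence: read the CSV, call strip_punctuation, call adjust_report, convert to numeric, write the sheet.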

timing

Profile the elapsed time with time(). Surround each df assignment with code like this:

    from time import time

    t0 = time()
    df = ...
    print(time() - t0)

That will show you which part of your processing pipeline takes the longest and therefore should receive the most effort for speeding it up.
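If you would rather not repeat the t0 lines around every step, a small context manager can collect the timings for you. This is only a sketch: timed and timings are names I made up, the sleep calls stand in for the real pipeline steps, and I used perf_counter instead of time because it is meant for measuring short intervals.

```python
from contextlib import contextmanager
from time import perf_counter, sleep

timings = {}  # label -> elapsed seconds


@contextmanager
def timed(label):
    """Record how long the enclosed block takes under the given label."""
    t0 = perf_counter()
    try:
        yield
    finally:
        timings[label] = perf_counter() - t0


with timed("read_csv"):
    sleep(0.01)   # stand-in for: df = pd.read_csv(...)

with timed("adjust_report"):
    sleep(0.02)   # stand-in for: df = adjust_report(df, report)

# Print the steps from slowest to fastest.
for label, seconds in sorted(timings.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{label}: {seconds:.3f}s")
```

Wrapping the to_excel call and the final save the same way would show whether writing the workbook, rather than preparing the data, dominates the run.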

I suspect adjust_report() uses the bulk of the time, but without seeing it that's hard to say.
