How to make an Excel file created with pandas open faster?

Question

The Excel file created from Python is extremely slow to open, even though the file is only about 50 MB.

I have tried both pandas and openpyxl.

def to_file(list_report,list_sheet,strip_columns,Name):
    i = 0
    wb = ExcelWriter(path_output + '\\' + Name + dateformat + '.xlsx')
    while i <= len(list_report)-1:
        try:
            df = pd.DataFrame(pd.read_csv(path_input + '\\' + list_report[i] + reportdate + '.csv'))
            for column in strip_column:
                try:
                    df[column] = df[column].str.strip('=("")')
                except:
                    pass
            df = adjust_report(df,list_report[i])
            df = df.apply(pd.to_numeric, errors ='ignore', downcast = 'integer')
            df.to_excel(wb, sheet_name = list_sheet[i], index = False)
        except:
            print('Missing report: ' + list_report[i])
        i += 1
    wb.save()

Is there any way to speed it up?

Answer 1

idiom

Let us rename list_report to reports. Then your while loop is usually expressed simply as: for i in range(len(reports)):

You access the i-th element several times. The loop could bind that for you, with: for i, report in enumerate(reports):

But it turns out you never need i itself, only the report and the sheet name at the same position in list_sheet. So most folks would write this as: for report, sheet in zip(reports, list_sheet):
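To see the loop forms side by side, here is a minimal sketch. The report and sheet names are made up for illustration, and sheets stands in for list_sheet. Because the original function also indexes list_sheet with i, the last form pairs the two lists with zip rather than dropping the second list.

```python
reports = ["sales", "returns", "stock"]  # hypothetical report names
sheets = ["Sales", "Returns", "Stock"]   # hypothetical sheet names (list_sheet)

# Original style: a manually managed counter.
pairs_while = []
i = 0
while i <= len(reports) - 1:
    pairs_while.append((reports[i], sheets[i]))
    i += 1

# Index-based for loop.
pairs_range = []
for i in range(len(reports)):
    pairs_range.append((reports[i], sheets[i]))

# enumerate binds the element as well as the index.
pairs_enumerate = []
for i, report in enumerate(reports):
    pairs_enumerate.append((report, sheets[i]))

# zip walks both lists in step, so no index is needed at all.
pairs_zip = []
for report, sheet in zip(reports, sheets):
    pairs_zip.append((report, sheet))

print(pairs_zip)
```

All four loops visit the same (report, sheet) pairs in the same order. The zip version just has the fewest moving parts.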

code organization

This bit of code is very nice:

        for column in strip_column:
            try:
                df[column] = df[column].str.strip('=("")')
            except:
                pass

I recommend you bury it in a helper function, using def strip_punctuation. (The list should be plural, I think? strip_columns? That is how the function signature spells it.) Then you would have a simple sequence of df assignments.
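One possible shape for that helper is sketched below. It assumes the intent is to skip columns that are absent or do not hold text, so it catches only KeyError and AttributeError instead of using a bare except. The sample frame and its column names are invented for the demonstration.

```python
import pandas as pd


def strip_punctuation(df, strip_columns):
    """Strip the characters = ( " ) from both ends of each listed text column."""
    for column in strip_columns:
        try:
            df[column] = df[column].str.strip('=("")')
        except (KeyError, AttributeError):
            # KeyError: the column is absent from this report.
            # AttributeError: the column does not hold strings.
            pass
    return df


# Hypothetical data: an id column wrapped as ="..." and a numeric column.
df = pd.DataFrame({"id": ['="00123"', '="00456"'], "qty": [1, 2]})
df = strip_punctuation(df, ["id", "qty", "missing"])
print(df["id"].tolist())
```

With this in place, the body of the main loop reduces to a few df assignments in a row: read the CSV, strip, adjust, convert.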

timing

Profile the elapsed time with time(). Surround each df assignment with code like this:

    from time import time

    t0 = time()
    df = ...
    print(time() - t0)

That will show you which part of your processing pipeline takes the longest and therefore should receive the most effort for speeding it up.
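If sprinkling t0 lines through the function gets noisy, the same measurement can be wrapped in a small context manager. This is only a sketch: the timed helper and the timings dictionary are names invented here, perf_counter is swapped in for time() because it is meant for measuring intervals, and the two timed steps are stand-in work rather than the real pipeline.

```python
from contextlib import contextmanager
from time import perf_counter

timings = {}  # label -> accumulated seconds


@contextmanager
def timed(label):
    """Add the elapsed wall-clock time of the with-block to timings[label]."""
    start = perf_counter()
    try:
        yield
    finally:
        timings[label] = timings.get(label, 0.0) + perf_counter() - start


# Stand-in work; in the real function each block would hold one df assignment.
with timed("build"):
    squares = [n * n for n in range(100_000)]
with timed("sum"):
    total = sum(squares)

# Slowest step first.
for label, seconds in sorted(timings.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{label}: {seconds:.4f}s")
```

Because the timings accumulate per label, running this inside the loop over reports gives a total per step across all sheets.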

I suspect adjust_report() uses the bulk of the time, but without seeing it that's hard to say.

solution1
0 2019-03-26 15:11:07
