
Is there a faster way to read/write a pandas DataFrame with about 1 million rows?

I am trying to be really specific about my issue. I have a DataFrame with 200+ columns and 1 million+ rows. Reading it from a CSV and writing it to an Excel file takes more than 45 minutes, if I timed it correctly.

import pandas as pd

# Read the CSV; header=0 uses the first row as column names.
df = pd.read_csv("data_file.csv", low_memory=False, header=0, delimiter=",", na_values=("", "nan"))
# Writing to Excel is the slow step. Note that header=0 is falsy here, so no header row is written.
df.to_excel("data_file.xlsx", header=0, index=False)

My question: is there any way to read or write files faster with a pandas DataFrame? This is just one example file; I have many more like it.

Two thoughts:

  • Investigate Dask, which provides a pandas-like DataFrame that can distribute the processing of large datasets across multiple CPUs or a cluster. It is hard to say how much of a speed-up you will get if your performance is purely I/O bound, but it is certainly worth investigating. Take a quick look at the Dask use cases to get an understanding of its capabilities (a minimal sketch follows this list).

  • If you are going to repeatedly read the same CSV input files, then I would suggest converting them to HDF, as reading HDF is orders of magnitude faster than reading the equivalent CSV file. It's as simple as reading the file into a DataFrame and then writing it back out using DataFrame.to_hdf() (see the second sketch below). Obviously this will only help if you can do the conversion as a one-off exercise and then use the HDF files from that point forward whenever you run your code.
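
A minimal sketch of the Dask approach from the first bullet, assuming the input files share a naming pattern such as "data_file*.csv" (the pattern and file names are placeholders, not from the question):

import dask.dataframe as dd

# Dask reads the matching CSV files in parallel, one partition per chunk/file.
ddf = dd.read_csv("data_file*.csv", na_values=("", "nan"))
# Work is lazy until you ask for a result; compute() returns a pandas DataFrame.
df = ddf.compute()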
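
A sketch of the one-off CSV-to-HDF conversion from the second bullet. The .h5 file name and the key "data" are arbitrary choices here, and to_hdf() requires the PyTables ("tables") package:

import pandas as pd

# One-off conversion: read the CSV once and store it as HDF5.
df = pd.read_csv("data_file.csv", low_memory=False, na_values=("", "nan"))
df.to_hdf("data_file.h5", key="data", mode="w")

# From then on, load the much faster HDF file instead of the CSV.
df = pd.read_hdf("data_file.h5", key="data")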

Regards, Ian

That is a big file you are working with. If you need to process the data, you can't really get around the long read and write times.

Do NOT write to xlsx; writing to xlsx is what takes so long. Write to CSV instead. Writing the same data to CSV takes about a minute on my cheap laptop with an SSD.
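
For example, assuming df is the DataFrame from the question (the output file name is a placeholder):

# Plain-text CSV output is far cheaper than building an xlsx workbook.
df.to_csv("data_file_out.csv", index=False)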
