
How to convert multiple CSV files into an array faster?

I have a folder containing 9000 CSV files. Each file has 5000 rows and 12 columns. For deep-learning training purposes, I need to convert all of these files into a single array of shape (9000, 5000, 12). I am using this code for my work:


import numpy as np
import pandas as pd

# li is a list containing the filenames (used for a filtering purpose, please ignore this)
path = mypath + '//' + li[0] + '.csv'
df = pd.read_csv(path)
a = np.array(df)

path = mypath + '//' + li[1] + '.csv'
df = pd.read_csv(path)
arr = np.array(df)
a = np.stack((a, arr))

for filename in li[2:]:
    path = mypath + '//' + filename + '.csv'
    df = pd.read_csv(path)
    arr = np.array(df)
    if arr.shape[0] != 4999:
        # skip files with an unexpected number of rows
        continue
    a = np.append(a, [arr], axis=0)  # copies the whole accumulated array on every call

So, basically, I am reading each CSV file into a DataFrame, converting the DataFrame into an array, and then stacking the arrays together.

This process is taking too much time: only 2000 files were converted in an hour. Is there a faster approach that can serve my purpose?

Sorry for the rough coding format; I was just drafting quickly, and it was already taking too much time.
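One common way to avoid the quadratic cost of `np.append` (which copies the whole accumulated array on every call) is to collect the per-file arrays in a Python list and call `np.stack` once at the end. A minimal, self-contained sketch of the idea, using a small set of generated demo CSVs in place of the real folder:

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Build a small demo dataset: 4 CSV files, 3 rows x 2 columns each
# (a stand-in for the real 9000 files of 5000 x 12).
tmpdir = tempfile.mkdtemp()
for i in range(4):
    pd.DataFrame(np.full((3, 2), i)).to_csv(
        os.path.join(tmpdir, f"file{i}.csv"), index=False
    )

# Read each file once, keep the arrays in a list, and stack a single
# time at the end; appending to a NumPy array in a loop is O(n^2).
chunks = []
for name in sorted(os.listdir(tmpdir)):
    df = pd.read_csv(os.path.join(tmpdir, name))
    if df.shape[0] != 3:  # skip files with an unexpected row count
        continue
    chunks.append(df.values)

a = np.stack(chunks)  # one allocation, one copy
print(a.shape)        # (4, 3, 2)
```

This keeps the same per-file filtering logic, but the total copying work grows linearly with the number of files instead of quadratically.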

Finally solved the issue. Thanks to @0 0, I preallocated the array, and it took only 3 minutes to complete my work!

import os
from glob import glob

import numpy as np
import pandas as pd

os.chdir("E://2-1//reserach//tanvir sir//datasets//ECGDataDenoised//filter//")
strain = glob("*.csv")

# Preallocate the full array once, then fill it in place.
arr = np.zeros((9061, 4999, 12))
for i, filename in enumerate(strain):
    df = pd.read_csv(filename)
    arr[i] = df.values

The final shape of the array is (9061,4999,12), as I wanted!
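As a follow-up, once the one-off CSV conversion is done, the result can be cached in NumPy's binary format so later runs load in seconds instead of re-parsing 9000 CSVs. A small sketch (the `dataset.npy` filename and the tiny stand-in array are assumptions for illustration):

```python
import os
import tempfile

import numpy as np

# Stand-in for the real (9061, 4999, 12) array built above.
arr = np.zeros((2, 3, 4))

# Persist once in .npy format; np.load is much faster than CSV parsing.
path = os.path.join(tempfile.mkdtemp(), "dataset.npy")  # hypothetical cache file
np.save(path, arr)

restored = np.load(path)
print(restored.shape)  # (2, 3, 4)
```

The `.npy` file stores the dtype and shape alongside the raw buffer, so the array comes back exactly as it was saved.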
