
Loading a .csv file in Python consumes too much memory

I'm loading a 4 GB .csv file in Python. Since it's only 4 GB I expected it would be fine to load it all at once, but after a while my 32 GB of RAM gets completely filled.

Am I doing something wrong? Why does 4 GB of data end up taking so much more space in RAM?

Is there a faster way of loading this data?

fname = "E:\Data\Data.csv" 
a = []  
with open(fname) as csvfile:
    reader = csv.reader(csvfile,delimiter=',')
    cont = 0;
    for row in reader:
        cont = cont+1
        print(cont)
        a.append(row)
b = np.asarray(a)

You are copying the entire content of the CSV at least twice.

Once in a and again in b.

Any additional work on that data consumes additional memory to hold intermediate values.

You could del a once you have b, but note that the pandas library provides a read_csv function and an easy way to count the rows of the resulting DataFrame.
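
For example, a minimal sketch of the pandas route (whether you still need a plain NumPy array at the end is an assumption about your use case):

import pandas as pd

# read_csv parses the file in C and stores each column as a typed array,
# which is far more compact than a list of lists of Python strings
df = pd.read_csv(r"E:\Data\Data.csv")
print(len(df))       # row count of the resulting DataFrame

b = df.to_numpy()    # only if you really need a plain NumPy array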

You should be able to do

a = list(reader)

which might be a little better.

Because it's Python :-D One of the easiest approaches: create your own class for rows that stores its data in __slots__, which can save a couple of hundred bytes per row (csv.DictReader puts each row into a dict, which is quite large even when empty). If you want to go further, you could store a binary representation of the data.
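
A rough sketch of that idea, assuming the file has three columns (the Row class and its field names are hypothetical):

import csv

class Row:
    # __slots__ replaces the per-instance __dict__, saving a few hundred bytes per row
    __slots__ = ("a", "b", "c")

    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c

rows = []
with open(r"E:\Data\Data.csv", newline="") as csvfile:
    for fields in csv.reader(csvfile, delimiter=","):
        rows.append(Row(*fields))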

But maybe you can process the data without keeping the entire array in memory? That will consume significantly less memory.
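
For instance, a sketch that counts the rows and sums the first column while reading, so only one row is ever held in memory (which column to use, and that it is numeric, are assumptions):

import csv

total = 0.0
row_count = 0
with open(r"E:\Data\Data.csv", newline="") as csvfile:
    for row in csv.reader(csvfile, delimiter=","):
        total += float(row[0])   # process each row as it arrives, keep nothing else
        row_count += 1
print(row_count, total)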
