
Sampling from a 6 GB CSV file in Python without loading it into memory

I have a 6 GB training data set in CSV format that I need to analyze and apply machine learning to. My system has 6 GB of RAM, so I cannot load the whole file into memory. I need to draw random samples from the data set and load only those; the number of samples may vary depending on requirements. How can I do this?

Something to start with:

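import csv

with open('dataset.csv', newline='') as f:
    for row in csv.reader(f):  # handles quoted fields, unlike line.split(",")
        sample_foo(row)  # sample_foo is a placeholder for your own per-row logic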

This reads only one line at a time into memory, not the whole file.
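One way to turn that line-by-line loop into an actual random sample is reservoir sampling (Algorithm R), which keeps k rows in memory and makes a single pass over the file. The following is only a sketch under a few assumptions: the function name `reservoir_sample` and its `seed` and `has_header` parameters are made up for illustration, the file is assumed to have a header row by default, and each row is assumed small enough that k of them fit comfortably in RAM.

```python
import csv
import random


def reservoir_sample(path, k, seed=None, has_header=True):
    """Return (header, rows), where rows holds up to k rows drawn
    uniformly at random from the CSV file in a single pass."""
    rng = random.Random(seed)  # private RNG so a seed gives repeatable samples
    reservoir = []
    with open(path, newline='') as f:
        reader = csv.reader(f)
        # Keep the header out of the sample (assumption: first row is a header).
        header = next(reader, None) if has_header else None
        for i, row in enumerate(reader):
            if i < k:
                # Fill the reservoir with the first k rows.
                reservoir.append(row)
            else:
                # Row i replaces a random slot with probability k / (i + 1).
                j = rng.randint(0, i)  # randint is inclusive on both ends
                if j < k:
                    reservoir[j] = row
    return header, reservoir
```

Calling `header, rows = reservoir_sample('dataset.csv', 10000)` would then hold only 10,000 rows in memory at any time, whatever the file size. If the file has fewer than k rows, all of them are returned. The sample size is just an argument, so it can vary from run to run as the question requires.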
