I have a training data-set in CSV format of size 6 GB which I am required to analyze and implement machine learning on it. My system RAM is 6 GB so it is not possible for me to load the file in the memory. I need to perform random sampling and load the samples from the data-set. The number of samples may vary according to requirement. How to do this?
Something to start with:
with open('dataset.csv') as f:
for line in f:
sample_foo(line.split(","))
This will load only one line at a time in memory and not the whole file.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.