
Loading data for Machine learning

I have a dataset with >100,000 data points. I am building an ML model and plots for a subset of the data each time a certain condition is met.

Is it better to load the data once before the for loop, or to load it every time inside the for loop?

In the first case the for loop runs faster because I am not loading the data every time, but memory is allocated for the full dataset the entire time.

import pandas as pd

data = pd.read_csv("sample.csv")
data = data.drop(columns=['column2', 'column3'])

for i in range(0, 10):
    subset = data[data['column1'] == i]  # rows where column1 equals i
    # fit the machine learning model and create the plots for this subset
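If the memory cost of the first approach is the worry, one rough way to quantify it (a sketch, not part of the original question) is pandas' memory_usage:

import pandas as pd

data = pd.read_csv("sample.csv")
data = data.drop(columns=['column2', 'column3'])

# deep=True counts the actual contents of object/string columns,
# not just the pointers, so the figure reflects real memory use.
size_mb = data.memory_usage(deep=True).sum() / 1024 ** 2
print(f"DataFrame occupies about {size_mb:.1f} MB")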

In the second case I load the dataset on every iteration, but only a subset of the data remains in memory after I drop the columns and filter the rows.

for i in range(0, 10):
    data = pd.read_csv("sample.csv")
    data = data.drop(columns=['column2', 'column3'])
    data = data[data['column1'] == i]  # keep only rows where column1 equals i
    # fit the machine learning model and create the plots for this subset
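One way to compare the two variants is simply to time them; a minimal sketch, assuming the same sample.csv as above:

import time
import pandas as pd

# Variant 1: load once, filter inside the loop.
start = time.perf_counter()
data = pd.read_csv("sample.csv")
data = data.drop(columns=['column2', 'column3'])
for i in range(0, 10):
    subset = data[data['column1'] == i]
print("load once:", round(time.perf_counter() - start, 2), "s")

# Variant 2: reload and re-drop the columns on every iteration.
start = time.perf_counter()
for i in range(0, 10):
    data = pd.read_csv("sample.csv")
    data = data.drop(columns=['column2', 'column3'])
    data = data[data['column1'] == i]
print("load each time:", round(time.perf_counter() - start, 2), "s")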

Which is a better approach?

I have tried both, but I want to know which is correct.

I think in the 1st approach, you load the data once and the loop then filters it according to the condition.

But in the 2nd approach, every iteration has to reload the file and drop the columns again, which will take a lot of time.

My suggestion is to go with the 1st approach, because the run time is lower and it is the more sensible way to structure the code.
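If memory is the remaining concern with the 1st approach, one middle ground (a sketch that assumes you know which columns the model actually needs; the column names below are placeholders) is to read only those columns via read_csv's usecols parameter, so column2 and column3 never enter memory at all:

import pandas as pd

# Hypothetical column list: keep column1 plus whatever the model uses.
needed_columns = ['column1', 'feature_a', 'feature_b']

data = pd.read_csv("sample.csv", usecols=needed_columns)

for i in range(0, 10):
    subset = data[data['column1'] == i]
    # fit the machine learning model and create the plots for this subset

This keeps the load-once speed advantage while shrinking the DataFrame that stays resident for the whole loop.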

Hope this helps.
