
Loading data for Machine learning

I have a dataset with >100,000 data points. I am building an ML model and plots for a subset of the data each time a certain condition is met.

Is it better to load the data once before the for loop, or to load it every time inside the for loop?

In the first case the for loop runs faster because I am not loading the data every time, but memory is allocated for the full dataset the entire time.

import pandas as pd

data = pd.read_csv("sample.csv")
data = data.drop(columns=['column2', 'column3'])

for i in range(0, 10):
    subset = data[data['column1'] == i]  # rows where column1 equals i
    # fit the machine learning model and create the plots for this subset
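If the memory cost of the first approach is the worry, one rough way to quantify it (a sketch, not part of the original question) is pandas' memory_usage:

import pandas as pd

data = pd.read_csv("sample.csv")
data = data.drop(columns=['column2', 'column3'])

# deep=True counts the actual contents of object/string columns,
# not just the pointers, so the figure reflects real memory use.
size_mb = data.memory_usage(deep=True).sum() / 1024 ** 2
print(f"DataFrame occupies about {size_mb:.1f} MB")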

In the second case I load the dataset on every iteration, but only a subset of the data remains in memory after I drop the columns and filter the rows.

for i in range(0, 10):
    data = pd.read_csv("sample.csv")
    data = data.drop(columns=['column2', 'column3'])
    data = data[data['column1'] == i]  # keep only rows where column1 equals i
    # fit the machine learning model and create the plots for this subset
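One way to compare the two variants is simply to time them; a minimal sketch, assuming the same sample.csv as above:

import time
import pandas as pd

# Variant 1: load once, filter inside the loop.
start = time.perf_counter()
data = pd.read_csv("sample.csv")
data = data.drop(columns=['column2', 'column3'])
for i in range(0, 10):
    subset = data[data['column1'] == i]
print("load once:", round(time.perf_counter() - start, 2), "s")

# Variant 2: reload and re-drop the columns on every iteration.
start = time.perf_counter()
for i in range(0, 10):
    data = pd.read_csv("sample.csv")
    data = data.drop(columns=['column2', 'column3'])
    data = data[data['column1'] == i]
print("load each time:", round(time.perf_counter() - start, 2), "s")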

Which is a better approach?

I have tried both, but I want to know which is correct.

I think in the 1st approach, you load the data once and the loop then filters it according to the condition.

But in the 2nd approach, every iteration has to reload the file and drop the columns again, which will take a lot of time.

My suggestion is to go with the 1st approach, because the run time is lower and it is the more sensible way to structure the code.
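If memory is the remaining concern with the 1st approach, one middle ground (a sketch that assumes you know which columns the model actually needs; the column names below are placeholders) is to read only those columns via read_csv's usecols parameter, so column2 and column3 never enter memory at all:

import pandas as pd

# Hypothetical column list: keep column1 plus whatever the model uses.
needed_columns = ['column1', 'feature_a', 'feature_b']

data = pd.read_csv("sample.csv", usecols=needed_columns)

for i in range(0, 10):
    subset = data[data['column1'] == i]
    # fit the machine learning model and create the plots for this subset

This keeps the load-once speed advantage while shrinking the DataFrame that stays resident for the whole loop.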

Hope this helps.
