What is the correct way of selecting a value from a pandas DataFrame using column name and row index?

Question

What is the most efficient way of selecting a value from a pandas DataFrame using the column name and row index (by which I mean the row number)?

I have a case where I have to iterate through rows, and I have a working solution:

i = 0
while i < len(dataset) -1:
    if dataset.target[i] == 1:
        dataset.sum_lost[i] = dataset['to_be_repaid_principal'][i] + dataset['to_be_repaid_interest'][i]
        dataset.ratio_lost[i] = dataset.sum_lost[i] / dataset['expected_returned_sum'][i]
    else:
        dataset.sum_lost[i] = 0
        dataset.ratio_lost[i]= 0
    i += 1

But this solution is very RAM-hungry. I am also getting the following warning:

"A value is trying to be set on a copy of a slice from a DataFrame."

So I am trying to come up with another one:

i = 0
while i < len(dataset) -1:
    if dataset.iloc[i, :].loc['target'] == 1:
        dataset.iloc[i, :].loc['sum_lost'] = dataset.iloc[i, :].loc['to_be_repaid_principal'] + dataset.iloc[i, :].loc['to_be_repaid_interest']
        dataset.iloc[i, :].loc['ratio_lost'] = dataset.iloc[i, :].loc['sum_lost'] / dataset.iloc[i, :].loc['expected_returned_sum']
    else:
        dataset.iloc[i, :].loc['sum_lost'] = 0
        dataset.iloc[i, :].loc['ratio_lost'] = 0
    i += 1

But it does not work. I would like to come up with a faster, less RAM-hungry solution, because this will actually be a web app that a few users could use simultaneously.

Thanks a lot.

Answer 1

If you are thinking about "looping through rows", you are not using pandas right. You should think in terms of columns instead.

Use np.where which is vectorized (read: fast):

import numpy as np

# Boolean mask: True for rows where target == 1
cond = dataset['target'] == 1
dataset['sum_lost'] = np.where(cond, dataset['to_be_repaid_principal'] + dataset['to_be_repaid_interest'], 0)
dataset['ratio_lost'] = np.where(cond, dataset['sum_lost'] / dataset['expected_returned_sum'], 0)
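As for the literal question in the title (one value by row number and column name), here is a minimal sketch on made-up data. The toy frame and its numbers are assumptions for illustration only; the column names follow the question. It reads a cell either with `iloc` plus `columns.get_loc`, or with `at` plus `index[i]`. Writing through a single indexer call like this should also avoid the chained assignment behind the "copy of a slice" warning.

```python
import numpy as np
import pandas as pd

# Toy data (made up for illustration); column names follow the question.
dataset = pd.DataFrame({
    "target": [1, 0, 1],
    "to_be_repaid_principal": [100.0, 50.0, 30.0],
    "to_be_repaid_interest": [10.0, 5.0, 0.0],
    "expected_returned_sum": [220.0, 60.0, 60.0],
})

# Vectorized version of the loop: whole columns at once, no row iteration.
cond = dataset["target"] == 1
dataset["sum_lost"] = np.where(
    cond,
    dataset["to_be_repaid_principal"] + dataset["to_be_repaid_interest"],
    0,
)
dataset["ratio_lost"] = np.where(
    cond, dataset["sum_lost"] / dataset["expected_returned_sum"], 0
)

# Read one value by row number (position) and column name.
i = 2
by_position = dataset.iloc[i, dataset.columns.get_loc("sum_lost")]
by_label = dataset.at[dataset.index[i], "sum_lost"]

# Write one value with a single indexer call (no chained indexing).
dataset.at[dataset.index[1], "ratio_lost"] = 0.0
```

If you only ever need a handful of cells, `at` is the scalar accessor to reach for; for anything that touches every row, the column-wise version stays the better fit.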

1 answer

solution1
2 ACCEPTED 2019-11-09 01:11:45
