What is the correct way to select a value from a pandas DataFrame using the column name and row index?

Question

What is the most efficient way to select a value from a pandas DataFrame using a column name and a row index (by which I mean the row number)?

I have a case where I have to iterate through the rows.

I have a working solution:

i = 0
while i < len(dataset) -1:
    if dataset.target[i] == 1:
        dataset.sum_lost[i] = dataset['to_be_repaid_principal'][i] + dataset['to_be_repaid_interest'][i]
        dataset.ratio_lost[i] = dataset.sum_lost[i] / dataset['expected_returned_sum'][i]
    else:
        dataset.sum_lost[i] = 0
        dataset.ratio_lost[i]= 0
    i += 1

But this solution is very RAM-hungry. I am also getting the following warning:

"A value is trying to be set on a copy of a slice from a DataFrame."
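This warning (pandas' `SettingWithCopyWarning`) is usually caused by chained indexing such as `dataset.sum_lost[i] = ...`, where one indexing step returns a Series and a second step writes into it; pandas cannot guarantee that this Series is a view of `dataset`, so the write may land on a temporary copy. It can also appear when `dataset` was itself created as a slice of another DataFrame, in which case calling `dataset = dataset.copy()` first is the usual remedy. Below is a minimal sketch of the same loop with a single `.loc[row, column]` call per read and write. It assumes the default `RangeIndex` (so label `i` is also row number `i`) and uses a small made-up frame that only borrows the question's column names. It also visits every row, whereas `while i < len(dataset) - 1` stops one row early.

```python
import pandas as pd

# Made-up sample data; only the column names come from the question.
dataset = pd.DataFrame({
    "target": [1, 0, 1],
    "to_be_repaid_principal": [100.0, 50.0, 80.0],
    "to_be_repaid_interest": [10.0, 5.0, 20.0],
    "expected_returned_sum": [220.0, 110.0, 200.0],
})
dataset["sum_lost"] = 0.0
dataset["ratio_lost"] = 0.0

# Assumes the default RangeIndex, so the label i is also the row number.
for i in range(len(dataset)):  # visits every row, including the last one
    if dataset.loc[i, "target"] == 1:
        # One .loc[row, column] call per write: no chained indexing.
        dataset.loc[i, "sum_lost"] = (
            dataset.loc[i, "to_be_repaid_principal"]
            + dataset.loc[i, "to_be_repaid_interest"]
        )
        dataset.loc[i, "ratio_lost"] = (
            dataset.loc[i, "sum_lost"] / dataset.loc[i, "expected_returned_sum"]
        )
    else:
        dataset.loc[i, "sum_lost"] = 0.0
        dataset.loc[i, "ratio_lost"] = 0.0

print(dataset[["sum_lost", "ratio_lost"]])
```

This avoids the warning, but it is still a Python-level loop with one indexing call per cell, so it remains slow on large frames; column-wise operations avoid the loop entirely.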

So I tried to come up with another one:

i = 0
while i < len(dataset) -1:
    if dataset.iloc[i, :].loc['target'] == 1:
        dataset.iloc[i, :].loc['sum_lost'] = dataset.iloc[i, :].loc['to_be_repaid_principal'] + dataset.iloc[i, :].loc['to_be_repaid_interest']
        dataset.iloc[i, :].loc['ratio_lost'] = dataset.iloc[i, :].loc['sum_lost'] / dataset.iloc[i, :].loc['expected_returned_sum']
    else:
        dataset.iloc[i, :].loc['sum_lost'] = 0
        dataset.iloc[i, :].loc['ratio_lost'] = 0
    i += 1

But it does not work. I would like a faster, less RAM-hungry solution, because this will actually be a web app that several users could use simultaneously.
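The second attempt leaves `dataset` unchanged because `dataset.iloc[i, :]` first builds a row Series (in general a copy, especially when the columns have different dtypes), and `.loc['sum_lost'] = ...` then assigns into that Series rather than into the frame. To read or write a single cell by row number and column name in one call, pandas provides `.iat` (both positions) and `.at` (both labels); `DataFrame.columns.get_loc` converts a column name into its position, and `DataFrame.index[n]` gives the label of row `n`. A minimal sketch on a made-up frame with a non-default index, so that labels and positions differ:

```python
import pandas as pd

# Made-up frame with a non-default index, so labels and positions differ.
df = pd.DataFrame(
    {"target": [1, 0, 1], "expected_returned_sum": [220.0, 110.0, 200.0]},
    index=["a", "b", "c"],
)

row = 2                        # row number (position)
col = "expected_returned_sum"  # column name (label)

# Option 1: positional access; convert the column name to its position first.
value_iat = df.iat[row, df.columns.get_loc(col)]

# Option 2: label access; convert the row number to its index label first.
value_at = df.at[df.index[row], col]

# Writing uses the same single call and modifies df itself.
df.iat[row, df.columns.get_loc(col)] = 250.0

print(value_iat, value_at, df.loc["c", col])
```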

Thanks a lot.

Answer 1

If you are thinking about "looping through rows", you are not using pandas right. You should think in terms of columns instead.

Use np.where, which is vectorized (read: fast):

import numpy as np

cond = dataset['target'] == 1
dataset['sum_lost'] = np.where(cond, dataset['to_be_repaid_principal'] + dataset['to_be_repaid_interest'], 0)
dataset['ratio_lost'] = np.where(cond, dataset['sum_lost'] / dataset['expected_returned_sum'], 0)
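A self-contained version of this np.where approach, with the imports it needs, run on a small made-up frame that only borrows the question's column names (the values are illustrative assumptions). Note that `np.where` evaluates both branches for every row, so the division is also computed where `target != 1`; those results are simply replaced by `0`.

```python
import numpy as np
import pandas as pd

# Made-up sample data; only the column names come from the question.
dataset = pd.DataFrame({
    "target": [1, 0, 1],
    "to_be_repaid_principal": [100.0, 50.0, 80.0],
    "to_be_repaid_interest": [10.0, 5.0, 20.0],
    "expected_returned_sum": [220.0, 110.0, 200.0],
})

# Boolean mask: True for rows where target == 1.
cond = dataset["target"] == 1

# Whole-column operations: no Python-level loop over rows.
dataset["sum_lost"] = np.where(
    cond,
    dataset["to_be_repaid_principal"] + dataset["to_be_repaid_interest"],
    0,
)
dataset["ratio_lost"] = np.where(
    cond,
    dataset["sum_lost"] / dataset["expected_returned_sum"],
    0,
)

print(dataset[["sum_lost", "ratio_lost"]])
```

Each assignment is a handful of array operations over whole columns instead of one Python iteration per row, which is where the speedup comes from.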

Answer 1 (score 2, accepted, 2019-11-09 01:11:45)