
What is the correct way of selecting a value from a pandas DataFrame using column name and row index?

What is the most efficient way of selecting a value from a pandas DataFrame using the column name and the row index (by that I mean the row number)?

I have a case where I have to iterate through rows:

I have a working solution:

i = 0
while i < len(dataset) -1:
    if dataset.target[i] == 1:
        dataset.sum_lost[i] = dataset['to_be_repaid_principal'][i] + dataset['to_be_repaid_interest'][i]
        dataset.ratio_lost[i] = dataset.sum_lost[i] / dataset['expected_returned_sum'][i]
    else:
        dataset.sum_lost[i] = 0
        dataset.ratio_lost[i] = 0
    i += 1

But this solution is very RAM hungry. I am also getting the following warning:

"A value is trying to be set on a copy of a slice from a DataFrame."

So I am trying to come up with another one:

i = 0
while i < len(dataset) -1:
    if dataset.iloc[i, :].loc['target'] == 1:
        dataset.iloc[i, :].loc['sum_lost'] = dataset.iloc[i, :].loc['to_be_repaid_principal'] + dataset.iloc[i, :].loc['to_be_repaid_interest']
        dataset.iloc[i, :].loc['ratio_lost'] = dataset.iloc[i, :].loc['sum_lost'] / dataset.iloc[i, :].loc['expected_returned_sum']
    else:
        dataset.iloc[i, :].loc['sum_lost'] = 0
        dataset.iloc[i, :].loc['ratio_lost'] = 0
    i += 1

But it does not work. I would like to come up with a faster, less RAM-hungry solution, because this will actually be a web app that a few users could use simultaneously.

Thanks a lot.

If you are thinking about "looping through rows", you are not using pandas right. You should think in terms of columns instead.

Use np.where, which is vectorized (read: fast):

import numpy as np

cond = dataset['target'] == 1
# Column names follow the question: sum_lost and ratio_lost
dataset['sum_lost'] = np.where(cond, dataset['to_be_repaid_principal'] + dataset['to_be_repaid_interest'], 0)
dataset['ratio_lost'] = np.where(cond, dataset['sum_lost'] / dataset['expected_returned_sum'], 0)
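
As a rough, self-contained sketch of the above: the toy values below are invented for illustration, and the names row, col and value are hypothetical helpers, not part of the question. It also shows one way to read or write a single cell by row number and column name in a single indexing step. Chained forms such as dataset.sum_lost[i] = ... or dataset.iloc[i, :].loc['sum_lost'] = ... index twice, so the assignment may land on a temporary object, which is what the warning in the question is about.

```python
import numpy as np
import pandas as pd

# Toy data; the numbers are made up purely for illustration.
dataset = pd.DataFrame({
    "target": [1, 0, 1],
    "to_be_repaid_principal": [100.0, 50.0, 30.0],
    "to_be_repaid_interest": [10.0, 5.0, 0.0],
    "expected_returned_sum": [220.0, 110.0, 60.0],
})

cond = dataset["target"] == 1

# Whole-column computation: no Python-level loop and no chained assignment.
dataset["sum_lost"] = np.where(
    cond,
    dataset["to_be_repaid_principal"] + dataset["to_be_repaid_interest"],
    0.0,
)
dataset["ratio_lost"] = np.where(
    cond,
    dataset["sum_lost"] / dataset["expected_returned_sum"],
    0.0,
)

# Single cell by row number and column name, using one indexing call.
row = 2                                      # hypothetical row number
col = dataset.columns.get_loc("sum_lost")    # column name -> integer position
value = dataset.iloc[row, col]               # read the cell
dataset.iloc[row, col] = value               # write goes to the DataFrame itself

# Label-based alternative, assuming the index labels are unique.
same_value = dataset.at[dataset.index[row], "sum_lost"]
```

Note that np.where evaluates both branches for every row, so rows where expected_returned_sum is 0 still trigger the division even if cond is False there.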


