Find the index of the first value below a threshold in every column

Question

I have a dataframe where, for each column, I need to find the index of the first value less than 5. I found a solution here that works for an individual list/series, but I can't apply it to an entire dataframe in a 'Pythonic' way without pulling each column out into its own variable with a for loop.

import pandas as pd

Data = pd.DataFrame({0: [2, 2, 3, 2, 2, 3, 5],
                     1: [8, 7, 7, 8, 7, 7, 7],
                     2: [9, 7, 7, 4, 4, 4, 9]})

The desired output is [0, 999, 3], where 999 is a flag meaning no value < 5 was found. This code works for an individual series/list:

next((x for x, val in enumerate(DataS) if val < 5),999)

but when I try to apply it over all the columns, I can't get it to work:

Data.apply(lambda x: next((x for x, val in enumerate(Data) if val < 5),999))

This code returns 0 for every column. Can someone help me understand why apply/lambda aren't behaving the way I expect?
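
A sketch of why the call above returns 0 everywhere, assuming the `Data` frame from the question: iterating over a DataFrame yields its column labels, so `enumerate(Data)` scans the labels 0, 1, 2 instead of the column that `apply` passes in, and label 0 already satisfies `< 5`. The `x` inside the generator is also a separate variable from the lambda's `x`, so the column argument is never used at all. Giving the parameter its own name (`col` here is just an illustrative choice) and enumerating it should give the expected result:

```python
import pandas as pd

Data = pd.DataFrame({0: [2, 2, 3, 2, 2, 3, 5],
                     1: [8, 7, 7, 8, 7, 7, 7],
                     2: [9, 7, 7, 4, 4, 4, 9]})

# Iterating over a DataFrame yields its column labels, not its values.
labels = list(Data)
print(labels)  # [0, 1, 2]

# The original lambda enumerates those labels; label 0 is < 5, so it
# returns position 0 for every column. Enumerate the passed-in column instead.
first_below = Data.apply(
    lambda col: next((i for i, val in enumerate(col) if val < 5), 999)
)
print(first_below.tolist())  # [0, 999, 3]
```

Note that `enumerate` yields positions, not index labels; the two coincide here only because `Data` has the default `RangeIndex`.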

As a bonus question: this approach also appears to skip over NaN values. Is there a different way to write it so that NaNs are flagged?
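
On the NaN question: a comparison with NaN evaluates to False, so `val < 5` never matches a missing value and it is silently skipped. A minimal sketch, assuming "flag" means a NaN should be reported just like a value below the threshold, builds a boolean mask that is True where the value is < 5 or missing, then takes the first True per column with `idxmax` and falls back to 999 for columns with no hit. The small frame below is made-up illustration data, not from the question:

```python
import numpy as np
import pandas as pd

# Hypothetical data containing NaNs (not the question's frame).
Data = pd.DataFrame({0: [7, np.nan, 3],
                     1: [8, 7, 7],
                     2: [np.nan, 9, 4]})

# NaN < 5 is False, so OR in isna() to count missing values as hits.
m = Data.lt(5) | Data.isna()

# First True label per column; 999 where a column has no hit at all.
first_hit = m.idxmax().where(m.any(), 999)
print(first_hit.tolist())  # [1, 999, 0]
```

If a NaN should instead get its own flag, the same mask idea applies with `Data.isna()` checked separately; which behaviour is wanted depends on what the flag is used for downstream.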

Answer 1

Let's use idxmax to find the index of the first value < 5, then mask the index with 999 wherever a column has no value < 5:

m = Data.lt(5)                    # True where the value is < 5
m.idxmax().where(m.any(), 999)    # first True per column, else 999

0      0
1    999
2      3
dtype: int64
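
To see why the `where(m.any(), 999)` step is needed, here is the same computation split into steps on the question's `Data`. On a boolean column `idxmax` returns the label of the first True, but a column with no True at all still returns its first label, which looks exactly like a genuine hit at row 0:

```python
import pandas as pd

Data = pd.DataFrame({0: [2, 2, 3, 2, 2, 3, 5],
                     1: [8, 7, 7, 8, 7, 7, 7],
                     2: [9, 7, 7, 4, 4, 4, 9]})

m = Data.lt(5)  # boolean frame: True where value < 5

# idxmax on booleans: first True wins (True > False, ties go to the first).
raw = m.idxmax()
print(raw.tolist())     # [0, 0, 3]  column 1 has no True, yet reports 0

# m.any() is False only for columns with no value < 5; replace those with 999.
result = raw.where(m.any(), 999)
print(result.tolist())  # [0, 999, 3]
```

`idxmax` returns index labels rather than positions, so on a frame with a non-default index the result will be labels.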

Answer 1: score 2, accepted, 2022-07-11 16:53:34
