Find index of first value less than threshold for all columns
I have a dataframe where, for each column, I need to find the index of the first value less than 5. I found a solution here that works for an individual list/series, but I can't apply it to an entire dataframe in a 'Pythonic' way without breaking each column out into its own variable via a for loop.
import pandas as pd

Data = pd.DataFrame({0: [2, 2, 3, 2, 2, 3, 5],
                     1: [8, 7, 7, 8, 7, 7, 7],
                     2: [9, 7, 7, 4, 4, 4, 9]})
The desired output would be [0, 999, 3], where 999 is a flag for not finding a value < 5. This code works for an individual series/list:
# DataS is a single column (Series) or list
next((x for x, val in enumerate(DataS) if val < 5), 999)
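For reference, here is the single-series generator wrapped in a function and called on individual columns. This is a minimal sketch; the helper name `first_below` and its `threshold`/`flag` parameters are hypothetical, not from the question.

```python
import pandas as pd

def first_below(values, threshold=5, flag=999):
    """Hypothetical helper: position of the first element < threshold, else flag."""
    # enumerate() yields (position, value) pairs; next() returns the first
    # matching position, or `flag` if the generator is exhausted.
    return next((pos for pos, val in enumerate(values) if val < threshold), flag)

Data = pd.DataFrame({0: [2, 2, 3, 2, 2, 3, 5],
                     1: [8, 7, 7, 8, 7, 7, 7],
                     2: [9, 7, 7, 4, 4, 4, 9]})

print(first_below(Data[2]))  # 3
print(first_below(Data[1]))  # 999
```

Note that `enumerate` counts positions, so the result is a 0-based position, not an index label.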
but when I try to apply this over all the columns, I can't get it to work:
Data.apply(lambda x: next((x for x, val in enumerate(Data) if val < 5),999))
This code returns a value of 0 for every column. Can someone help me understand why apply/lambda aren't behaving the way I think they should?
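The likely cause: inside the lambda, the generator enumerates `Data` (the whole DataFrame) instead of the column `x` that `apply` passes in, so `x` is never used. Iterating over a DataFrame yields its column labels, so `enumerate(Data)` produces (0, 0), (1, 1), (2, 2); the first "value" is the label 0, which is < 5, so every column returns 0. The inner `x` also shadows the lambda's `x`, which hides the mistake. A sketch of the fix, enumerating the column argument instead:

```python
import pandas as pd

Data = pd.DataFrame({0: [2, 2, 3, 2, 2, 3, 5],
                     1: [8, 7, 7, 8, 7, 7, 7],
                     2: [9, 7, 7, 4, 4, 4, 9]})

# Iterating a DataFrame yields its column labels, not its values:
print(list(enumerate(Data)))  # [(0, 0), (1, 1), (2, 2)]

# Fixed: enumerate the column `col` that apply() passes in, and use a
# different name for the loop position so nothing is shadowed.
result = Data.apply(lambda col: next((pos for pos, val in enumerate(col) if val < 5), 999))
print(result.tolist())  # [0, 999, 3]
```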
As a bonus, this function also appears to skip over NaN values. Is there a different way to write this so that NaNs are flagged?
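NaN is skipped because any comparison with NaN, including `NaN < 5`, evaluates to False. Assuming "flag" means the search should also stop at a NaN and report its position, one option is to add a `pd.isna` check to the condition. The series `s` below is a hypothetical example, not data from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical column containing a NaN.
s = pd.Series([8, 7, np.nan, 4, 9])

# Original condition: NaN < 5 is False, so the NaN at position 2 is skipped.
skipped = next((pos for pos, val in enumerate(s) if val < 5), 999)

# Also treat NaN as a match, so its position is reported instead.
flagged = next((pos for pos, val in enumerate(s) if val < 5 or pd.isna(val)), 999)

print(skipped, flagged)  # 3 2
```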
Let's use idxmax to find the index of the first value < 5 in each column, then mask that index with 999 where a column has no value < 5.
m = Data.lt(5)                    # boolean mask: True where value < 5
m.idxmax().where(m.any(), 999)    # first True per column, 999 if none

0      0
1    999
2      3
dtype: int64
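One caveat with the idxmax approach: it returns index *labels*, which match positions here only because `Data` has the default RangeIndex. The sketch below assumes a hypothetical letter index to show the difference and uses NumPy's `argmax` on the underlying array when positions are wanted. It also shows, as an assumption about what "flag NaNs" means, how widening the mask with `isna()` makes NaN count as a hit; `df_nan` is hypothetical example data.

```python
import numpy as np
import pandas as pd

# Same values as the question, but with a hypothetical non-default index.
Data = pd.DataFrame({0: [2, 2, 3, 2, 2, 3, 5],
                     1: [8, 7, 7, 8, 7, 7, 7],
                     2: [9, 7, 7, 4, 4, 4, 9]},
                    index=list("abcdefg"))

m = Data.lt(5)

# idxmax() gives the label of the first True in each column.
labels = m.idxmax().where(m.any(), 999)

# argmax(axis=0) on the boolean array gives the first True position.
positions = pd.Series(np.where(m.any(), m.to_numpy().argmax(axis=0), 999),
                      index=Data.columns)

print(labels.tolist())     # ['a', 999, 'd']
print(positions.tolist())  # [0, 999, 3]

# NaN-aware variant: treat NaN as a hit by OR-ing it into the mask.
df_nan = pd.DataFrame({"a": [8, np.nan, 3], "b": [9, 9, 9]})
m_nan = df_nan.lt(5) | df_nan.isna()
nan_aware = m_nan.idxmax().where(m_nan.any(), 999)
print(nan_aware.tolist())  # [1, 999]
```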