Finding the index of the first element (e.g. "True") in a series/column
How do I find the index of an element (e.g. "True") in a series or a column?
For example, I have a column in which I want to identify the first instance where an event occurs. So I write it as
Variable = df["Force"] < event
This creates a boolean Series that is False until the first instance where it becomes True. How do I then find the index of that data point?
Is there a better way?
Use idxmax to find the first instance of the maximum value. In this case, True is the maximum value.
df['Force'].lt(event).idxmax()
Consider the sample df:
df = pd.DataFrame(dict(Force=[5, 4, 3, 2, 1]), list('abcde'))
df
Force
a 5
b 4
c 3
d 2
e 1
The first instance of Force being less than 3 is at index 'd'.
df['Force'].lt(3).idxmax()
'd'
Be aware that if no value of Force is less than 3, then the maximum will be False and idxmax will return the first index label, since that is the first instance of the maximum.
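One way to guard against this case is to check the mask with any() before calling idxmax. This is only a minimal sketch; the helper name first_true_label is hypothetical and not part of pandas:

```python
import pandas as pd

def first_true_label(mask):
    """Return the index label of the first True in a boolean Series, or None."""
    # idxmax() alone returns the first label when nothing is True,
    # so check any() first.
    return mask.idxmax() if mask.any() else None

df = pd.DataFrame(dict(Force=[5, 4, 3, 2, 1]), list('abcde'))
print(first_true_label(df['Force'].lt(3)))  # d
print(first_true_label(df['Force'].lt(0)))  # None
```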
Also consider the alternative argmax:
df.Force.lt(3).values.argmax()
3
It returns the position of the first instance of the maximal value. You can then use this to find the corresponding index value:
df.index[df.Force.lt(3).values.argmax()]
'd'
Also, in recent pandas versions (1.0 and later, as far as I know), argmax is a Series method that returns this position directly.
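A short sketch of calling argmax on the Series itself, assuming a pandas version in which Series.argmax returns an integer position (1.0 or later, to my knowledge):

```python
import pandas as pd

df = pd.DataFrame(dict(Force=[5, 4, 3, 2, 1]), list('abcde'))
mask = df['Force'].lt(3)

pos = mask.argmax()      # integer position of the first True
label = mask.index[pos]  # corresponding index label
print(pos, label)
```

As with idxmax, the position is 0 when no element is True, so the same any() check applies.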
You can also try first_valid_index with where.
df = pd.DataFrame([[5], [4], [3], [2], [1]], columns=["Force"])
df.Force.where(df.Force < 3).first_valid_index()
3
where will replace the part that does not meet the condition with np.nan by default. Then, we find the first valid index out of the series.
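A side effect of this approach is that the no-match case is handled without extra checks, because first_valid_index returns None when every value is NaN. A minimal sketch (the variable names hit and miss are only for illustration):

```python
import pandas as pd

df = pd.DataFrame([[5], [4], [3], [2], [1]], columns=["Force"])

# Rows failing the condition become NaN, so the first valid index is the first match.
hit = df.Force.where(df.Force < 3).first_valid_index()
# With no match every value is NaN and first_valid_index() returns None.
miss = df.Force.where(df.Force < 0).first_valid_index()
print(hit, miss)
```

Note that this assumes the original column contains no NaN values of its own that should count as matches.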
Or this: select the subset of items you are interested in, here those where Variable == 1 (v in the code below). Then take the first item of its index.
df = pd.DataFrame([[5], [4], [3], [2], [1]], columns=["Force"])
v = (df["Force"] < 3)
v[v == 1].index[0]
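Since v is already boolean, indexing with v itself selects the same rows as v == 1. Taking .index[0] raises an IndexError when nothing matches, so a guarded variant might look like this sketch (the names matches, first and first_none are illustrative):

```python
import pandas as pd

df = pd.DataFrame([[5], [4], [3], [2], [1]], columns=["Force"])
v = df["Force"] < 3

matches = v[v].index  # labels where the condition holds
first = matches[0] if len(matches) else None
print(first)

w = df["Force"] < 0   # a condition that nothing satisfies
no_matches = w[w].index
first_none = no_matches[0] if len(no_matches) else None
print(first_none)
```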
Bonus: if you need the index of the first appearance of many kinds of items, you can use drop_duplicates.
df = pd.DataFrame([["yello"], ["yello"], ["blue"], ["red"], ["blue"], ["red"]], columns=["Force"])
df.Force.drop_duplicates().reset_index()
index Force
0 0 yello
1 2 blue
2 3 red
Some more work...
df.Force.drop_duplicates().reset_index().set_index("Force").to_dict()["index"]
{'blue': 2, 'red': 3, 'yello': 0}
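The same mapping can probably be built a little more directly by zipping the deduplicated values with their index. This is a sketch of an alternative, not the approach used above:

```python
import pandas as pd

df = pd.DataFrame([["yello"], ["yello"], ["blue"], ["red"], ["blue"], ["red"]], columns=["Force"])

first = df.Force.drop_duplicates()           # keeps the first occurrence of each value
first_index = dict(zip(first, first.index))  # value -> index label of first appearance
print(first_index)
```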
Below is a non-pandas solution which I find easy to adapt:
import pandas as pd
df = pd.DataFrame(dict(Force=[5, 4, 3, 2, 1]), list('abcde'))
next(idx for idx, x in zip(df.index, df.Force) if x < 3) # d
It works by iterating to the first result of a generator expression.
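If no element matches, next raises StopIteration. Passing a default as the second argument avoids that; a minimal sketch, with None chosen as the default here:

```python
import pandas as pd

df = pd.DataFrame(dict(Force=[5, 4, 3, 2, 1]), list('abcde'))

# The second argument to next() is returned when the generator is exhausted,
# avoiding a StopIteration when no element matches.
hit = next((idx for idx, x in zip(df.index, df.Force) if x < 3), None)
miss = next((idx for idx, x in zip(df.index, df.Force) if x < 0), None)
print(hit, miss)
```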
Pandas appears to perform poorly in comparison:
import numpy as np
df = pd.DataFrame(dict(Force=np.random.randint(0, 100000, 100000)))
n = 99900
%timeit df['Force'].lt(n).idxmin()
# 1000 loops, best of 3: 1.57 ms per loop
%timeit df.Force.where(df.Force > n).first_valid_index()
# 100 loops, best of 3: 1.61 ms per loop
%timeit next(idx for idx, x in zip(df.index, df.Force) if x > n)
# 10000 loops, best of 3: 100 µs per loop
Here is an all-pandas solution that I consider a little neater than some of the other answers. It also handles the corner case where no value of the input series satisfies the condition.
def first_index_ordered(mask):
    assert mask.index.is_monotonic_increasing
    assert mask.dtype == bool
    idx_min = mask[mask].index.min()
    return None if pd.isna(idx_min) else idx_min
col = "foo"
thr = 42
mask = df[col] < thr
idx_first = first_index_ordered(mask)
The above assumed that mask has a value-ordered, monotonically increasing index. If this is not the case, we have to do a bit more:
def first_index_unordered(mask):
    assert mask.dtype == bool
    index = mask.index
    # This creates a RangeIndex, which is monotonic
    mask = mask.reset_index(drop=True)
    idx_min = mask[mask].index.min()
    return None if pd.isna(idx_min) else index[idx_min]
Of course, we can combine both cases in one function:
def first_index_where(mask):
    if mask.index.is_monotonic_increasing:
        return first_index_ordered(mask)
    else:
        return first_index_unordered(mask)
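To illustrate the unordered case, here is a self-contained usage sketch. It uses a compact variant that always works on positions rather than dispatching on index order, and the sample Series s is made up for this example:

```python
import pandas as pd

def first_index_where(mask):
    """Label of the first True in a boolean Series, or None (compact variant)."""
    assert mask.dtype == bool
    index = mask.index
    # Work on positions so the result does not depend on index ordering.
    positions = mask.reset_index(drop=True)
    pos_min = positions[positions].index.min()
    return None if pd.isna(pos_min) else index[pos_min]

s = pd.Series([5, 1, 4, 2], index=list('dbca'))  # index is not monotonic
print(first_index_where(s < 3))  # b
print(first_index_where(s < 0))  # None
```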