
Finding the index of the first element (e.g. "True") from a series/column

How do I find the index of an element (e.g. "True") in a series or a column?

For example, I have a column in which I want to identify the first instance where an event occurs. So I write it as

Variable = df["Force"] < event

This creates a boolean Series that is False until the first instance where it becomes True. How do I then find the index of that data point?

Is there a better way?

Use idxmax to find the first instance of the maximum value. In this case, True is the maximum value.

df['Force'].lt(event).idxmax()

Consider the sample df:

df = pd.DataFrame(dict(Force=[5, 4, 3, 2, 1]), list('abcde'))
df

   Force
a      5
b      4
c      3
d      2
e      1

The first instance of Force being less than 3 is at index 'd'.

df['Force'].lt(3).idxmax()
'd'

Be aware that if no value of Force is less than 3, then the maximum will be False, and idxmax will return the label of the first row, because that is the first instance of the maximum.

Also consider the alternative argmax:
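One way to avoid that pitfall is to check any() before calling idxmax. The following is only a minimal sketch; first_true_label is a hypothetical helper name, not something from pandas or from the answer above.

```python
import pandas as pd

df = pd.DataFrame(dict(Force=[5, 4, 3, 2, 1]), list('abcde'))

def first_true_label(mask):
    """Return the index label of the first True in a boolean Series, or None."""
    # Hypothetical helper: guard with any() so that an all-False mask
    # is not silently reported as the first row.
    return mask.idxmax() if mask.any() else None

found = first_true_label(df['Force'].lt(3))    # 'd'
missing = first_true_label(df['Force'].lt(0))  # None, no value is below 0
unguarded = df['Force'].lt(0).idxmax()         # 'a', although nothing matched
```

Returning None is just one convention; raising an exception would work equally well if a missing event should be treated as an error.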

df.Force.lt(3).values.argmax()
3

It returns the position of the first instance of the maximal value. You can then use this to find the corresponding index value:

df.index[df.Force.lt(3).values.argmax()]
'd'

Also, when this answer was written, argmax was planned to become a Series method; recent pandas versions provide Series.argmax, which returns the position.

You can also try first_valid_index with where.

df = pd.DataFrame([[5], [4], [3], [2], [1]], columns=["Force"])
df.Force.where(df.Force < 3).first_valid_index()
3

By default, where replaces the entries that do not meet the condition with np.nan. Then we find the first valid index of the series.
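To make the intermediate step visible, here is a small sketch on the same sample data. It also shows what happens when nothing meets the condition: first_valid_index returns None in that case. The variable names are chosen for illustration only.

```python
import pandas as pd

df = pd.DataFrame([[5], [4], [3], [2], [1]], columns=["Force"])

# Entries that fail the condition become NaN: NaN, NaN, NaN, 2.0, 1.0
masked = df.Force.where(df.Force < 3)
hit = masked.first_valid_index()  # 3

# No value is below 0, so every entry is NaN and there is no valid index
no_hit = df.Force.where(df.Force < 0).first_valid_index()  # None
```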


Or this: select the subset of items that you are interested in, here Variable == 1. Then find the first item in its index.

df = pd.DataFrame([[5], [4], [3], [2], [1]], columns=["Force"])
v = (df["Force"] < 3)
v[v == 1].index[0]

Bonus: if you need the index of the first appearance of many kinds of items, you can use drop_duplicates.

df = pd.DataFrame([["yello"], ["yello"], ["blue"], ["red"],  ["blue"], ["red"]], columns=["Force"])  
df.Force.drop_duplicates().reset_index()
    index   Force
0   0       yello
1   2       blue
2   3       red

Some more work...

df.Force.drop_duplicates().reset_index().set_index("Force").to_dict()["index"]
{'blue': 2, 'red': 3, 'yello': 0}

Below is a non-pandas solution which I find easy to adapt:

import pandas as pd

df = pd.DataFrame(dict(Force=[5, 4, 3, 2, 1]), list('abcde'))

next(idx for idx, x in zip(df.index, df.Force) if x < 3)  # d

It works by iterating to the first result of a generator expression.

Pandas appears to perform poorly in comparison:
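Note that next raises StopIteration when the generator yields nothing. A possible adaptation, sketched below, passes a default value to next so that a missing match returns None instead. The function name first_label_below is hypothetical.

```python
import pandas as pd

df = pd.DataFrame(dict(Force=[5, 4, 3, 2, 1]), list('abcde'))

def first_label_below(frame, threshold):
    """Return the label of the first row whose Force is below threshold, or None."""
    # The generator is lazy, so iteration stops at the first match.
    matches = (idx for idx, x in zip(frame.index, frame.Force) if x < threshold)
    # The second argument to next() is returned when there is no match.
    return next(matches, None)

first = first_label_below(df, 3)  # 'd'
none = first_label_below(df, 0)   # None
```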

import numpy as np

df = pd.DataFrame(dict(Force=np.random.randint(0, 100000, 100000)))

n = 99900

# Note: lt(n).idxmin() finds the first row where Force is not below n (Force >= n)
%timeit df['Force'].lt(n).idxmin()
# 1000 loops, best of 3: 1.57 ms per loop

%timeit df.Force.where(df.Force > n).first_valid_index()
# 100 loops, best of 3: 1.61 ms per loop

%timeit next(idx for idx, x in zip(df.index, df.Force) if x > n)
# 10000 loops, best of 3: 100 µs per loop

Here is an all-pandas solution that I consider a little neater than some of the other answers. It also handles the corner case where no value of the input series satisfies the condition.

def first_index_ordered(mask):
    assert mask.index.is_monotonic_increasing
    assert mask.dtype == bool
    idx_min = mask[mask].index.min()
    return None if pd.isna(idx_min) else idx_min

col = "foo"
thr = 42
mask = df[col] < thr
idx_first = first_index_ordered(mask)

The above assumes that mask has a value-ordered, monotonically increasing index. If this is not the case, we have to do a bit more:

def first_index_unordered(mask):
    assert mask.dtype == bool
    index = mask.index
    # This creates a RangeIndex, which is monotonic
    mask = mask.reset_index(drop=True)
    idx_min = mask[mask].index.min()
    return None if pd.isna(idx_min) else index[idx_min] 

Of course, we can combine both cases in one function:

def first_index_where(mask):
    if mask.index.is_monotonic_increasing:
        return first_index_ordered(mask)
    else:
        return first_index_unordered(mask)
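For comparison, a position-based variant avoids the monotonicity check altogether. This is a different technique from the functions above, sketched with NumPy's flatnonzero; first_index_by_position is a hypothetical name.

```python
import numpy as np
import pandas as pd

def first_index_by_position(mask):
    """Return the label of the first True in a boolean Series, or None."""
    # Hypothetical alternative helper, not part of the answer above.
    assert mask.dtype == bool
    # Integer positions of all True values, in row order
    positions = np.flatnonzero(mask.to_numpy())
    return None if positions.size == 0 else mask.index[positions[0]]

unordered = pd.Series([False, True, True], index=['z', 'b', 'a'])
label = first_index_by_position(unordered)                      # 'b'
nothing = first_index_by_position(pd.Series([False, False]))    # None
```

Because it works on positions rather than labels, the result does not depend on how the index is ordered.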


