Getting a list of indices where a pandas boolean Series is True

Question

I have a pandas Series with boolean entries. I would like to get a list of the indices where the values are True.

For example, the input pd.Series([True, False, True, True, False, False, False, True])

should yield the output [0,2,3,7].

I can do it with a list comprehension, but is there something cleaner or faster?

Answer 1

Using `Boolean Indexing`

>>> s = pd.Series([True, False, True, True, False, False, False, True])
>>> s[s].index
Int64Index([0, 2, 3, 7], dtype='int64')

If you need a np.array object, get the .values

>>> s[s].index.values
array([0, 2, 3, 7])
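Since the question asks for a list rather than an Index or array, here is a minimal sketch of the same boolean-indexing approach followed by `Index.tolist()`. The variable name `true_labels` is only illustrative.

```python
import pandas as pd

s = pd.Series([True, False, True, True, False, False, False, True])

# Boolean indexing keeps only the True rows; .index holds their labels,
# and .tolist() converts that Index into a plain Python list.
true_labels = s[s].index.tolist()
print(true_labels)  # [0, 2, 3, 7]
```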

Using `np.nonzero`

>>> np.nonzero(s)
(array([0, 2, 3, 7]),)

Using `np.flatnonzero`

>>> np.flatnonzero(s)
array([0, 2, 3, 7])

Using `np.where`

>>> np.where(s)[0]
array([0, 2, 3, 7])

Using `np.argwhere`

>>> np.argwhere(s).ravel()
array([0, 2, 3, 7])

Using `pd.Series.index`

>>> s.index[s]
Int64Index([0, 2, 3, 7], dtype='int64')

Using Python's built-in `filter`

>>> [*filter(s.get, s.index)]
[0, 2, 3, 7]

Using `list comprehension`

>>> [i for i in s.index if s[i]]
[0, 2, 3, 7]
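One point worth keeping in mind when choosing among the approaches above: the NumPy functions return integer positions, while the index-based approaches return index labels. With the default 0..n-1 index the two coincide, but they differ for any other index. The sketch below uses a small made-up Series with string labels to show the difference and how to map positions back to labels.

```python
import numpy as np
import pandas as pd

# Hypothetical Series with a non-default (string) index.
s = pd.Series([True, False, True], index=["a", "b", "c"])

mask = s.to_numpy()                  # plain boolean NumPy array

positions = np.flatnonzero(mask)     # integer positions of the True entries
labels = s.index[mask]               # index labels of the True entries

# Positions can be turned back into labels by indexing the Index with them.
labels_from_positions = s.index[positions]

print(positions.tolist())              # [0, 2]
print(labels.tolist())                 # ['a', 'c']
print(labels_from_positions.tolist())  # ['a', 'c']
```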

Answer 2

As an addition to rafaelc's answer, here are the corresponding timings (from fastest to slowest) for the following setup:

import numpy as np
import pandas as pd
s = pd.Series([x > 0.5 for x in np.random.random(size=1000)])

Using `np.where`

>>> timeit np.where(s)[0]
12.7 µs ± 77.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Using `np.flatnonzero`

>>> timeit np.flatnonzero(s)
18 µs ± 508 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Using `pd.Series.index`

The time difference compared with boolean indexing really surprised me, since boolean indexing is the more commonly used approach.

>>> timeit s.index[s]
82.2 µs ± 38.9 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Using `Boolean Indexing`

>>> timeit s[s].index
1.75 ms ± 2.16 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

If you need a np.array object, get the .values

>>> timeit s[s].index.values
1.76 ms ± 3.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

If you need a slightly easier-to-read version <-- not in original answer

>>> timeit s[s==True].index
1.89 ms ± 3.52 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Using `pd.Series.where` <-- not in original answer

>>> timeit s.where(s).dropna().index
2.22 ms ± 3.32 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

>>> timeit s.where(s == True).dropna().index
2.37 ms ± 2.19 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

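Using `pd.Series.mask` <-- not in original answer

Note that `mask` replaces the entries where the condition is True with NaN, so the calls below, as written, return the indices of the False entries; pass the inverted condition (for example `~s`) to get the True ones.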

>>> timeit s.mask(s).dropna().index
2.29 ms ± 1.43 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

>>> timeit s.mask(s == True).dropna().index
2.44 ms ± 5.82 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Using `list comprehension`

>>> timeit [i for i in s.index if s[i]]
13.7 ms ± 40.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Using Python's built-in `filter`

>>> timeit [*filter(s.get, s.index)]
14.2 ms ± 28.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
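The measurements above use IPython's `timeit` magic. As a rough sketch, a comparable comparison can be run in plain Python with the standard-library `timeit` module. The seed, the repeat counts, and the selection of four candidates are arbitrary choices made here, and absolute numbers will differ between machines and library versions.

```python
import timeit

import numpy as np
import pandas as pd

# Same shape of data as the setup above: 1000 random booleans.
rng = np.random.default_rng(0)
s = pd.Series(rng.random(1000) > 0.5)

# A few of the expressions timed above, wrapped as zero-argument callables.
candidates = {
    "np.where(s)[0]": lambda: np.where(s)[0],
    "np.flatnonzero(s)": lambda: np.flatnonzero(s),
    "s.index[s]": lambda: s.index[s],
    "s[s].index": lambda: s[s].index,
}

number = 200  # calls per measurement
timings = {}
for name, fn in candidates.items():
    # Take the best of three runs and convert to seconds per call.
    timings[name] = min(timeit.repeat(fn, number=number, repeat=3)) / number

for name, seconds in sorted(timings.items(), key=lambda kv: kv[1]):
    print(f"{name:20s} {seconds * 1e6:8.1f} µs per call")
```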

Using `np.nonzero` <-- did not work out of the box for me

>>> timeit np.nonzero(s)
ValueError: Length of passed values is 1, index implies 1000.

Using `np.argwhere` <-- did not work out of the box for me

>>> timeit np.argwhere(s).ravel()
ValueError: Length of passed values is 1, index implies 1000.
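The answer does not say which pandas and NumPy versions produced this error, so the cause is not confirmed here. One way that should sidestep it, assuming the problem lies in how the result is wrapped back into a Series, is to hand NumPy a plain array via `Series.to_numpy()` instead of the Series itself. A minimal sketch:

```python
import numpy as np
import pandas as pd

s = pd.Series([True, False, True, True, False, False, False, True])

# Convert to a plain boolean ndarray so NumPy never sees the Series wrapper.
mask = s.to_numpy()

idx_nonzero = np.nonzero(mask)[0]          # np.nonzero returns a tuple of arrays
idx_argwhere = np.argwhere(mask).ravel()   # argwhere returns an (n, 1) array here

print(idx_nonzero)   # [0 2 3 7]
print(idx_argwhere)  # [0 2 3 7]
```

As with the other NumPy-based approaches, these are integer positions, not index labels.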

Answer 3

Also works: s.where(lambda x: x).dropna().index. It has the advantage of being easy to chain: if your series is being computed on the fly, you don't need to assign it to a variable.

Note that if s is computed from r as s = cond(r), then you can also use r.where(lambda x: cond(x)).dropna().index.
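To make the `r` / `cond` pattern concrete, here is a small sketch. The data in `r` and the body of `cond` (a simple "greater than zero" test) are made up for illustration and are not from the answer.

```python
import pandas as pd

# Hypothetical source data.
r = pd.Series([3, -1, 4, -5, 9])


def cond(x):
    """Hypothetical condition: True where the value is positive."""
    return x > 0


# where() keeps values where cond holds and puts NaN elsewhere;
# dropna() then removes those NaN rows, leaving only the matching labels.
true_index = r.where(lambda x: cond(x)).dropna().index

print(true_index.tolist())  # [0, 2, 4]
```

Because this relies on `dropna()`, rows of `r` that are already NaN are dropped as well, whether or not they satisfy the condition.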

Answer 4

You can use pipe to chain the operation. This is helpful when s is an intermediate result and you don't want to name it.

s = pd.Series([True, False, True, True, False, False, False, True], index=list('ABCDEFGH'))

out = s.pipe(lambda s_: s_[s_].index)

print(out)

Index(['A', 'C', 'D', 'H'], dtype='object')
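As a sketch of the "intermediate result" case, the boolean Series below is produced inside a method chain and never assigned to a name. The DataFrame, its `price` column, and the threshold of 10 are invented for the example.

```python
import pandas as pd

# Hypothetical data: prices labelled A..D.
df = pd.DataFrame({"price": [5, 12, 7, 30]}, index=list("ABCD"))

# gt(10) yields an unnamed boolean Series; pipe() passes it to the lambda,
# which applies boolean indexing and returns the labels of the True rows.
labels = df["price"].gt(10).pipe(lambda m: m[m].index)

print(labels.tolist())  # ['B', 'D']
```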

Answer 1 93 2018-09-04 19:53:45

Answer 2 15 2020-08-08 10:12:38

Answer 3 0 2021-11-20 22:20:50

Answer 4 0 2022-09-24 12:01:51
