Find integer index of rows with NaN in pandas DataFrame

Question

I have a pandas DataFrame like this:

                    a         b
2011-01-01 00:00:00 1.883381  -0.416629
2011-01-01 01:00:00 0.149948  -1.782170
2011-01-01 02:00:00 -0.407604 0.314168
2011-01-01 03:00:00 1.452354  NaN
2011-01-01 04:00:00 -1.224869 -0.947457
2011-01-01 05:00:00 0.498326  0.070416
2011-01-01 06:00:00 0.401665  NaN
2011-01-01 07:00:00 -0.019766 0.533641
2011-01-01 08:00:00 -1.101303 -1.408561
2011-01-01 09:00:00 1.671795  -0.764629

Is there an efficient way to find the "integer" index of rows with NaNs? In this case the desired output should be [3, 6].

Answer 1

Here is a simpler solution:

# Series.nonzero() was removed in pandas 1.0, so go through the NumPy array
inds = pd.isnull(df).any(axis=1).to_numpy().nonzero()[0]

In [9]: df
Out[9]: 
          0         1
0  0.450319  0.062595
1 -0.673058  0.156073
2 -0.871179 -0.118575
3  0.594188       NaN
4 -1.017903 -0.484744
5  0.860375  0.239265
6 -0.640070       NaN
7 -0.535802  1.632932
8  0.876523 -0.153634
9 -0.686914  0.131185

In [10]: pd.isnull(df).any(axis=1).to_numpy().nonzero()[0]
Out[10]: array([3, 6])
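
On recent pandas releases, where Series.nonzero() is no longer available, the same idea can be sketched with np.flatnonzero. The frame below is rebuilt from the values in the question; the variable names (row_has_nan, positions) are only illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question: hourly DatetimeIndex, two float columns.
idx = pd.date_range(start="2011-01-01 00:00", end="2011-01-01 09:00", periods=10)
df = pd.DataFrame(
    {
        "a": [1.883381, 0.149948, -0.407604, 1.452354, -1.224869,
              0.498326, 0.401665, -0.019766, -1.101303, 1.671795],
        "b": [-0.416629, -1.782170, 0.314168, np.nan, -0.947457,
              0.070416, np.nan, 0.533641, -1.408561, -0.764629],
    },
    index=idx,
)

# Boolean mask: True for every row that holds at least one NaN.
row_has_nan = df.isna().any(axis=1)

# Positions of the True entries, independent of the index labels.
positions = np.flatnonzero(row_has_nan.to_numpy())
print(positions.tolist())  # [3, 6]
```

Because the mask is converted to a plain NumPy array first, the result is positional even when the index is a DatetimeIndex, as it is here.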

Answer 2

For DataFrame df:

import numpy as np
index = df['b'].index[df['b'].apply(np.isnan)]

will give you back the index labels (here a DatetimeIndex, not integer positions) that you can use to index back into df, e.g.:

df['a'].loc[index[0]]
>>> 1.452354

For the integer index:对于整数索引：

df_index = df.index.tolist()
[df_index.index(i) for i in index]
>>> [3, 6]
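
The label-to-position lookup can also be done without a Python loop. The sketch below uses Index.get_indexer and assumes the index labels are unique; the column values are copied from the question.

```python
import numpy as np
import pandas as pd

idx = pd.date_range(start="2011-01-01 00:00", end="2011-01-01 09:00", periods=10)
b = [-0.416629, -1.782170, 0.314168, np.nan, -0.947457,
     0.070416, np.nan, 0.533641, -1.408561, -0.764629]
df = pd.DataFrame({"b": b}, index=idx)

# Index labels (timestamps) of the rows where column 'b' is NaN.
labels = df.index[df["b"].isna()]

# Translate labels into integer positions; requires a unique index.
positions = df.index.get_indexer(labels)
print(positions.tolist())  # [3, 6]
```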

Answer 3

One-line solution. However, it works for one column only.

df.loc[pd.isna(df["b"]), :].index

Answer 4

And just in case you want to find the coordinates of NaN across all the columns instead (supposing they are all numeric), here you go:

df = pd.DataFrame([[0,1,3,4,np.nan,2],[3,5,6,np.nan,3,3]])

df
   0  1  2    3    4  5
0  0  1  3  4.0  NaN  2
1  3  5  6  NaN  3.0  3

np.where(np.asanyarray(np.isnan(df)))
(array([0, 1]), array([4, 3]))
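
If (row, column) pairs are easier to work with than two parallel arrays, np.argwhere is one option. This is only a sketch on the same example frame; it uses df.isna() rather than np.isnan, which also copes with non-numeric columns. The names pairs and labelled are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[0, 1, 3, 4, np.nan, 2], [3, 5, 6, np.nan, 3, 3]])

# 2-D boolean array marking the NaN cells.
mask = df.isna().to_numpy()

# One [row, col] pair per NaN cell, in row-major order.
pairs = [(int(r), int(c)) for r, c in np.argwhere(mask)]
print(pairs)  # [(0, 4), (1, 3)]

# The same cells expressed as (index label, column label).
labelled = [(df.index[r], df.columns[c]) for r, c in pairs]
```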

Answer 5

Not sure if this is too late, but you can use np.where to find the indices of the NaN values:

indices = list(np.where(df['b'].isna())[0])

Answer 6

If you have a datetime index and you want the index values themselves:

df.loc[pd.isnull(df).any(axis=1), :].index.values

Answer 7

Here are tests for a few methods:

%timeit np.where(np.isnan(df['b']))[0]
%timeit pd.isnull(df['b']).nonzero()[0]
%timeit np.where(df['b'].isna())[0]
%timeit df.loc[pd.isna(df['b']), :].index

And their corresponding timings:

333 µs ± 9.95 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
280 µs ± 220 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
313 µs ± 128 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
6.84 ms ± 1.59 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

It would appear that pd.isnull(df['b']).nonzero()[0] wins the day in terms of timing, but any of the top three methods has comparable performance.
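
For anyone who wants to re-run a comparison like this outside IPython, here is a rough sketch using the standard timeit module. The data is synthetic (10,000 random values with 100 NaNs, an assumption and not the data behind the timings above), and Series.nonzero() is replaced by .to_numpy().nonzero() because the former was removed in pandas 1.0. Absolute numbers will differ by machine and library version.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic column: 10,000 normal values with 100 NaNs at random positions.
rng = np.random.default_rng(0)
values = rng.standard_normal(10_000)
values[rng.choice(values.size, size=100, replace=False)] = np.nan
df = pd.DataFrame({"b": values})

candidates = {
    "np.where + np.isnan": lambda: np.where(np.isnan(df["b"]))[0],
    "to_numpy().nonzero()": lambda: pd.isnull(df["b"]).to_numpy().nonzero()[0],
    "np.where + isna": lambda: np.where(df["b"].isna())[0],
}

# Check that every candidate returns the same positions before timing them.
results = {name: fn() for name, fn in candidates.items()}
reference = results["np.where + isna"]
all_agree = all(np.array_equal(reference, r) for r in results.values())
print("all methods agree:", all_agree)

for name, fn in candidates.items():
    seconds = timeit.timeit(fn, number=100) / 100
    print(f"{name}: {seconds * 1e6:.1f} us per call")
```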

Answer 8

Another simple solution is list(np.where(df['b'].isnull())[0])

Answer 9

Here is another simpler take:

df = pd.DataFrame([[0,1,3,4,np.nan,2],[3,5,6,np.nan,3,3]])

inds = np.asarray(df.isnull()).nonzero()

(array([0, 1], dtype=int64), array([4, 3], dtype=int64))

Answer 10

I was looking for all indexes of rows with NaN values.
My working solution:

def get_nan_indexes(data_frame):
    indexes = []
    for column in data_frame:
        # index labels of every NaN in this column
        index = data_frame[column].index[data_frame[column].isna()]
        indexes.extend(index)
    # map the (de-duplicated) labels back to integer positions
    df_index = data_frame.index.tolist()
    return sorted(df_index.index(i) for i in set(indexes))

Answer 11

This gives you the index values of the rows that have a NaN in any column:

df.loc[pd.isna(df).any(axis=1), :].index
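
If a per-column breakdown of integer positions is wanted instead, a dictionary comprehension is one way to sketch it. The small frame here is made up purely for illustration, as is the name nan_positions.

```python
import numpy as np
import pandas as pd

# Illustrative data only: two columns with NaNs in different rows.
df = pd.DataFrame(
    {"a": [1.0, np.nan, 3.0, 4.0], "b": [np.nan, 2.0, 3.0, np.nan]}
)

# For each column, the integer positions of its NaN entries.
nan_positions = {
    col: np.flatnonzero(df[col].isna().to_numpy()).tolist()
    for col in df.columns
}
print(nan_positions)  # {'a': [1], 'b': [0, 3]}
```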

Answer 12

Let the DataFrame be named df and the column of interest (i.e. the column in which we are trying to find nulls) be 'b'. Then the following snippet prints the integer positions of the nulls in that column:

   for i in range(df.shape[0]):
       if df['b'].isnull().iloc[i]:
           print(i)

Answer 13

    index_nan = []
    # isna() yields a boolean Series; items() iterates over (label, value) pairs
    for index, bool_v in df["b"].isna().items():
        if bool_v:
            index_nan.append(index)
    print(index_nan)

13 answers

Answer 1: 152, 2012-12-25 18:41:23
Answer 2: 53, accepted, 2012-12-24 03:02:12
Answer 3: 17, 2019-01-15 11:23:19
Answer 4: 12, 2017-09-07 14:49:25
Answer 5: 9, 2018-09-11 13:07:05
Answer 6: 5, 2019-05-03 21:34:42
Answer 7: 5, 2019-08-28 18:02:08
Answer 8: 3, 2019-12-18 15:03:36
Answer 9: 1, 2018-05-03 17:14:50
Answer 10: 1, 2018-10-04 15:20:54
Answer 11: 1, 2021-06-16 03:12:47
Answer 12: 0, 2019-05-20 11:33:45
Answer 13: 0, 2021-12-02 11:27:12
