How to get the first index of a pandas dataframe for which two columns are both not null?
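I have a dataframe with several columns. I want to get the index of the first row for which the values of two columns are both not NaN. I know I need to use df.first_valid_index()
Example: if my dataframe is: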
Date A B
0 2015-01-02 NaN 1
1 2015-01-02 NaN 2
2 2015-01-02 NaN 3
3 2015-01-02 1 NaN
5 2015-01-02 NaN 2
7 2015-01-03 1 1
I would get 7
One approach -
(~(pd.isnull(df.A) | pd.isnull(df.B))).idxmax()
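Similarly -
(~pd.isnull(df[['A','B']]).any(axis=1)).idxmax()
To improve performance, we might want to use argmax. Note that on pandas 1.0 and later, Series.argmax returns the integer position rather than the index label, so the result has to be mapped back through df.index[...] -
(~pd.isnull(df[['A','B']]).any(axis=1)).argmax()
Focusing purely on performance, we could bring in more NumPy -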
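To make the position-versus-label distinction concrete, here is a minimal sketch. It assumes the sample frame from the question, rebuilt by hand, and takes argmax on the underlying NumPy array so that the result is a position regardless of pandas version:

```python
import numpy as np
import pandas as pd

# Sample frame from the question, rebuilt by hand (index labels 0, 1, 2, 3, 5, 7).
df = pd.DataFrame(
    {
        "Date": ["2015-01-02"] * 5 + ["2015-01-03"],
        "A": [np.nan, np.nan, np.nan, 1, np.nan, 1],
        "B": [1, 2, 3, np.nan, 2, 1],
    },
    index=[0, 1, 2, 3, 5, 7],
)

# True where both A and B are present.
mask = ~pd.isnull(df[["A", "B"]]).any(axis=1)

pos = mask.to_numpy().argmax()  # integer position of the first True
label = df.index[pos]           # map the position back to an index label
print(pos, label)               # position 5, label 7
```

Because the index skips 4 and 6, the position (5) and the label (7) differ, which is exactly the case where the two must not be confused.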
df.index[(~(np.isnan(df.A.values) | np.isnan(df.B.values))).argmax()]
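One caveat applies to all the idxmax/argmax expressions above: if no row has both columns present, the mask is all False and the maximum is found at the very first row, which is silently wrong. The following is a hedged sketch of a guard; the helper name `first_complete_index` is hypothetical and not part of pandas:

```python
import numpy as np
import pandas as pd

def first_complete_index(df, cols):
    """Return the label of the first row where every column in `cols`
    is non-null, or None if there is no such row."""
    # Boolean NumPy array: True where all requested columns are present.
    mask = df[cols].notna().to_numpy().all(axis=1)
    if not mask.any():
        return None  # argmax would misleadingly report position 0 here
    return df.index[mask.argmax()]

# Sample frame from the question, rebuilt by hand.
df = pd.DataFrame(
    {"A": [np.nan, np.nan, np.nan, 1, np.nan, 1],
     "B": [1, 2, 3, np.nan, 2, 1]},
    index=[0, 1, 2, 3, 5, 7],
)
# A frame in which no row has both values (illustrative data).
no_match = pd.DataFrame({"A": [np.nan, 1.0], "B": [2.0, np.nan]})

found = first_complete_index(df, ["A", "B"])
missing = first_complete_index(no_match, ["A", "B"])
print(found, missing)
```

Returning None mirrors what first_valid_index does when nothing is valid; raising an exception instead would be an equally reasonable design.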
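Sample run (recorded on an older pandas release, where the positional form .any(1) was accepted and Series.argmax returned the index label) -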
In [172]: df
Out[172]:
Date A B
0 2015-01-02 NaN 1.0
1 2015-01-02 NaN 2.0
2 2015-01-02 NaN 3.0
3 2015-01-02 1.0 NaN
5 2015-01-02 NaN 2.0
7 2015-01-03 1.0 1.0
In [173]: (~(pd.isnull(df.A) | pd.isnull(df.B))).idxmax()
Out[173]: 7
In [180]: (~pd.isnull(df[['A','B']]).any(1)).idxmax()
Out[180]: 7
In [182]: (~pd.isnull(df[['A','B']]).any(1)).argmax()
Out[182]: 7
In [258]: df.index[(~(np.isnan(df.A.values) | np.isnan(df.B.values))).argmax()]
Out[258]: 7
Runtime test -
In [259]: a = np.random.rand(100000,2)
In [260]: a[np.random.rand(*a.shape)>0.2] = np.nan
In [261]: df = pd.DataFrame(a, columns=list(('A','B')))
# @jezrael's soln
In [262]: %timeit df[['A','B']].notnull().all(axis=1).idxmax()
100 loops, best of 3: 4.91 ms per loop
In [263]: %timeit (~(pd.isnull(df.A) | pd.isnull(df.B))).idxmax()
...: %timeit (~pd.isnull(df[['A','B']]).any(1)).idxmax()
...: %timeit (~pd.isnull(df[['A','B']]).any(1)).argmax()
...:
1000 loops, best of 3: 1.37 ms per loop
100 loops, best of 3: 4.73 ms per loop
100 loops, best of 3: 4.74 ms per loop
In [264]: %timeit df.index[(~(np.isnan(df.A.values) | np.isnan(df.B.values))).argmax()]
10000 loops, best of 3: 169 µs per loop
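For anyone who wants to repeat the comparison outside IPython, here is a rough sketch using the standard-library timeit module. The fixed seed, the repeat counts, and the labels in `candidates` are assumptions made for illustration; absolute numbers will differ from the session above depending on hardware and library versions.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed (an assumption) for repeatability
a = rng.random((100_000, 2))
a[rng.random(a.shape) > 0.2] = np.nan  # roughly 80% of the values become NaN
df = pd.DataFrame(a, columns=["A", "B"])

candidates = {
    "notnull/all/idxmax": lambda: df[["A", "B"]].notnull().all(axis=1).idxmax(),
    "isnull per column": lambda: (~(pd.isnull(df.A) | pd.isnull(df.B))).idxmax(),
    "numpy isnan": lambda: df.index[
        (~(np.isnan(df.A.values) | np.isnan(df.B.values))).argmax()
    ],
}

# Check that every variant agrees before comparing speed.
results = {name: fn() for name, fn in candidates.items()}

for name, fn in candidates.items():
    # Best of 3 repeats, 10 calls each, reported per call.
    best = min(timeit.repeat(fn, number=10, repeat=3)) / 10
    print(f"{name}: {best * 1e3:.3f} ms per call")
```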
Use notnull with all to check whether all values in each row are True, then idxmax to get the index of the first True:
print (df[['A','B']].notnull())
A B
0 False True
1 False True
2 False True
3 True False
5 False True
7 True True
print (df[['A','B']].notnull().all(axis=1))
0 False
1 False
2 False
3 False
5 False
7 True
dtype: bool
val = df[['A','B']].notnull().all(axis=1).idxmax()
print (val)
7
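Since the question mentions first_valid_index, another possible route is to drop the incomplete rows first and then ask for the first valid index. This is only a sketch and not part of either answer; it copies the surviving rows, so it may well be slower than the mask-based variants, and I have not timed it.

```python
import numpy as np
import pandas as pd

# Sample frame from the question, rebuilt by hand.
df = pd.DataFrame(
    {
        "Date": ["2015-01-02"] * 5 + ["2015-01-03"],
        "A": [np.nan, np.nan, np.nan, 1, np.nan, 1],
        "B": [1, 2, 3, np.nan, 2, 1],
    },
    index=[0, 1, 2, 3, 5, 7],
)

# Keep only rows where both A and B are present, then take the first label.
first = df.dropna(subset=["A", "B"]).first_valid_index()
print(first)  # 7

# When no row qualifies, the filtered frame is empty and the result is None.
no_match = pd.DataFrame({"A": [np.nan, 1.0], "B": [2.0, np.nan]})
nothing = no_match.dropna(subset=["A", "B"]).first_valid_index()
print(nothing)  # None
```

Unlike idxmax on an all-False mask, this variant does not fall back to the first row when nothing matches.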