How to get the first index of a pandas DataFrame for which several undefined columns are not null?
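I have a DataFrame with several columns. I want to get the first row index for which column A is not null and n other columns are also not null.
Example: if my DataFrame is: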
         Date    A    B    C    D
0  2015-01-02  NaN    1    1  NaN
1  2015-01-02  NaN    2    2  NaN
2  2015-01-02  NaN    3    3  NaN
3  2015-01-02    1  NaN    4  NaN
5  2015-01-02  NaN    2  NaN  NaN
6  2015-01-03    1  NaN    6  NaN
7  2015-01-03    1    1    6  NaN
8  2015-01-03    1    1    6    8
If n = 1, I would get 3.
If n = 2, I would get 7.
If n = 3, I would get 8.
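To make the requirement concrete, here is a minimal sketch that rebuilds the example frame and checks the condition row by row. The helper name `first_index_bruteforce` and the "at least n" reading of the condition are assumptions made for illustration. The loop is meant as a readable baseline, not as an efficient solution.

```python
import numpy as np
import pandas as pd

nan = np.nan
# The example frame from the question (note that index label 4 is absent).
df = pd.DataFrame(
    {
        "Date": ["2015-01-02"] * 5 + ["2015-01-03"] * 3,
        "A": [nan, nan, nan, 1, nan, 1, 1, 1],
        "B": [1, 2, 3, nan, 2, nan, 1, 1],
        "C": [1, 2, 3, 4, nan, 6, 6, 6],
        "D": [nan, nan, nan, nan, nan, nan, nan, 8],
    },
    index=[0, 1, 2, 3, 5, 6, 7, 8],
)


def first_index_bruteforce(df, n, reference="A"):
    """Hypothetical helper: first label where `reference` is not null
    and at least `n` of the other non-Date columns are not null."""
    others = [c for c in df.columns if c not in ("Date", reference)]
    for label, row in df.iterrows():
        if pd.notna(row[reference]) and row[others].notna().sum() >= n:
            return label
    return None  # no row satisfies the condition


for n in (1, 2, 3):
    print(n, first_index_bruteforce(df, n))
```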
Here's one approach that gets the indices for several values of n in one go -
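import numpy as np

def numpy_approach(df, reference='A'):
    # Drop the Date column; the remaining columns are assumed to be numeric
    df0 = df.iloc[:, df.columns != 'Date']
    valid_mask = df0.columns != reference  # True for the non-reference columns
    mask = ~np.isnan(df0.values)           # True where a value is present
    # Non-null count over the other columns, zeroed where the reference is null
    count = mask[:, valid_mask].sum(1) * mask[:, (~valid_mask).argmax()]
    # First position at which the running maximum reaches n = 1, 2, 3
    idx0 = np.searchsorted(np.maximum.accumulate(count), [1, 2, 3])
    return df.index[idx0]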
Sample run -
In [555]: df
Out[555]:
         Date    A    B    C    D
0  2015-01-02  NaN  1.0  1.0  NaN
1  2015-01-02  NaN  2.0  2.0  NaN
2  2015-01-02  NaN  3.0  3.0  NaN
3  2015-01-02  1.0  NaN  4.0  NaN
5  2015-01-02  NaN  2.0  NaN  NaN
6  2015-01-03  1.0  NaN  6.0  NaN
7  2015-01-03  1.0  1.0  6.0  NaN
8  2015-01-03  1.0  1.0  6.0  8.0
In [556]: numpy_approach(df, reference='A')
Out[556]: Int64Index([3, 7, 8], dtype='int64')
In [557]: numpy_approach(df, reference='B')
Out[557]: Int64Index([0, 7, 8], dtype='int64')
In [558]: numpy_approach(df, reference='C')
Out[558]: Int64Index([0, 7, 8], dtype='int64')
In [568]: numpy_approach(df, reference='D')
Out[568]: Int64Index([8, 8, 8], dtype='int64')
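One caveat with the searchsorted idea: when a threshold is never reached, `np.searchsorted` returns the length of the array, and indexing `df.index` with that position raises an `IndexError`. The following is a minimal sketch of a guarded variant, assuming the reference column appears exactly once. The function name `first_indices` and the parameters `ns` and `skip` are hypothetical, introduced here only for illustration.

```python
import numpy as np
import pandas as pd


def first_indices(df, reference="A", ns=(1, 2, 3), skip=("Date",)):
    """Hypothetical variant: map each n to the first index label where
    `reference` is not null and at least n other columns are not null,
    or to None when no such row exists."""
    values = df.drop(columns=list(skip))
    mask = values.notna().to_numpy()
    is_ref = values.columns == reference
    # Non-null count over the other columns, zeroed where the reference is null.
    count = mask[:, ~is_ref].sum(axis=1) * mask[:, is_ref].ravel()
    # The running maximum is non-decreasing, so searchsorted finds the
    # first position at which it reaches each n.
    pos = np.searchsorted(np.maximum.accumulate(count), ns)
    # A position equal to len(df) means the threshold was never reached.
    return {n: (df.index[p] if p < len(df) else None) for n, p in zip(ns, pos)}


nan = np.nan
df = pd.DataFrame(
    {
        "Date": ["2015-01-02"] * 5 + ["2015-01-03"] * 3,
        "A": [nan, nan, nan, 1, nan, 1, 1, 1],
        "B": [1, 2, 3, nan, 2, nan, 1, 1],
        "C": [1, 2, 3, 4, nan, 6, 6, 6],
        "D": [nan, nan, nan, nan, nan, nan, nan, 8],
    },
    index=[0, 1, 2, 3, 5, 6, 7, 8],
)
print(first_indices(df, reference="A", ns=(1, 2, 3, 4)))
```

Returning a dict keyed by n keeps the "never reached" case explicit instead of relying on an exception.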
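You can first use loc to select the rows where A is not NaN and the columns from A onward, then get the sum of the notnull values per row and subtract 1 with sub to discount column A itself.
Finally, use idxmax with a boolean mask: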
a = df.loc[df['A'].notnull(), 'A':].notnull().sum(axis=1).sub(1)
print (a)
3 1
6 1
7 2
8 3
dtype: int64
N = 1
print ((a == N).idxmax())
3
N = 2
print ((a == N).idxmax())
7
N = 3
print ((a == N).idxmax())
8
print (df.loc[df['A'].notnull(), 'A':])
     A    B    C    D
3  1.0  NaN  4.0  NaN
6  1.0  NaN  6.0  NaN
7  1.0  1.0  6.0  NaN
8  1.0  1.0  6.0  8.0
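Note that `idxmax` on a boolean Series that is all False still returns the first label, so `(a == N).idxmax()` silently gives 3 for an N that never occurs. A minimal sketch of a guarded wrapper follows. The function name `first_index_pandas` and the `exact` switch are hypothetical additions. Like the original, it relies on the label slice `reference:`, so it assumes every column to be counted sits to the right of the reference column.

```python
import numpy as np
import pandas as pd


def first_index_pandas(df, n, reference="A", exact=True):
    """Hypothetical wrapper around the loc / notnull / idxmax idea.
    Returns None when no row matches."""
    # Rows where the reference is present, columns from the reference onward.
    sub = df.loc[df[reference].notna(), reference:]
    # Non-null count per row, minus 1 for the reference column itself.
    others = sub.notna().sum(axis=1).sub(1)
    hit = (others == n) if exact else (others >= n)
    # Guard: idxmax on an all-False mask would return the first label.
    return hit.idxmax() if hit.any() else None


nan = np.nan
df = pd.DataFrame(
    {
        "Date": ["2015-01-02"] * 5 + ["2015-01-03"] * 3,
        "A": [nan, nan, nan, 1, nan, 1, 1, 1],
        "B": [1, 2, 3, nan, 2, nan, 1, 1],
        "C": [1, 2, 3, 4, nan, 6, 6, 6],
        "D": [nan, nan, nan, nan, nan, nan, nan, 8],
    },
    index=[0, 1, 2, 3, 5, 6, 7, 8],
)
for n in (1, 2, 3, 4):
    print(n, first_index_pandas(df, n))
```

Passing `exact=False` switches the comparison to "at least n", which matters for frames where the count jumps past n without ever equalling it.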