Find index of the first and/or last value in a column that is not NaN

I am dealing with sub-surface measurements from a borehole, where each measurement type covers a different range of depths. Depth is used as the index in this case.

I need to find the depth (index) of the first and/or last occurrence of data (a non-NaN value) for each measurement type.

Getting the depth (index) of the first or last row of the dataframe is easy: df.index[0] or df.index[-1]. The trick is finding the index of the first or last non-NaN occurrence in any given column.

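import numpy as np
import pandas as pd

# Depth plus three measurement types; NaN marks depths with no reading.
df = pd.DataFrame([[500, np.nan, np.nan,     25],
                   [501, np.nan, np.nan,     27],
                   [502, np.nan,     33,     24],
                   [503,      4,     32,     18],
                   [504,     12,     45,      5],
                   [505,      8,     38, np.nan]])
df.columns = ['Depth', 'x1', 'x2', 'x3']
df = df.set_index('Depth')  # set_index returns a new frame, so assign it back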

The ideal solution would produce an index (depth) of 503 for the first occurrence of x1, 502 for the first occurrence of x2, and 504 for the last occurrence of x3.

You can use agg:

df.notna().agg({'x1':'idxmax','x2':'idxmax','x3':lambda x: x[::-1].idxmax()})
#df.notna().agg({'x1':'idxmax','x2':'idxmax','x3':lambda x: x[x].last_valid_index()})

x1    503
x2    502
x3    504
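
One caveat with the idxmax approach, shown here as a minimal sketch: on a boolean mask, idxmax returns the first label when every entry is False, so a column with no data at all looks as if it started at the top of the borehole. The all-NaN column x4 is a hypothetical addition for illustration and is not part of the original data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "x1": [np.nan, np.nan, np.nan, 4, 12, 8],
        "x2": [np.nan, np.nan, 33, 32, 45, 38],
        "x3": [25, 27, 24, 18, 5, np.nan],
        "x4": [np.nan] * 6,  # hypothetical column with no data at all
    },
    index=pd.Index([500, 501, 502, 503, 504, 505], name="Depth"),
)

mask = df.notna()
raw_first = mask.idxmax()        # first True per column, scanning downward
raw_last = mask[::-1].idxmax()   # first True per column, scanning upward

# Guard: blank out the answer for columns that never contain data.
has_data = mask.any()
first = raw_first.where(has_data)
last = raw_last.where(has_data)

print(pd.DataFrame({"first": first, "last": last}))
```

Without the guard, x4 would report depth 500 as its first occurrence even though it holds no measurements.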

Another way is to check whether each column's first row is NaN and, depending on that, take either the first or the last non-NaN index:

np.where(df.iloc[0].isna(),df.notna().idxmax(),df.notna()[::-1].idxmax())

[503, 502, 504]
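
np.where returns a bare array, so the column names are lost. A small sketch of the same rule with the labels kept, assuming the question's data and the rule used above (a column that starts with NaN reports its first valid depth, any other column reports its last valid depth):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "x1": [np.nan, np.nan, np.nan, 4, 12, 8],
        "x2": [np.nan, np.nan, 33, 32, 45, 38],
        "x3": [25, 27, 24, 18, 5, np.nan],
    },
    index=pd.Index([500, 501, 502, 503, 504, 505], name="Depth"),
)

mask = df.notna()
starts_empty = df.iloc[0].isna()  # True for columns whose first row is NaN

# Pick first-valid for columns that start empty, last-valid otherwise,
# then wrap the array in a Series so each depth keeps its column name.
depths = pd.Series(
    np.where(starts_empty, mask.idxmax(), mask[::-1].idxmax()),
    index=df.columns,
)
print(depths)
```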

first_valid_index() and last_valid_index() can be used.

    >>> df
             x1    x2    x3
    Depth
    500     NaN   NaN  25.0
    501     NaN   NaN  27.0
    502     NaN  33.0  24.0
    503     4.0  32.0  18.0
    504    12.0  45.0   5.0
    505     8.0  38.0   NaN
    >>> df["x1"].first_valid_index()
    503
    >>> df["x2"].first_valid_index()
    502
    >>> df["x3"].first_valid_index()
    500
    >>> df["x3"].last_valid_index()
    504
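
A self-contained sketch of the same calls applied to every column in a loop. It also shows that both methods return None when a column has no valid values; the all-NaN column x4 is a hypothetical addition used only to demonstrate that case.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "x1": [np.nan, np.nan, np.nan, 4, 12, 8],
        "x2": [np.nan, np.nan, 33, 32, 45, 38],
        "x3": [25, 27, 24, 18, 5, np.nan],
        "x4": [np.nan] * 6,  # hypothetical column with no data at all
    },
    index=pd.Index([500, 501, 502, 503, 504, 505], name="Depth"),
)

# Map each column to its (first, last) depth with data.
ranges = {
    col: (df[col].first_valid_index(), df[col].last_valid_index())
    for col in df.columns
}

for col, (first, last) in ranges.items():
    if first is None:
        print(f"{col}: no data")
    else:
        print(f"{col}: data from {first} to {last}")
```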

IIUC:

df.stack().groupby(level=1).head(1)
Out[619]: 
Depth    
500    x3    25.0
502    x2    33.0
503    x1     4.0
dtype: float64
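
The stack approach can also return just the depths per column. A hedged sketch: since how stack treats missing values can depend on the pandas version and its options, dropna() is called explicitly here. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "x1": [np.nan, np.nan, np.nan, 4, 12, 8],
        "x2": [np.nan, np.nan, 33, 32, 45, 38],
        "x3": [25, 27, 24, 18, 5, np.nan],
    },
    index=pd.Index([500, 501, 502, 503, 504, 505], name="Depth"),
)

# Long format: one row per (Depth, column) pair that actually has a value.
stacked = df.stack().dropna()

depth = stacked.index.get_level_values(0).to_numpy()
column = stacked.index.get_level_values(1)

# Group the depths by column name and take the first / last of each group.
by_column = pd.Series(depth, index=column).groupby(level=0)
first_depth = by_column.first()
last_depth = by_column.last()

print(first_depth)
print(last_depth)
```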

Let's try this, if I understand you correctly:

pd.concat([df.apply(pd.Series.first_valid_index),
           df.apply(pd.Series.last_valid_index)], 
           axis=1, 
           keys=['Min_Depth', 'Max_Depth'])

Output:

      Min_Depth   Max_Depth
x1          503         505
x2          502         505
x3          500         504

Or transpose the output:

pd.concat([df.apply(pd.Series.first_valid_index),
           df.apply(pd.Series.last_valid_index)], 
           axis=1, 
           keys=['Min_Depth', 'Max_Depth']).T

Output:

            x1   x2   x3
Min_Depth  503  502  500
Max_Depth  505  505  504
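
The same table can be assembled with a plain dictionary comprehension, which may be easier to follow than apply plus concat. A minimal sketch; the helper name valid_depth_range is made up for illustration and is not a pandas function.

```python
import numpy as np
import pandas as pd


def valid_depth_range(frame):
    """Return one row per column with its first and last non-NaN index."""
    rows = {
        col: {
            "Min_Depth": frame[col].first_valid_index(),
            "Max_Depth": frame[col].last_valid_index(),
        }
        for col in frame.columns
    }
    return pd.DataFrame.from_dict(rows, orient="index")


df = pd.DataFrame(
    {
        "x1": [np.nan, np.nan, np.nan, 4, 12, 8],
        "x2": [np.nan, np.nan, 33, 32, 45, 38],
        "x3": [25, 27, 24, 18, 5, np.nan],
    },
    index=pd.Index([500, 501, 502, 503, 504, 505], name="Depth"),
)

summary = valid_depth_range(df)
print(summary)    # measurement types as rows
print(summary.T)  # transposed, measurement types as columns
```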

Using apply with a list of functions:

df.apply([pd.Series.first_valid_index, pd.Series.last_valid_index])

Output:

                    x1   x2   x3
first_valid_index  503  502  500
last_valid_index   505  505  504

With a little renaming:

df.apply([pd.Series.first_valid_index, pd.Series.last_valid_index])\
  .set_axis(['Min_Depth', 'Max_Depth'], axis=0)

Output:

            x1   x2   x3
Min_Depth  503  502  500
Max_Depth  505  505  504
