Pandas dataframe get smallest NaN and smallest not NaN row for each group

Question

Below is my dataframe:

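import numpy as np
import pandas as pd

# np.nan is used instead of np.NaN, which was removed in NumPy 2.0
df = pd.DataFrame({'id'    : [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3],
                   'vd'    : [1, 2, 3, 4, 5, 2, 3, 4, 5, 6,
                              3, 4, 5, 6, 7],
                   'value' : [np.nan, np.nan, np.nan, 2.3, 1.1,
                              np.nan, np.nan, 2.2, 3.3, 2.2,
                              np.nan, 1, 2, 2.3, 1.1]})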

For each 'id' group I want to keep two rows: the row with the smallest 'vd' among rows where 'value' is NaN, and the row with the smallest 'vd' among rows where 'value' is not NaN.

Expected output:

df_out = pd.DataFrame({'id'    : [1, 1, 2, 2, 3, 3],
                       'vd'    : [1, 4, 2, 4, 3, 4],
                       'value' : [np.nan, 2.3,
                                  np.nan, 2.2,
                                  np.nan, 1]})

My approach so far, which selects the first non-NaN row of each group:

out = df[df.index == (df['value'].notnull().groupby(df['id']).transform('idxmax'))]

Issue: the version below, using idxmin to find the NaN rows, is not working:

df['value'].isnull().groupby(df['id']).transform('idxmin')

Answer 1

Example

import numpy as np
import pandas as pd

df = pd.DataFrame({'id'    : [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3],
                   'vd'    : [1, 2, 3, 4, 5, 2, 3, 4, 5, 6,
                              3, 4, 5, 6, 7],
                   'value' : [np.nan, np.nan, np.nan, 2.3, 1.1,
                              np.nan, np.nan, 2.2, 3.3, 2.2,
                              np.nan, 1, 2, 2.3, 1.1]})

df

    id  vd  value
0   1   1   NaN
1   1   2   NaN
2   1   3   NaN
3   1   4   2.3
4   1   5   1.1
5   2   2   NaN
6   2   3   NaN
7   2   4   2.2
8   2   5   3.3
9   2   6   2.2
10  3   3   NaN
11  3   4   1.0
12  3   5   2.0
13  3   6   2.3
14  3   7   1.1

Code

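If the data is already sorted by id and vd, as in your example, the following code is the most concise. It groups by id together with a NaN/not-NaN flag and keeps the first row of each group: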

out = df.groupby(['id', df['value'].isna()]).head(1).reset_index(drop=True)

out

    id  vd  value
0   1   1   NaN
1   1   4   2.3
2   2   2   NaN
3   2   4   2.2
4   3   3   NaN
5   3   4   1.0
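
If the rows are not already sorted, head(1) would just return whichever row happens to come first in each group. A minimal sketch of one way to handle that, assuming it is acceptable to sort by id and vd first (the shuffle and the names shuffled, sorted_df and is_nan are only for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3],
                   'vd': [1, 2, 3, 4, 5, 2, 3, 4, 5, 6, 3, 4, 5, 6, 7],
                   'value': [np.nan, np.nan, np.nan, 2.3, 1.1,
                             np.nan, np.nan, 2.2, 3.3, 2.2,
                             np.nan, 1, 2, 2.3, 1.1]})

# Shuffle the rows to simulate unsorted input (illustration only)
shuffled = df.sample(frac=1, random_state=0)

# Sort so that the smallest vd comes first within each id
sorted_df = shuffled.sort_values(['id', 'vd'])

# Boolean flag: True where value is NaN; renamed to avoid clashing with the column name
is_nan = sorted_df['value'].isna().rename('is_nan')

# First row of every (id, is_nan) group is the smallest-vd row of that group
out = sorted_df.groupby(['id', is_nan]).head(1).reset_index(drop=True)
print(out)
```

Because head keeps the row order of the frame it is called on, the result comes back ordered by id and vd.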

Answer 2

You can also use drop_duplicates without grouping:

df.assign(blank = df['value'].isna()).drop_duplicates(['id','blank']).drop('blank', axis=1)

Output:

    id  vd  value
0    1   1    NaN
3    1   4    2.3
5    2   2    NaN
7    2   4    2.2
10   3   3    NaN
11   3   4    1.0
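
Both answers rely on row order, since head(1) and drop_duplicates keep the first row they see. As for the idxmin attempt in the question, one likely reason it adds nothing new is that idxmin on a boolean mask finds the first False, which for isnull() is the first non-NaN row, the same row that notnull() with idxmax already returns. A sketch closer to the original idea, assuming the goal is the smallest vd per (id, NaN flag) group regardless of row order, is to call idxmin on the vd column itself (is_nan and idx are illustrative names):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3],
                   'vd': [1, 2, 3, 4, 5, 2, 3, 4, 5, 6, 3, 4, 5, 6, 7],
                   'value': [np.nan, np.nan, np.nan, 2.3, 1.1,
                             np.nan, np.nan, 2.2, 3.3, 2.2,
                             np.nan, 1, 2, 2.3, 1.1]})

# Boolean flag: True where value is NaN
is_nan = df['value'].isna().rename('is_nan')

# Index label of the smallest vd within each (id, is_nan) group
idx = df.groupby(['id', is_nan])['vd'].idxmin()

# Select those rows and restore the id / vd ordering
out = df.loc[idx].sort_values(['id', 'vd'])
print(out)
```

This keeps the original index labels, like the drop_duplicates version; append reset_index(drop=True) if a fresh index is preferred.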
