pandas dataframe groupby and get nth row

Question

I have a pandas DataFrame like the following.

import pandas as pd

df = pd.DataFrame([[1.1, 1.1, 1.1, 2.6, 2.5, 3.4,2.6,2.6,3.4,3.4,2.6,1.1,1.1,3.3], list('AAABBBBABCBDDD'), [1.1, 1.7, 2.5, 2.6, 3.3, 3.8,4.0,4.2,4.3,4.5,4.6,4.7,4.7,4.8], ['x/y/z','x/y','x/y/z/n','x/u','x','x/u/v','x/y/z','x','x/u/v/b','-','x/y','x/y/z','x','x/u/v/w'],['1','3','3','2','4','2','5','3','6','3','5','1','1','1'],['200','400','404','200','200','404','200','404','500','200','500','200','200','400']]).T

df.columns = ['col1','col2','col3','col4','ID','col5']

I want to group this by "ID" and get the 2nd row of each group. Later I will also need the 3rd and 4th rows. For now, please just explain how to get only the 2nd row of each group.

I tried the following, which gives both the first and the second rows.

df.groupby('ID').head(2)

Instead, I need only the second row. Since IDs 4 and 6 have no second row, they should be ignored.

             col1 col2 col3     col4     ID    col5
ID                                           
1       0   1.1     A  1.1    x/y/z       1    200
        11  1.1     D  4.7    x/y/z       1    200
2       3   2.6     B  2.6      x/u       2    200
        5   3.4     B  3.8    x/u/v       2    404
3       1   1.1     A  1.7      x/y       3    400
        2   1.1     A  2.5  x/y/z/n       3    404
4       4   2.5     B  3.3        x       4    200
5       6   2.6     B    4    x/y/z       5    200
        10  2.6     B  4.6      x/y       5    500
6       8   3.4     B  4.3  x/u/v/b       6    500
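To confirm which groups are too short to have a second row, one way is to count the rows per group with `groupby(...).size()`. Note that the ID values are given as strings in the constructor, so the group keys are '1' to '6', not integers. The names `sizes` and `short_ids` are just illustrative.

```python
import pandas as pd

# Same sample data as in the question, split over several lines for readability
df = pd.DataFrame([
    [1.1, 1.1, 1.1, 2.6, 2.5, 3.4, 2.6, 2.6, 3.4, 3.4, 2.6, 1.1, 1.1, 3.3],
    list('AAABBBBABCBDDD'),
    [1.1, 1.7, 2.5, 2.6, 3.3, 3.8, 4.0, 4.2, 4.3, 4.5, 4.6, 4.7, 4.7, 4.8],
    ['x/y/z', 'x/y', 'x/y/z/n', 'x/u', 'x', 'x/u/v', 'x/y/z', 'x', 'x/u/v/b', '-', 'x/y', 'x/y/z', 'x', 'x/u/v/w'],
    ['1', '3', '3', '2', '4', '2', '5', '3', '6', '3', '5', '1', '1', '1'],
    ['200', '400', '404', '200', '200', '404', '200', '404', '500', '200', '500', '200', '200', '400'],
]).T
df.columns = ['col1', 'col2', 'col3', 'col4', 'ID', 'col5']

# Number of rows in each ID group
sizes = df.groupby('ID').size()
print(sizes)

# Groups with a single row have no 2nd row
short_ids = sizes[sizes < 2].index.tolist()
print(short_ids)  # ['4', '6']
```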

Answer 1

I think the nth method is supposed to do just that:

In [10]: g = df.groupby('ID')
In [11]: g.nth(1).dropna()
Out[11]: 
    col1 col2  col3     col4 col5
ID                               
1    1.1    D   4.7    x/y/z  200
2    3.4    B   3.8    x/u/v  404
3    1.1    A   2.5  x/y/z/n  404
5    2.6    B   4.6      x/y  500
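A sketch of the same idea in recent pandas, assuming pandas 2.0 or later: there `GroupBy.nth` behaves as a filter, so groups without a row at the requested position are simply left out (no `.dropna()` step is needed), and the result keeps the original row index and the ID column. `nth` also accepts a list of 0-based positions, which covers the 3rd and 4th rows mentioned in the question.

```python
import pandas as pd

# Same sample data as in the question
df = pd.DataFrame([
    [1.1, 1.1, 1.1, 2.6, 2.5, 3.4, 2.6, 2.6, 3.4, 3.4, 2.6, 1.1, 1.1, 3.3],
    list('AAABBBBABCBDDD'),
    [1.1, 1.7, 2.5, 2.6, 3.3, 3.8, 4.0, 4.2, 4.3, 4.5, 4.6, 4.7, 4.7, 4.8],
    ['x/y/z', 'x/y', 'x/y/z/n', 'x/u', 'x', 'x/u/v', 'x/y/z', 'x', 'x/u/v/b', '-', 'x/y', 'x/y/z', 'x', 'x/u/v/w'],
    ['1', '3', '3', '2', '4', '2', '5', '3', '6', '3', '5', '1', '1', '1'],
    ['200', '400', '404', '200', '200', '404', '200', '404', '500', '200', '500', '200', '200', '400'],
]).T
df.columns = ['col1', 'col2', 'col3', 'col4', 'ID', 'col5']

g = df.groupby('ID')

# Position 1 is the 2nd row of each group; IDs 4 and 6 have no such row
second = g.nth(1)
print(second)

# Positions 1, 2 and 3 are the 2nd, 3rd and 4th rows of each group
second_to_fourth = g.nth([1, 2, 3])
print(second_to_fourth)
```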

Since pandas 0.13, another way to do this is to use cumcount, where n is the 1-based position of the row you want (n = 2 for the second row):

df[g.cumcount() == n - 1]

This is significantly faster:

In [21]: %timeit g.nth(1).dropna()
100 loops, best of 3: 11.3 ms per loop

In [22]: %timeit df[g.cumcount() == 1]
1000 loops, best of 3: 286 µs per loop
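Here is a self-contained sketch of the cumcount approach with n spelled out. `cumcount` numbers the rows inside each group 0, 1, 2, ... in their original order, so comparing it to a position gives a boolean mask over the whole frame. The `isin` variant selects several positions at once. `nth_rows` and `rows_2_to_4` are illustrative names only.

```python
import pandas as pd

# Same sample data as in the question
df = pd.DataFrame([
    [1.1, 1.1, 1.1, 2.6, 2.5, 3.4, 2.6, 2.6, 3.4, 3.4, 2.6, 1.1, 1.1, 3.3],
    list('AAABBBBABCBDDD'),
    [1.1, 1.7, 2.5, 2.6, 3.3, 3.8, 4.0, 4.2, 4.3, 4.5, 4.6, 4.7, 4.7, 4.8],
    ['x/y/z', 'x/y', 'x/y/z/n', 'x/u', 'x', 'x/u/v', 'x/y/z', 'x', 'x/u/v/b', '-', 'x/y', 'x/y/z', 'x', 'x/u/v/w'],
    ['1', '3', '3', '2', '4', '2', '5', '3', '6', '3', '5', '1', '1', '1'],
    ['200', '400', '404', '200', '200', '404', '200', '404', '500', '200', '500', '200', '200', '400'],
]).T
df.columns = ['col1', 'col2', 'col3', 'col4', 'ID', 'col5']

# 0-based position of each row within its ID group
pos = df.groupby('ID').cumcount()

n = 2  # 1-based position of the wanted row (2 = second row)
nth_rows = df[pos == n - 1]
print(nth_rows)  # rows with index 2, 5, 10, 11

# 2nd, 3rd and 4th rows of every group in one pass
rows_2_to_4 = df[pos.isin([1, 2, 3])]
print(rows_2_to_4)
```

Because this is plain boolean indexing, the rows come back in their original order with their original index, and groups that are too short simply contribute nothing.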

Answer 2

If you use apply on the groupby, the function you pass is called once per group, with the group passed in as a DataFrame. So you can do:

df.groupby('ID').apply(lambda t: t.iloc[1])

However, this will raise an error if any group has fewer than two rows. Excluding those groups could be trickier: I'm not aware of a way to drop the result of apply for only some groups. You could filter out the small groups first, or return a one-row NaN-filled DataFrame for them and call dropna on the result.
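Both workarounds can be sketched as follows. `GroupBy.filter` drops every group for which the function returns False, so `iloc[1]` can no longer go out of bounds. The second variant returns an all-NaN Series (rather than a one-row DataFrame) for short groups and removes those rows with `dropna`. Selecting only the non-key columns (the `cols` list is an illustrative choice) keeps the ID column out of the applied function; the result is indexed by ID.

```python
import pandas as pd

# Same sample data as in the question
df = pd.DataFrame([
    [1.1, 1.1, 1.1, 2.6, 2.5, 3.4, 2.6, 2.6, 3.4, 3.4, 2.6, 1.1, 1.1, 3.3],
    list('AAABBBBABCBDDD'),
    [1.1, 1.7, 2.5, 2.6, 3.3, 3.8, 4.0, 4.2, 4.3, 4.5, 4.6, 4.7, 4.7, 4.8],
    ['x/y/z', 'x/y', 'x/y/z/n', 'x/u', 'x', 'x/u/v', 'x/y/z', 'x', 'x/u/v/b', '-', 'x/y', 'x/y/z', 'x', 'x/u/v/w'],
    ['1', '3', '3', '2', '4', '2', '5', '3', '6', '3', '5', '1', '1', '1'],
    ['200', '400', '404', '200', '200', '404', '200', '404', '500', '200', '500', '200', '200', '400'],
]).T
df.columns = ['col1', 'col2', 'col3', 'col4', 'ID', 'col5']
cols = ['col1', 'col2', 'col3', 'col4', 'col5']

# Option 1: drop groups with fewer than two rows, then take row 1 of each group
big_enough = df.groupby('ID').filter(lambda t: len(t) >= 2)
second = big_enough.groupby('ID')[cols].apply(lambda t: t.iloc[1])
print(second)


# Option 2: return an all-NaN row for short groups, then drop those rows
def second_or_nan(t):
    if len(t) >= 2:
        return t.iloc[1]
    return pd.Series(index=t.columns, dtype=object)  # every value is NaN


second_nan = df.groupby('ID')[cols].apply(second_or_nan).dropna()
print(second_nan)
```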

2 answers

solution1
13 ACCEPTED 2013-11-20 05:03:10

solution2
1 2013-11-20 05:02:57