Pandas data frame with groupby: How to create indicator variable for the first and last rows in each group

Question

Suppose I have a data frame like this:

I plan to use it in a df.groupby(['X']).apply(function) operation. I want to add columns with indicator variables that mark the rows where each group starts and ends, giving a new frame like this (False is abbreviated to F):

    X  First_X  Last_X
0  10     True       F
1  10        F       F
2  10        F       F
3  10        F    True
4  20     True       F
5  20        F    True
6  30     True       F
7  30        F       F
8  30        F       F
9  30        F    True

How would I do it?

I have the same question for the case where I group by two or more columns, for example df.groupby(['X','Y']).apply(function). For the second variable, I want to mark the first and last rows within the group created by the first variable.

The resulting frame should be:

    X    Y   First_X  Last_X  First_Y  Last_Y
0  10    1   True     F       True     F
1  10    1   F        F       F        True
2  10    2   F        F       True     F
2  10    2   F        True    F        True
3  20    3   True     F       True     True
4  20    4   F        True    True     True
5  30    5   True     F       True     F
6  30    5   F        F       F        F
7  30    5   F        F       F        True
8  30    6   F        True    True     True

Is using DataFrame.shift and DataFrame.merge the right way to approach the problem?

Thank you.

Answer 1

First question:

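# First_X/Last_X: X differs from the previous/next row (assumes each group's rows are contiguous)
df = df.assign(First_X=df.X.ne(df.X.shift()), Last_X=df.X.ne(df.X.shift(-1)))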
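A self-contained sketch of the shift comparison above. The question does not show its input frame, so the X column here is an assumption read off the expected-output table. Because each row is compared with its immediate neighbours, this marks the boundaries of consecutive runs of X; it only gives per-group first/last flags when each group's rows are contiguous (for example after sorting by X).

```python
import pandas as pd

# Assumed input: the X column read off the question's expected-output table
df = pd.DataFrame({"X": [10, 10, 10, 10, 20, 20, 30, 30, 30, 30]})

df = df.assign(
    First_X=df.X.ne(df.X.shift()),    # shift() puts NaN in row 0, so row 0 compares as True
    Last_X=df.X.ne(df.X.shift(-1)),   # shift(-1) puts NaN in the last row, so it compares as True
)
print(df)
```

No merge is needed here: the boolean Series share df's index, so assign lines them up row by row.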

Second question:

print(df3)

    X  Y First_X Last_X
0  10  1    True      F
1  10  1       F      F
2  10  2       F      F
2  10  2       F   True
3  20  3    True      F
4  20  4       F   True
5  30  5    True      F
6  30  5       F      F
7  30  5       F      F
8  30  6       F   True



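df3 = df3.assign(
    # transform returns results aligned to df3's original index; within each (X, Y)
    # group only the first/last row differs from its shifted neighbour
    First_Y=df3.groupby(['X', 'Y'])['Y'].transform(lambda x: x.ne(x.shift())),
    Last_Y=df3.groupby(['X', 'Y'])['Y'].transform(lambda x: x.ne(x.shift(-1))))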



    X  Y First_X Last_X  First_Y  Last_Y
0  10  1    True      F     True   False
1  10  1       F      F    False    True
2  10  2       F      F     True   False
2  10  2       F   True    False    True
3  20  3    True      F     True    True
4  20  4       F   True     True    True
5  30  5    True      F     True   False
6  30  5       F      F    False   False
7  30  5       F      F    False    True
8  30  6       F   True     True    True
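An alternative sketch for the two-column case that avoids apply altogether: GroupBy.cumcount numbers the rows of each group 0, 1, 2, ..., and with ascending=False it counts from the end of the group, so comparing with 0 flags the first and last row of every group. Unlike the shift comparison, this does not require a group's rows to be contiguous. The input frame is an assumption read off the question's table, with a fresh 0..9 index instead of the duplicated label 2.

```python
import pandas as pd

# Assumed input: X and Y read off the question's expected-output table
df3 = pd.DataFrame({
    "X": [10, 10, 10, 10, 20, 20, 30, 30, 30, 30],
    "Y": [1, 1, 2, 2, 3, 4, 5, 5, 5, 6],
})

gx = df3.groupby("X")          # groups formed by the first variable
gxy = df3.groupby(["X", "Y"])  # Y-groups nested inside each X-group

df3 = df3.assign(
    First_X=gx.cumcount().eq(0),                 # position 0 counted from the start of the X-group
    Last_X=gx.cumcount(ascending=False).eq(0),   # position 0 counted from the end of the X-group
    First_Y=gxy.cumcount().eq(0),
    Last_Y=gxy.cumcount(ascending=False).eq(0),
)
print(df3)
```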

Answer 2

For the first question, inspired by a similar question here:

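df['first'] = False
df['last'] = False

def set_cols(g):
    # Chained assignment like g['first'].iloc[0] = True may not update g under
    # copy-on-write, so set each cell with .iloc and the column position
    g = g.copy()
    g.iloc[0, g.columns.get_loc('first')] = True   # first row of the group
    g.iloc[-1, g.columns.get_loc('last')] = True   # last row of the group
    return g

# Apply only to the indicator columns; group_keys=False keeps df's index so the result aligns
df[['first', 'last']] = df.groupby('X', group_keys=False)[['first', 'last']].apply(set_cols)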

This gives the desired result.
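A vectorized sketch of the same first/last flags without apply, using DataFrame.duplicated: a row is the first of its X-group if its X value has not appeared earlier, and the last if the value does not appear again later (keep="last"). Like the apply version, it follows the frame's row order and does not need the groups to be contiguous. The example frame is an assumption taken from the question's first table.

```python
import pandas as pd

# Assumed input: the X column from the question's first table
df = pd.DataFrame({"X": [10, 10, 10, 10, 20, 20, 30, 30, 30, 30]})

# duplicated() is True for every repeat after the first occurrence;
# with keep="last" it is True for every occurrence before the last one
df["first"] = ~df.duplicated("X")
df["last"] = ~df.duplicated("X", keep="last")
print(df)
```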

Answer 3

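import pandas as pd

df.assign(
    # 1 on the row with the smallest Y in each X-group, NaN elsewhere (the first row if Y is sorted)
    first_ind=lambda df: pd.Series(data=1, index=df.groupby('X')['Y'].idxmin()),
    # idxmax returns the first occurrence of the maximum, so this is the last row only if it is unique
    last_ind=lambda df: pd.Series(data=1, index=df.groupby('X')['Y'].idxmax()))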
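Answer 3 keys on the smallest and largest Y, so its result depends on how Y is ordered. A sketch that keeps its index-label idea but picks rows by position instead: GroupBy.head(1) and GroupBy.tail(1) return the first and last row of each group with their original index labels, and Index.isin turns those labels into a 0/1 column. The example frame is assumed from the question's second table, with a unique 0..9 index (with a duplicated label, every row carrying that label would be flagged).

```python
import pandas as pd

# Assumed input: X and Y from the question's second table, with a unique index
df = pd.DataFrame({
    "X": [10, 10, 10, 10, 20, 20, 30, 30, 30, 30],
    "Y": [1, 1, 2, 2, 3, 4, 5, 5, 5, 6],
})

first_labels = df.groupby("X").head(1).index   # index label of the first row in each X-group
last_labels = df.groupby("X").tail(1).index    # index label of the last row in each X-group

df = df.assign(
    first_ind=df.index.isin(first_labels).astype(int),  # 1 on first rows, 0 elsewhere
    last_ind=df.index.isin(last_labels).astype(int),    # 1 on last rows, 0 elsewhere
)
print(df)
```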

3 answers

solution1
1 2020-11-15 21:51:10

solution2
0 2020-11-15 23:07:26

solution3
0 2021-10-04 12:22:18
