Suppose I have a data frame like this:
X
0 10
1 10
2 10
3 10
4 20
5 20
6 30
7 30
8 30
9 30
and I plan to use it in a df.groupby(['X']).apply(function) operation. I want to add indicator columns that mark the rows where each group starts and finishes. The new frame should look like this (False is abbreviated to F):
X First_X Last_X
0 10 True F
1 10 F F
2 10 F F
3 10 F True
4 20 True F
5 20 F True
6 30 True F
7 30 F F
8 30 F F
9 30 F True
How would I do it?
The same question applies when I group by two or more columns, for example df.groupby(['X','Y']).apply(function). For the second variable, I want to mark the first and the last row of each Y group within the group created by the first variable.
X Y
0 10 1
1 10 1
2 10 2
3 10 2
4 20 3
5 20 4
6 30 5
7 30 5
8 30 5
9 30 6
and the resulting frame should be
X Y First_X Last_X First_Y Last_Y
0 10 1 True F True F
1 10 1 F F F True
2 10 2 F F True F
3 10 2 F True F True
4 20 3 True F True True
5 20 4 F True True True
6 30 5 True F True F
7 30 5 F F F F
8 30 5 F F F True
9 30 6 F True True True
Is using DataFrame.shift and DataFrame.merge the right way to approach the problem?
Thank you.
First question:
df = df.assign(First_X=df.X.ne(df.X.shift()), Last_X=df.X.ne(df.X.shift(-1)))
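To make the one-liner above easier to try, here is a small self-contained sketch on the sample data from the question. It assumes, as in the question, that the rows of each X group are contiguous; if the same X value could reappear later in the frame, each run would be flagged separately.

```python
import pandas as pd

df = pd.DataFrame({"X": [10, 10, 10, 10, 20, 20, 30, 30, 30, 30]})

# A row starts a group when its X differs from the previous row's X,
# and ends a group when its X differs from the next row's X.
# shift() leaves NaN at the edge, and NaN is never equal to a real value,
# so the very first and very last rows are flagged as well.
df = df.assign(
    First_X=df["X"].ne(df["X"].shift()),
    Last_X=df["X"].ne(df["X"].shift(-1)),
)
print(df)
```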
Second question:
print(df3)
X Y First_X Last_X
0 10 1 True F
1 10 1 F F
2 10 2 F F
2 10 2 F True
3 20 3 True F
4 20 4 F True
5 30 5 True F
6 30 5 F F
7 30 5 F F
8 30 6 F True
# group_keys=False keeps the original row index so the result aligns with df3
g = df3.groupby(['X', 'Y'], group_keys=False)['Y']
df3 = df3.assign(First_Y=g.apply(lambda x: x.ne(x.shift())),
                 Last_Y=g.apply(lambda x: x.ne(x.shift(-1))))
X Y First_X Last_X First_Y Last_Y
0 10 1 True F True False
1 10 1 F F False True
2 10 2 F F True False
2 10 2 F True False True
3 20 3 True F True True
4 20 4 F True True True
5 30 5 True F True False
6 30 5 F F False False
7 30 5 F F False True
8 30 6 F True True True
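The two-column case can probably also be handled without groupby, by comparing the (X, Y) pair of each row with the pair in the neighbouring row. This is only a sketch: the frame is rebuilt from the question's sample data, and it again assumes that the rows of each group are contiguous.

```python
import pandas as pd

df = pd.DataFrame({
    "X": [10, 10, 10, 10, 20, 20, 30, 30, 30, 30],
    "Y": [1, 1, 2, 2, 3, 4, 5, 5, 5, 6],
})

keys = df[["X", "Y"]]

df = df.assign(
    First_X=df["X"].ne(df["X"].shift()),
    Last_X=df["X"].ne(df["X"].shift(-1)),
    # A Y group starts (ends) wherever X or Y differs from the previous (next) row.
    First_Y=keys.ne(keys.shift()).any(axis=1),
    Last_Y=keys.ne(keys.shift(-1)).any(axis=1),
)
print(df)
```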
For the first question, inspired by a similar question:
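df['first'] = False
df['last'] = False
def set_cols(group):
    # Copy the group and assign by position to avoid chained assignment
    group = group.copy()
    group.iloc[0, group.columns.get_loc('first')] = True
    group.iloc[-1, group.columns.get_loc('last')] = True
    return group
# group_keys=False keeps the original row index
df = df.groupby('X', group_keys=False).apply(set_cols)
This gives the desired result.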
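If calling a Python function once per group turns out to be slow, the same flags can likely be computed in a vectorized way with GroupBy.cumcount. A minimal sketch on the question's sample data (the column names first and last follow the code above):

```python
import pandas as pd

df = pd.DataFrame({"X": [10, 10, 10, 10, 20, 20, 30, 30, 30, 30]})

g = df.groupby("X")

# cumcount() numbers the rows of each group 0, 1, 2, ... in order of appearance;
# with ascending=False it counts down, so the last row of each group gets 0.
df = df.assign(
    first=g.cumcount() == 0,
    last=g.cumcount(ascending=False) == 0,
)
print(df)
```

Unlike the shift-based comparison, this does not require the rows of a group to be adjacent.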
df.assign(
first_ind=lambda df: pd.Series(data=1, index=df.groupby('X')['Y'].idxmin()),
last_ind=lambda df: pd.Series(data=1, index=df.groupby('X')['Y'].idxmax()))
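Note that idxmin and idxmax return the label of the first occurrence of the minimum or maximum, so the snippet above relies on Y being sorted within each X group, and with tied maxima idxmax points at the first tied row rather than the last row of the group. A variant in the same spirit that does not depend on the values of Y could collect the index labels with GroupBy.head(1) and GroupBy.tail(1). This is a sketch that assumes the frame has a unique index:

```python
import pandas as pd

df = pd.DataFrame({
    "X": [10, 10, 10, 10, 20, 20, 30, 30, 30, 30],
    "Y": [1, 1, 2, 2, 3, 4, 5, 5, 5, 6],
})

g = df.groupby("X")
first_labels = g.head(1).index  # index label of the first row of each X group
last_labels = g.tail(1).index   # index label of the last row of each X group

# Mark a row when its index label is among the collected labels.
df = df.assign(
    first_ind=df.index.isin(first_labels),
    last_ind=df.index.isin(last_labels),
)
print(df)
```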