
pandas groupby, keep only rows with first occurrence

This is a slow solution for what I am hoping to achieve; the problem is performance. Is there a more 'pandonic' way to do this without the user-defined function? The goal is to keep, for each group, only the rows that have the first timestamp occurring in that group.

def get_first_id_time(df):
    # Positional access: the label 0 is not guaranteed to exist in every group
    first_time = df['datetime'].iloc[0]
    df = df.loc[df['datetime'] == first_time]

    return df

data = data.groupby('id').apply(get_first_id_time)

EDIT: Note that each group has many rows with datetime == first_time.

Can you just get the min datetime per id and merge?

min_datetime = data.groupby('id')['datetime'].min().reset_index()

data = data.merge(min_datetime, how='inner', on='id')

Edit:

Since there are many rows that have the same first_datetime, just merge on both datetime and id. The inner merge then keeps only the rows whose datetime equals the minimum for their id.
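A minimal sketch of the merge on both keys, using a small made-up frame (the `value` column and the sample timestamps are hypothetical, not from the question). It assumes the first timestamp in each group is also the earliest one, which is what using `min()` implies. The last lines show a `groupby(...).transform('min')` mask as an alternative that is not part of the answer above but should select the same rows without a merge.

```python
import pandas as pd

# Hypothetical toy data; only the column names 'id' and 'datetime' come from the question.
data = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 2],
    "datetime": pd.to_datetime([
        "2020-01-01 00:00", "2020-01-01 00:00", "2020-01-02 00:00",
        "2020-03-05 12:00", "2020-03-05 12:00", "2020-03-06 12:00",
    ]),
    "value": [10, 11, 12, 20, 21, 22],
})

# Earliest timestamp per id, as a two-column frame (id, datetime).
min_datetime = data.groupby("id")["datetime"].min().reset_index()

# Inner merge on both keys keeps only rows at each group's earliest timestamp,
# including duplicates of that timestamp.
first_rows = data.merge(min_datetime, how="inner", on=["id", "datetime"])
print(first_rows)

# Alternative sketch (not from the answer): broadcast the per-group minimum
# back onto every row and filter with a boolean mask.
mask = data["datetime"] == data.groupby("id")["datetime"].transform("min")
first_rows_alt = data[mask]
```

Both variants avoid calling a Python function once per group, which is typically where `groupby(...).apply(...)` loses time on data with many groups.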


 