Add value from series index to row of equal value in Pandas DataFrame

I'm facing a bit of an issue adding a new column to my Pandas DataFrame: I have a DataFrame in which each row represents a record of location data and a timestamp. Those records belong to trips, so each row also contains a trip id. Imagine the DataFrame looks something like this:

   TripID  Lat    Lon    time
0  42      53.55  9.99   74
1  42      53.58  9.99   78
3  42      53.60  9.98   79
6  12      52.01  10.04  64
7  12      52.34  10.05  69

Now I would like to delete the records of all trips that have fewer than a minimum number of records. I figured I could simply get the number of records of each trip like so:

 lengths = df['TripID'].value_counts()

Then my idea was to add an additional column to the DataFrame and fill it with the values from that Series corresponding to the trip id of each record. I would then be able to get rid of all rows in which the value of the length column is too small.

However, I can't seem to find a way to get the length values into the correct rows. Would anyone have an idea for that, or even a better approach to the entire problem?

Thanks very much!

EDIT:

My desired output should look something like this:

   TripID  Lat    Lon    time  length
0  42      53.55  9.99   74    3
1  42      53.58  9.99   78    3
3  42      53.60  9.98   79    3
6  12      52.01  10.04  64    2
7  12      52.34  10.05  69    2
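One way to get exactly this output is to look up each row's trip id in the `value_counts()` Series with `Series.map`. This is only a minimal sketch using the sample data from the question; `min_records` and `filtered` are hypothetical names introduced here for illustration.

```python
import pandas as pd

# Sample data copied from the question (note the non-contiguous index).
df = pd.DataFrame(
    {
        "TripID": [42, 42, 42, 12, 12],
        "Lat": [53.55, 53.58, 53.60, 52.01, 52.34],
        "Lon": [9.99, 9.99, 9.98, 10.04, 10.05],
        "time": [74, 78, 79, 64, 69],
    },
    index=[0, 1, 3, 6, 7],
)

# Series indexed by trip id, holding the number of records per trip.
lengths = df["TripID"].value_counts()

# Look up every row's TripID in that Series to fill the new column.
df["length"] = df["TripID"].map(lengths)

# Hypothetical threshold: keep only trips with at least this many records.
min_records = 3
filtered = df[df["length"] >= min_records]
print(filtered)
```

Because `map` matches on the values of `TripID` rather than on the row index, the gaps in the index (0, 1, 3, 6, 7) do not matter.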

If I understand correctly, to get the length of the trip, you'd want to get the difference between the maximum time and the minimum time for each trip. You can do that with a groupby statement.

# Groupby, get the minimum and maximum times, then reset the index
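df_new = df.groupby('TripID')['time'].agg(['min', 'max']).reset_index()
# Use bracket indexing here: df_new.max and df_new.min are DataFrame methods, not the columns
df_new['length_of_trip'] = df_new['max'] - df_new['min']
df_new = df_new.loc[df_new.length_of_trip > 90]  # 90 is an arbitrary threshold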

That'll leave you with one row per trip whose length is above the amount you need, including the trip IDs.
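Since the aggregated frame holds one row per trip rather than the original records, here is a rough sketch of how the surviving trip IDs could be used to filter the original DataFrame. It assumes "length" means time span (max minus min), and `spans`, `min_duration`, `long_trips` and `kept` are hypothetical names. Bracket indexing is used for the `min` and `max` columns because attribute access such as `.max` resolves to the DataFrame method.

```python
import pandas as pd

# Sample data from the question (only the columns needed here).
df = pd.DataFrame(
    {"TripID": [42, 42, 42, 12, 12], "time": [74, 78, 79, 64, 69]},
    index=[0, 1, 3, 6, 7],
)

# One row per trip with its earliest and latest timestamp.
spans = df.groupby("TripID")["time"].agg(["min", "max"])
spans["length_of_trip"] = spans["max"] - spans["min"]

# Hypothetical threshold on the time span of a trip.
min_duration = 5
long_trips = spans.index[spans["length_of_trip"] >= min_duration]

# Keep the original records that belong to sufficiently long trips.
kept = df[df["TripID"].isin(long_trips)]
```

In the sample data both trips happen to span 5 time units, so nothing is dropped with this threshold; raising `min_duration` to 6 would remove every row.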

You can use groupby with transform to add the lengths column directly to the DataFrame, like so:

df["lengths"] = df.groupby("TripID")["time"].transform("count")
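To show how that column can then be used to drop short trips, here is a small sketch on the sample data from the question. The threshold `min_records` and the name `filtered` are assumptions made for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {
        "TripID": [42, 42, 42, 12, 12],
        "Lat": [53.55, 53.58, 53.60, 52.01, 52.34],
        "Lon": [9.99, 9.99, 9.98, 10.04, 10.05],
        "time": [74, 78, 79, 64, 69],
    },
    index=[0, 1, 3, 6, 7],
)

# transform returns a Series aligned with df's index, so the per-trip
# count is repeated on every row of that trip.
df["lengths"] = df.groupby("TripID")["time"].transform("count")

# Hypothetical threshold: keep trips with at least this many records.
min_records = 3
filtered = df[df["lengths"] >= min_records]
```

Note that `"count"` skips rows where `time` is missing; `transform("size")` counts every row of the group regardless of NaN values.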

I managed to find an answer to my question that is quite a bit nicer than my original approach as well:

df = df.groupby('TripID').filter(lambda x: len(x) > 2)

This approach is described in the Pandas documentation. It gets rid of all groups that have 2 or fewer elements in them, that is, trips with 2 records or fewer in my case.
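For completeness, a runnable sketch of the same idea on the sample data from the question, with the threshold pulled out into a variable. The names `min_records` and `kept` are hypothetical.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {
        "TripID": [42, 42, 42, 12, 12],
        "Lat": [53.55, 53.58, 53.60, 52.01, 52.34],
        "Lon": [9.99, 9.99, 9.98, 10.04, 10.05],
        "time": [74, 78, 79, 64, 69],
    },
    index=[0, 1, 3, 6, 7],
)

# Hypothetical threshold: minimum number of records a trip must have.
min_records = 3

# filter calls the function once per group and keeps all rows of the
# groups for which it returns True.
kept = df.groupby("TripID").filter(lambda g: len(g) >= min_records)
print(kept)
```

The surviving rows keep their original index and order, so no extra length column has to be added and dropped again.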

I hope this will help someone else out as well.
