I'm facing bit of an issue adding a new column to my Pandas DataFrame: I have a DataFrame in which each row represents a record of location data and a timestamp. Those records belong to trips, so each row also contains a trip id. Imagine the DataFrame looks kind of like this:
TripID Lat Lon time
0 42 53.55 9.99 74
1 42 53.58 9.99 78
3 42 53.60 9.98 79
6 12 52.01 10.04 64
7 12 52.34 10.05 69
Now I would like to delete the records of all trips that have less than a minimum amount of records to them. I figured I could simply get the number of records of each trip like so:
lengths = df['TripID'].value_counts()
Then my idea was to add an additional column to the DataFrame and fill it with the values from that Series corresponding to the trip id of each record. I would then be able to get rid of all rows in which the value of the length
column is too small.
However, I can't seem to find a way to get the length values into the correct rows. Would any one have an idea for that or even a better approach to the entire problem?
Thanks very much!
EDIT:
My desired output should look something like this:
TripID Lat Lon time length
0 42 53.55 9.99 74 3
1 42 53.58 9.99 78 3
3 42 53.60 9.98 79 3
6 12 52.01 10.04 64 2
7 12 52.34 10.05 69 2
If I understand correctly, to get the length of the trip, you'd want to get the difference between the maximum time and the minimum time for each trip. You can do that with a groupby statement.
# Groupby, get the minimum and maximum times, then reset the index
df_new = df.groupby('TripID').time.agg(['min', 'max']).reset_index()
df_new['length_of_trip'] = df_new.max - df_new.min
df_new = df_new.loc[df_new.length_of_trip > 90] # to pick a random number
That'll get you all the rows with a trip length above the amount you need, including the trip IDs.
您可以使用groupby
和transform
将lengths列直接添加到DataFrame中,如下所示:
df["lengths"] = df[["TripID", "time"]].groupby("TripID").transform("count")
I managed to find an answer to my question that is quite a bit nicer than my original approach as well:
df = df.groupby('TripID').filter(lambda x: len(x) > 2)
This can be found in the Pandas documentation . It gets rid of all groups that have 2 or less elements in them, or trips that are 2 records or shorter in my case.
I hope this will help someone else out as well.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.