I have a pandas DataFrame with a start column of dtype datetime64[ns, UTC], sorted in ascending order by that column. I used the following to update the DataFrame with a column indicating the day of the week for each start value:
format_datetime_df['day_of_week'] = format_datetime_df['start'].dt.dayofweek
I want to pass the DataFrame into a function. The function needs to loop through the days of the week, from 0 to 6, and keep a running total of the distance covered (stored in the 'distance' column). If the distance covered in a week is greater than 15, a counter is incremented. It needs to do this for all rows of the DataFrame. The function should return the total number of weeks over 15.
I am getting stuck on how to implement this as my 'day_of_week' column starts as follows
3
3
5
1
5
So week 1 would consist of 3, 3, 5 and week 2 would consist of 1, 5, ...
I want to do something like
number_of_weeks_over_10km = format_datetime_df.groupby().apply(weeks_over_10km)
but am not really sure what should go in the groupby() function. I also feel like I am overcomplicating this.
It was complicated, but I figured it out. Here is the basic flow of what I did
# Create a helper index that allows iteration by week while also considering the year
# Function to return the total distance for each week
# Create a NumPy array to store the total distance for each week
# Append the total distance for each week to the array
# Count the number of times the total distance for each week was > x (in km)
The helper index that allowed iteration by week while also considering the year came from another post here on Stack Overflow ("Iterate over pd df with date column by week python"). One consequence was that I had to create the NumPy array, and append to it, outside of the function in order to get everything to work.
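The flow above might look roughly like the following. This is only a sketch under assumptions, not the code actually used: the helper key built from the ISO year and week, the function name week_totals, and the sample data are all hypothetical, and the linked post may build its helper index differently. The sample dates are chosen so that day_of_week comes out as 3, 3, 5, 1, 5, as in the question.

```python
import numpy as np
import pandas as pd


def week_totals(df):
    """Return a NumPy array with the total distance of each week (hypothetical helper)."""
    # Helper key combining ISO year and ISO week, e.g. 202301 (an assumption,
    # not necessarily what the linked post uses).
    iso = df["start"].dt.isocalendar()
    week_key = iso["year"].astype(int) * 100 + iso["week"].astype(int)

    totals = np.array([], dtype=float)
    # Iterate week by week and append each week's total distance.
    for _, week_rows in df.groupby(week_key.to_numpy()):
        totals = np.append(totals, week_rows["distance"].sum())
    return totals


# Made-up sample data: Thu, Thu, Sat in one week, then Tue, Sat in the next.
format_datetime_df = pd.DataFrame({
    "start": pd.to_datetime(
        ["2023-01-05", "2023-01-05", "2023-01-07", "2023-01-10", "2023-01-14"],
        utc=True,
    ),
    "distance": [5.0, 6.0, 7.0, 4.0, 3.0],
})

totals = week_totals(format_datetime_df)
# Count the weeks whose total distance exceeds the threshold (in km).
threshold_km = 15
weeks_over = int((totals > threshold_km).sum())
print(totals, weeks_over)
```

In this version the array is created inside the function, so nothing has to be appended from outside; whether that fits the original code depends on details not shown here.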
I think you can solve this with pandas alone, without a custom function. First determine the ISO year and week for each row:
iso = df["start"].dt.isocalendar()  # columns: year, week, day
df["isoweek"] = iso["year"].astype(str) + " " + iso["week"].astype(str)
Then sum the distance per week with a groupby and count the weeks whose total is above 15:
weeks_above_15 = (df.groupby("isoweek")["distance"].sum() > 15).sum()
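As a self-contained illustration of this approach, here is a minimal sketch on made-up data (the dates and distances are assumptions chosen for the example). It groups directly on the ISO year and week columns instead of a combined string, which is an equivalent variation. The sample also shows why the ISO year matters: 2024-12-30 and 2025-01-02 fall in the same ISO week (2025, week 1) even though their calendar years differ.

```python
import pandas as pd

# Made-up sample data spanning a year boundary.
df = pd.DataFrame({
    "start": pd.to_datetime(
        ["2024-12-23", "2024-12-30", "2025-01-02", "2025-01-07"],
        utc=True,
    ),
    "distance": [10.0, 9.0, 8.0, 16.0],
})

# ISO calendar components; grouping on both year and week keeps
# week 1 of one year separate from week 1 of another.
iso = df["start"].dt.isocalendar()
weekly = df.groupby([iso["year"], iso["week"]])["distance"].sum()

# Number of weeks whose total distance exceeds 15.
weeks_above_15 = int((weekly > 15).sum())
print(weekly)
print(weeks_above_15)
```

Here the two rows around New Year are summed together (9 + 8 = 17), so that week counts as over 15 even though neither activity does on its own.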