
Loop Through Days of the Week in Pandas Dataframe

I have a Pandas DataFrame with a start column of dtype datetime64[ns, UTC], sorted in ascending order by that column. From this DataFrame I used the following to add a column indicating the day of the week for each start value:

format_datetime_df['day_of_week'] = format_datetime_df['start'].dt.dayofweek

I want to pass the DataFrame into a function. The function needs to loop through the days of each week (0 to 6) and keep a running total of the distance covered (kept in column 'distance'). If the total for a week is greater than 15, a counter is incremented. It needs to do this for all rows of the DataFrame. The function returns the total number of weeks over 15.

I am stuck on how to implement this, because my 'day_of_week' column starts as follows:

3
3
5
1
5

So week 1 would consist of 3, 3, 5 and week 2 would consist of 1, 5, ...

I want to do something like

number_of_weeks_over_10km = format_datetime_df.groupby().apply(weeks_over_10km)

but am not really sure what should go in the groupby() function. I also feel like I am overcomplicating this.

It was complicated, but I figured it out. Here is the basic flow of what I did

# Create a helper index that allows iteration by week while also considering the year

# Function to return the total distance for each week

# Create a NumPy array to store the total distance for each week

# Append the total distance for each week to the array

# Count the number of times the total distance for each week was > x (in km)

The helper index, which allows iteration by week while also taking the year into account, came from another post here on Stack Overflow ( Iterate over pd df with date column by week python ). One consequence was that I had to create the NumPy array and append to it outside of the function to get everything to work.
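The flow above could be sketched roughly as follows. This is not the original code from this answer, which was not posted. It is a minimal sketch that assumes the helper index is the ISO year and ISO week of the start column, and the names total_distance and count_weeks_over are hypothetical. The sample data is made up to show why the year matters: two of the rows fall in ISO week 1 of different years.

```python
import numpy as np
import pandas as pd


def total_distance(week_df):
    """Return the total distance for the rows of a single week."""
    return week_df["distance"].sum()


def count_weeks_over(df, threshold_km):
    """Count the weeks whose total distance exceeds threshold_km."""
    # Helper key (assumption): ISO year and ISO week, so that the same
    # week number in different years is never merged.
    iso = df["start"].dt.isocalendar()
    weekly_totals = np.array([
        total_distance(week_df)
        for _, week_df in df.groupby([iso["year"], iso["week"]])
    ])
    return int((weekly_totals > threshold_km).sum())


# Made-up sample data that crosses a year boundary.
sample = pd.DataFrame({
    "start": pd.to_datetime(
        ["2020-12-30", "2021-01-02", "2021-01-05", "2022-01-04"], utc=True
    ),
    "distance": [9.0, 8.0, 10.0, 16.0],
})

weeks_over_15 = count_weeks_over(sample, 15)
print(weeks_over_15)
```

Here the first two rows belong to ISO week 53 of 2020 (17 km in total), while the last two rows are both in ISO week 1, but of 2021 and 2022 respectively. Grouping on the week number alone would merge them into a single 26 km week.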

I think you can solve this with Pandas alone, without a custom function. First determine the ISO year and week using

# Compute the ISO calendar once, then build a "year week" key per row
iso = df["start"].dt.isocalendar()
df["isoweek"] = iso["year"].astype(str) + " " + iso["week"].astype(str)

Then sum the distance per week with a groupby and count the weeks whose total is above 15:

weeks_above_15 = (df.groupby("isoweek")["distance"].sum() > 15).sum()
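To check the approach end to end, here is a small self-contained example. The sample data is made up: the dates were chosen so that the day-of-week values are 3, 3, 5, 1, 5, as in the question, and the distances are arbitrary.

```python
import pandas as pd

# Made-up sample data: timezone-aware start times and distances in km.
df = pd.DataFrame({
    "start": pd.to_datetime(
        [
            "2021-01-07 08:00",  # Thursday, day_of_week 3
            "2021-01-07 18:00",  # Thursday, day_of_week 3
            "2021-01-09 09:00",  # Saturday, day_of_week 5
            "2021-01-12 07:30",  # Tuesday, day_of_week 1
            "2021-01-16 10:00",  # Saturday, day_of_week 5
        ],
        utc=True,
    ),
    "distance": [5.0, 6.0, 7.0, 4.0, 8.0],
})

# Build the "year week" key from the ISO calendar.
iso = df["start"].dt.isocalendar()
df["isoweek"] = iso["year"].astype(str) + " " + iso["week"].astype(str)

# Total distance per week, then count the weeks above 15.
weekly_distance = df.groupby("isoweek")["distance"].sum()
weeks_above_15 = int((weekly_distance > 15).sum())

print(weekly_distance)
print(weeks_above_15)
```

With this data the first three rows fall in ISO week 1 of 2021 (18 km) and the last two in week 2 (12 km), so only one week is above 15. Weeks with no rows at all simply do not appear in the groupby result, which is fine for counting weeks over a threshold.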

