I have delay data for 105 units. I need the data to show the 10 largest delays per unit.
The output should have 3 columns: Unit, DelayDesc, and Time_Hrs.
Each unit should show only its top 10 DelayDesc values
and the hours for those 10 largest delays.
At the moment I can only get each unit broken down by all of its delays, using this Python code:
Duration_Hr = df['Duration_s'] / 3600  # calculation from seconds to hours
df['Duration_Hr'] = Duration_Hr  # adding Duration_Hr to the dataframe (df)
# total hours for each (Unit, DelayDesc) pair
Sum_Time = df.groupby(['Unit', 'DelayDesc'])['Duration_Hr'].sum().to_frame('Time_Hrs')
print(Sum_Time)
with output:
What am I missing? What don't I know? Please explain simply. I have only been using Python for a couple of months and everything is quite confusing, so hopefully this question makes sense. Thanks!
Not having your exact data to test with, I can't be positive, but it may be as simple as using the pandas function df.nlargest(). If your Sum_Time dataframe has the correct data and the last step is just to get the top 10 by ['Time_Hrs'], this should do it.
df_final = Sum_Time.nlargest(10,'Time_Hrs')
print(df_final)
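One caveat: called on the whole dataframe, nlargest(10, 'Time_Hrs') returns the ten largest rows overall, not ten per unit. If the goal is the top delays within each unit, one possible approach is to sort by hours inside each unit and keep the first N rows of every group. The sketch below is untested against the real data: the sample rows are made up for illustration, and top_n is set to 2 only so the small example stays readable (it would be 10 for the actual question).

```python
import pandas as pd

# Hypothetical sample data standing in for the real delay log
df = pd.DataFrame({
    "Unit": ["A", "A", "A", "A", "B", "B", "B"],
    "DelayDesc": ["Jam", "Jam", "Power", "Setup", "Jam", "Power", "Setup"],
    "Duration_s": [3600, 7200, 1800, 900, 5400, 10800, 3600],
})

# Convert seconds to hours
df["Duration_Hr"] = df["Duration_s"] / 3600

# Total hours per (Unit, DelayDesc), with Unit and DelayDesc kept as ordinary columns
Sum_Time = (
    df.groupby(["Unit", "DelayDesc"])["Duration_Hr"]
    .sum()
    .reset_index(name="Time_Hrs")
)

top_n = 2  # use 10 for the real data

# Sort so the largest delays come first within each unit,
# then keep the first top_n rows of every unit
top_per_unit = (
    Sum_Time.sort_values(["Unit", "Time_Hrs"], ascending=[True, False])
    .groupby("Unit")
    .head(top_n)
    .reset_index(drop=True)
)
print(top_per_unit)
```

The result has the three requested columns (Unit, DelayDesc, Time_Hrs). A unit with fewer than top_n distinct delay descriptions simply keeps all of its rows.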