
Top 10 per unit in Python?

I have delay data for 105 units. I need the data to show the Top 10 largest delays per unit.

I need it to show three columns (Unit, DelayDesc, and Time_Hrs), and each unit should show only its top 10 DelayDesc values and the hours for those 10 largest delays.

At the moment I can only get every delay listed for each unit, using this Python code:

# Convert the duration from seconds to hours and add it to the dataframe (df)
df['Duration_Hr'] = df['Duration_s'] / 3600

# Total hours per unit and delay description
Sum_Time = df.groupby(['Unit', 'DelayDesc'])['Duration_Hr'].sum().to_frame('Time_Hrs')
print(Sum_Time)
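To make the shape of that result concrete, here is a small sketch that runs the same steps on made-up data. The unit names, delay descriptions, and durations below are invented for illustration; only the column names come from the question.

```python
import pandas as pd

# Made-up sample data (assumption): the real df covers 105 units.
df = pd.DataFrame({
    "Unit": ["A", "A", "A", "B", "B"],
    "DelayDesc": ["Flat tyre", "Refuel", "Flat tyre", "Refuel", "Weather"],
    "Duration_s": [3600, 1800, 7200, 5400, 900],
})

# Seconds to hours, then total hours per (Unit, DelayDesc) pair.
df["Duration_Hr"] = df["Duration_s"] / 3600
Sum_Time = (
    df.groupby(["Unit", "DelayDesc"])["Duration_Hr"]
    .sum()
    .to_frame("Time_Hrs")
)
print(Sum_Time)
```

The result has one row per (Unit, DelayDesc) pair, with Unit and DelayDesc stored as a two-level index rather than as ordinary columns, and a single Time_Hrs column.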

with output:

[image: output of code]

What am I missing? What don't I know? Please explain simply. I have only been using Python for a couple of months and everything is quite confusing, so hopefully this question makes sense. Thanks!

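Without your exact data to test against I can't be certain, but pandas has a function for this: DataFrame.nlargest().

If your Sum_Time dataframe holds the correct totals, this returns the 10 largest values of 'Time_Hrs':

df_final = Sum_Time.nlargest(10, 'Time_Hrs')
print(df_final)

Note that this gives the top 10 across all units combined. To get the top 10 within each unit, sort by unit and hours, then keep the first 10 rows of each group:

df_final = Sum_Time.sort_values(['Unit', 'Time_Hrs'], ascending=[True, False]).groupby(level='Unit').head(10)
print(df_final)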
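For the per-unit case asked about in the question, the following is a minimal sketch on made-up data. The helper name top_n_per_unit and the sample values are hypothetical; it assumes a Sum_Time-style frame indexed by Unit and DelayDesc with a Time_Hrs column, and it returns the three flat columns the question asks for.

```python
import pandas as pd


def top_n_per_unit(sum_time: pd.DataFrame, n: int = 10) -> pd.DataFrame:
    """Return the n largest Time_Hrs rows for each Unit as flat columns."""
    # Turn the (Unit, DelayDesc) index back into ordinary columns.
    flat = sum_time.reset_index()
    # Units in ascending order, largest delays first within each unit.
    flat = flat.sort_values(["Unit", "Time_Hrs"], ascending=[True, False])
    # head(n) keeps the first n rows of every group, preserving row order.
    return flat.groupby("Unit").head(n).reset_index(drop=True)


# Made-up sample data (assumption), small enough to check by eye.
raw = pd.DataFrame({
    "Unit": ["A"] * 4 + ["B"] * 3,
    "DelayDesc": ["d1", "d2", "d3", "d4", "d1", "d2", "d3"],
    "Duration_s": [3600, 7200, 1800, 10800, 900, 5400, 2700],
})
raw["Duration_Hr"] = raw["Duration_s"] / 3600
sum_time = (
    raw.groupby(["Unit", "DelayDesc"])["Duration_Hr"]
    .sum()
    .to_frame("Time_Hrs")
)

top2 = top_n_per_unit(sum_time, n=2)  # n=2 only to keep the demo short
print(top2)
```

With the real data, calling top_n_per_unit(Sum_Time) with the default n=10 should give at most 10 rows per unit; units with fewer than 10 distinct delays simply return all of their rows.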

