Top 10 per unit in python?

Question

I have delay data for 105 units. I need to show the top 10 largest delays per unit.

The result needs 3 columns: Unit, DelayDesc, and Time_hrs. Each unit should show only its 10 largest DelayDesc entries and the hours for those 10 delays.

At the moment I can only get each unit broken down by all of its delays, using this Python code:

# Convert the delay duration from seconds to hours and store it in the dataframe (df)
df['Duration_Hr'] = df['Duration_s'] / 3600

# Total hours for each (Unit, DelayDesc) pair
Sum_Time = df.groupby(['Unit', 'DelayDesc'])['Duration_Hr'].sum().to_frame('Time_Hrs')
print(Sum_Time)

with output:

output of code as image

What am I missing? What don't I know? Please explain simply. I have only been using Python for a couple of months and everything is still quite confusing, so hopefully this question makes sense. Thanks!

Answer 1

Not having your exact data to test with, I can't be positive, but it may be as simple as using the pandas method DataFrame.nlargest().

If your Sum_Time dataframe has the correct data and the only remaining step is to get the top 10 by `Time_Hrs`, this should do it.

df_final = Sum_Time.nlargest(10,'Time_Hrs')
print(df_final)
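
One thing to watch: calling nlargest on the whole dataframe returns the 10 largest rows overall, not 10 per unit. A minimal sketch of one way to get a top N within each unit is to sort by hours and then take the first N rows of each group. The sample data below is made up for illustration, and N is set to 2 only so the effect is visible on a tiny table; the column names are taken from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "Unit": ["A", "A", "A", "A", "B", "B", "B"],
    "DelayDesc": ["Jam", "Jam", "Power", "Setup", "Jam", "Setup", "Power"],
    "Duration_s": [3600, 7200, 1800, 900, 5400, 9000, 900],
})

# Convert seconds to hours.
df["Duration_Hr"] = df["Duration_s"] / 3600

# Total hours per (Unit, DelayDesc), kept as ordinary columns.
sum_time = (
    df.groupby(["Unit", "DelayDesc"], as_index=False)["Duration_Hr"]
      .sum()
      .rename(columns={"Duration_Hr": "Time_Hrs"})
)

N = 2  # use 10 for the real data

# Sort so the largest delays come first within each unit,
# then keep the first N rows of every unit.
top_per_unit = (
    sum_time.sort_values(["Unit", "Time_Hrs"], ascending=[True, False])
            .groupby("Unit")
            .head(N)
            .reset_index(drop=True)
)
print(top_per_unit)
```

The result has exactly the three columns asked for (Unit, DelayDesc, Time_Hrs), with at most N rows per unit. Units with fewer than N distinct delays simply show all of them.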
