
PySpark - Pull the row and all columns that contain the max value of a specific column

I have a Spark dataframe that looks like this:

df =

   Name  Score Section
     W     26       A
     M     62       A
     Q     69       A
     Y     86       A
     J     16       B
     A     83       B

I want to create a new dataframe that contains a single row (the row with the max score), so it will look like this:

dataframe_maximum =

     Name  Score Section
      Y     86       A

I know I can use groupBy and agg max to achieve this. I tried something like this, but I don't think I have it quite correct:

 from pyspark.sql.functions import max

 dataframe_max = df.groupBy(['Name', 'Score', 'Section']).agg(
     max('Score')
 )

You can use df.sort("Score", ascending=False).take(1). Sorting is a wide operation, though, so it may not be efficient.
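
A minimal sketch, assuming a SparkSession named spark and the column names from the question. It shows both the sort approach above and an alternative that computes the max first and then filters, avoiding a global sort:

 from pyspark.sql import SparkSession
 from pyspark.sql import functions as F

 spark = SparkSession.builder.getOrCreate()

 # Recreate the example data from the question.
 df = spark.createDataFrame(
     [("W", 26, "A"), ("M", 62, "A"), ("Q", 69, "A"),
      ("Y", 86, "A"), ("J", 16, "B"), ("A", 83, "B")],
     ["Name", "Score", "Section"],
 )

 # Sort approach: order by Score descending and keep a single row.
 dataframe_maximum = df.orderBy(F.desc("Score")).limit(1)
 dataframe_maximum.show()

 # Alternative: compute the max once, then filter on it.
 # Note: this keeps every row tied for the max, not just one.
 max_score = df.agg(F.max("Score")).collect()[0][0]
 dataframe_maximum = df.filter(F.col("Score") == max_score)
 dataframe_maximum.show()

The filter version makes two passes over the data (one aggregation, one filter) but avoids shuffling the whole dataset into sorted order, and it returns all rows that tie for the maximum, which may or may not be what you want.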
