
PySpark - Pull the row and all columns that contain the max value of a specific column

I have a Spark dataframe that looks like this:

df =

   Name  Score Section
     W     26       A
     M     62       A
     Q     69       A
     Y     86       A
     J     16       B
     A     83       B

I want to create a new dataframe that contains a single row (the row with the max score), so it will look like this:

dataframe_maximum =

     Name  Score Section
      Y     86       A

I know I can use groupBy and agg max to achieve this. I tried something like this, but I don't think I have it quite correct:

 from pyspark.sql.functions import max

 dataframe_max = df.groupBy(['Name', 'Score', 'Section']).agg(
     max('Score')
 )

You can use df.sort("Score", ascending=False).take(1). Sorting is a wide operation, though, so it may not be efficient.
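
A minimal sketch, assuming a SparkSession named spark and the column names from the question. It shows both the sort approach above and an alternative that computes the max first and then filters, avoiding a global sort:

 from pyspark.sql import SparkSession
 from pyspark.sql import functions as F

 spark = SparkSession.builder.getOrCreate()

 # Recreate the example data from the question.
 df = spark.createDataFrame(
     [("W", 26, "A"), ("M", 62, "A"), ("Q", 69, "A"),
      ("Y", 86, "A"), ("J", 16, "B"), ("A", 83, "B")],
     ["Name", "Score", "Section"],
 )

 # Sort approach: order by Score descending and keep a single row.
 dataframe_maximum = df.orderBy(F.desc("Score")).limit(1)
 dataframe_maximum.show()

 # Alternative: compute the max once, then filter on it.
 # Note: this keeps every row tied for the max, not just one.
 max_score = df.agg(F.max("Score")).collect()[0][0]
 dataframe_maximum = df.filter(F.col("Score") == max_score)
 dataframe_maximum.show()

The filter version makes two passes over the data (one aggregation, one filter) but avoids shuffling the whole dataset into sorted order, and it returns all rows that tie for the maximum, which may or may not be what you want.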
