简体   繁体   中英

Error while using partition by clause in pyspark

I need to use partition by clause in two columns and found the rownumber. Also, I need to extract only the row which has rownumber= 1 .

I have df3 dataframe which holds these data:

在此处输入图像描述

I am trying to use partition by clause using two columns "category_name,SubCategoryName" and ordering them by totalsales descending:

from pyspark.sql.window import Window
from pyspark.sql.functions import row_number
    windowSpec  = Window.partitionBy("category_name,SubCategoryName").orderBy("total_sales_360 desc")
    
    df3.withColumn("row_number",row_number().over(windowSpec)).show(truncate=False)

I am getting error while trying to see the df3 after using partition by.

change .partitionBy("category_name,SubCategoryName") to .partitionBy("category_name", "SubCategoryName")

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM