Error while using partition by clause in pyspark

Original 2022-08-13 04:06:31 5 1 python/ pandas/ apache-spark/ pyspark

I need to partition by two columns and compute a row number within each partition. I then need to keep only the rows where the row number is 1.

I have df3 dataframe which holds these data:

I am trying to partition by the two columns category_name and SubCategoryName, ordering each partition by total sales in descending order:

from pyspark.sql.window import Window
from pyspark.sql.functions import row_number

windowSpec = Window.partitionBy("category_name,SubCategoryName").orderBy("total_sales_360 desc")

df3.withColumn("row_number", row_number().over(windowSpec)).show(truncate=False)

I get an error when I try to display df3 after applying the window.

1 answer

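Change .partitionBy("category_name,SubCategoryName") to .partitionBy("category_name", "SubCategoryName"). Each column must be passed as its own argument; a single comma-separated string is treated as one column name.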
