
Converting Scala code to PySpark

I found the following code for selecting the top n rows from a dataframe grouped by unique_id.

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.row_number

val window = Window.partitionBy("userId").orderBy($"rating".desc)

dataframe.withColumn("r", row_number.over(window)).where($"r" <= n)

I tried the following:

from pyspark.sql.functions import row_number, desc
from pyspark.sql.window import Window

w = Window.partitionBy(post_tags.EntityID).orderBy(post_tags.Weight)
newdata=post_tags.withColumn("r", row_number.over(w)).where("r" <= 3)

I get the following error:

AttributeError: 'function' object has no attribute 'over'

Please help me.
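For reference, the error happens because row_number is a function object: it has to be called as row_number() before .over() can be applied, and the filter needs a column expression rather than the string comparison "r" <= 3. A minimal corrected sketch of the attempt above, assuming the same post_tags DataFrame with EntityID and Weight columns, and adding the descending order used in the Scala version:

from pyspark.sql.functions import row_number, desc, col
from pyspark.sql.window import Window

# Partition by EntityID and order by Weight descending, mirroring the Scala window
w = Window.partitionBy(post_tags.EntityID).orderBy(desc("Weight"))

# row_number() must be called before .over(); the filter takes a column expression
newdata = post_tags.withColumn("r", row_number().over(w)).where(col("r") <= 3)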

I found the answer to this question:

from pyspark.sql.window import Window
from pyspark.sql.functions import rank, col

window = Window.partitionBy(df['user_id']).orderBy(df['score'].desc())

df.select('*', rank().over(window).alias('rank')) \
  .filter(col('rank') <= 2) \
  .show()

Thanks to @mtoto for the answer: https://stackoverflow.com/a/38398563/5165377
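One thing to note: rank() gives tied scores the same rank, so with ties the filter rank <= 2 can return more than two rows per group. If exactly n rows per group are wanted, row_number() (as in the original Scala snippet) avoids that. A minimal sketch under the same assumptions (a DataFrame df with user_id and score columns):

from pyspark.sql.window import Window
from pyspark.sql.functions import row_number, col

window = Window.partitionBy(df['user_id']).orderBy(df['score'].desc())

# row_number() numbers rows 1, 2, 3, ... within each partition, so ties
# do not inflate the number of rows that pass the filter
df.select('*', row_number().over(window).alias('rn')) \
  .filter(col('rn') <= 2) \
  .show()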
