How to convert SQL Filter to Pyspark
I have the following SQL:
freecourse_info_step_8 as (
-- How many questions were answered correctly in that block
select *,
count(question_number) FILTER (WHERE answered = true) over(partition by hacker_rank_id, freecourse_version, question_block, freecourse_users_id) as answered_correct_in_block
from freecourse_info_step_7
),
I converted it to PySpark as:
from pyspark.sql import Window
import pyspark.sql.functions as f

column_list = ["hacker_rank_id", "freecourse_version", "question_block", "freecourse_users_id"]
window = Window.partitionBy([f.col(x) for x in column_list])
freecourse_info_step_8 = freecourse_info_step_7.withColumn(
    'answered_correct_in_block',
    f.when(f.col('answered') == True, f.count('question_number').over(window)))
I suspect this code behaves differently from the SQL. Am I right? How do I translate this SQL to PySpark correctly?
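The suspicion can be checked without Spark. With `f.when` on the outside, the count is taken over the whole partition and the condition only decides whether a row sees that count or NULL; with SQL's `FILTER`, the condition restricts which rows are counted. A minimal plain-Python analogy of one partition (the sample data is invented for illustration):

```python
# One partition of (question_number, answered) pairs -- made-up sample data
rows = [(1, True), (2, False), (3, True)]

# Question's version: count is unconditional over the partition,
# the condition is applied per output row afterwards
total = len(rows)  # count('question_number') over the whole partition
when_outside = [total if answered else None for _, answered in rows]
# -> [3, None, 3]: true rows see the UNFILTERED count, false rows get NULL

# SQL FILTER semantics: only rows matching the condition are counted,
# and every row in the partition carries that filtered count
filtered = sum(1 for _, answered in rows if answered)
filter_like = [filtered for _ in rows]
# -> [2, 2, 2]
```

So the two forms disagree on every row of this partition.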
PySpark's spark.sql() does not support the FILTER clause. Put the condition inside the aggregate instead:
freecourse_info_step_8 = freecourse_info_step_7.withColumn(
    'answered_correct_in_block',
    # when() yields question_number only where answered is true and NULL
    # otherwise; count() ignores NULLs, matching FILTER (WHERE answered = true)
    f.count(f.when(f.col('answered') == True, f.col('question_number'))).over(window))
The count function should be outside the condition.
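The count-inside-`when` pattern is easy to verify with a plain-Python sketch of the windowed, partitioned count (the partition keys and sample rows below are invented for illustration):

```python
from collections import defaultdict

# (hacker_rank_id, freecourse_version, question_block, freecourse_users_id,
#  question_number, answered) -- made-up sample data
rows = [
    (1, "v1", "A", 10, 1, True),
    (1, "v1", "A", 10, 2, False),
    (1, "v1", "A", 10, 3, True),
    (1, "v1", "B", 10, 1, True),
]

# Per partition key, count non-null question_number where answered is true,
# mirroring count(question_number) FILTER (WHERE answered = true) OVER (...)
counts = defaultdict(int)
for hid, ver, block, uid, qnum, answered in rows:
    if answered and qnum is not None:
        counts[(hid, ver, block, uid)] += 1

# A window function attaches the per-partition result to every row
result = [row + (counts[row[:4]],) for row in rows]
```

Every row of partition ("A", user 10) gets 2, including the row where answered is false, which is exactly what the SQL window produces.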