
How to convert SQL Filter to Pyspark


I have the following SQL:

freecourse_info_step_8 as (
-- How many questions answered correct in that
select *, 
    count(question_number) FILTER (WHERE answered = true) over(partition by hacker_rank_id, freecourse_version, question_block, freecourse_users_id) as answered_correct_in_block
from freecourse_info_step_7
),

I converted it to Pyspark as:

column_list = ["hacker_rank_id", "freecourse_version", "question_block", "freecourse_users_id"]
window = Window.partitionBy([f.col(x) for x in column_list])
freecourse_info_step_8 = freecourse_info_step_7.withColumn('answered_correct_in_block',
                                                           f.when(f.col('answered') == True, f.count('question_number').over(window)))

I suspect that this code behaves differently from the SQL. Am I right? How do I correctly convert this SQL to PySpark?

The Pyspark spark.sql() method does not work with FILTER.

freecourse_info_step_8 = freecourse_info_step_7.withColumn(
    'answered_correct_in_block',
    # when() returns null for rows failing the condition, and count() skips nulls,
    # so counting the column itself reproduces count(question_number) FILTER (WHERE answered = true)
    f.count(f.when(f.col('answered') == True, f.col('question_number'))).over(window))

The count function should be outside the condition: when() yields null for rows where answered is false, and count() ignores nulls, so the window aggregate counts only the rows where answered is true.
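
For completeness, here is a minimal, self-contained sketch (the local SparkSession and the sample rows are made up for illustration, not taken from the original pipeline) showing that the when()-inside-count() pattern matches the SQL FILTER semantics: only rows where answered is true and question_number is non-null are counted within each partition.

from pyspark.sql import SparkSession, Window
from pyspark.sql import functions as f

spark = SparkSession.builder.master("local[1]").getOrCreate()

# Hypothetical rows mimicking freecourse_info_step_7
freecourse_info_step_7 = spark.createDataFrame(
    [
        (1, "v1", "A", 10, 101, True),
        (1, "v1", "A", 10, 102, False),   # not answered -> excluded from the count
        (1, "v1", "A", 10, None, True),   # null question_number -> excluded, as with SQL FILTER
        (1, "v1", "B", 10, 201, True),
    ],
    ["hacker_rank_id", "freecourse_version", "question_block",
     "freecourse_users_id", "question_number", "answered"],
)

column_list = ["hacker_rank_id", "freecourse_version", "question_block", "freecourse_users_id"]
window = Window.partitionBy([f.col(x) for x in column_list])

# when() nulls out rows that fail the condition; count() skips nulls
freecourse_info_step_8 = freecourse_info_step_7.withColumn(
    "answered_correct_in_block",
    f.count(f.when(f.col("answered"), f.col("question_number"))).over(window),
)

freecourse_info_step_8.show()
# Expected: answered_correct_in_block = 1 for every row in block A and in block B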

