Add new column in PySpark dataframe based on where condition on other column
I need to create two additional columns for my dataframe, based on a calculation that divides one column by the value of that same column in a row selected by a condition on another column.
I have a working example of the transformation in SQL, but I need to rewrite it as the PySpark equivalent and can't get it right. What I have so far:
%python
data = [("AUD", 7.1), ("EUR", 11.2), ("USD", 9.1)]
cols = ["Currency", "RateSEK"]
df = spark.createDataFrame(data, cols)
df.show()
+--------+-------+
|Currency|RateSEK|
+--------+-------+
| AUD| 7.1|
| EUR| 11.2|
| USD| 9.1|
+--------+-------+
df.createOrReplaceTempView("tempdf")
The above is what I currently have in PySpark. Below is the SQL code I would like to reproduce with PySpark:
%sql
SELECT
    *,
    RateSEK / (SELECT RateSEK FROM tempdf WHERE Currency = 'EUR') AS RateEur,
    RateSEK / (SELECT RateSEK FROM tempdf WHERE Currency = 'USD') AS RateUSD
FROM
    tempdf
You can use .head() to get the result of each subquery:
import pyspark.sql.functions as F

df2 = df.withColumn(
    'RateEur',
    F.col('RateSEK') / df.filter("Currency = 'EUR'").head()['RateSEK']
).withColumn(
    'RateUSD',
    F.col('RateSEK') / df.filter("Currency = 'USD'").head()['RateSEK']
)
df2.show()
+--------+-------+------------------+------------------+
|Currency|RateSEK| RateEur| RateUSD|
+--------+-------+------------------+------------------+
| AUD| 7.1|0.6339285714285714|0.7802197802197802|
| EUR| 11.2| 1.0|1.2307692307692308|
| USD| 9.1| 0.8125| 1.0|
+--------+-------+------------------+------------------+