
Spark DataFrame Add Column with Value

I have a DataFrame with the following data:

scala> nonFinalExpDF.show
+---+----------+
| ID|      DATE|
+---+----------+
|  1|      null|
|  2|2016-10-25|
|  2|2016-10-26|
|  2|2016-09-28|
|  3|2016-11-10|
|  3|2016-10-12|
+---+----------+

From this DataFrame I want to get the following DataFrame:

+---+----------+----------+
| ID|      DATE| INDICATOR|
+---+----------+----------+
|  1|      null|         1|
|  2|2016-10-25|         0|
|  2|2016-10-26|         1|
|  2|2016-09-28|         0|
|  3|2016-11-10|         1|
|  3|2016-10-12|         0|
+---+----------+----------+

Logic:

  1. For the latest DATE (max date) of an ID, the INDICATOR value is 1; for all other rows of that ID it is 0.
  2. If the account's DATE is null, the INDICATOR is 1.

Please suggest a simple way to do this.
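To pin down the two rules, here is a hedged sketch in plain Scala collections (no Spark). It assumes DATE values are ISO-8601 strings, so string order matches date order; the `Rec` case class and `IndicatorLogic` object are hypothetical names used only for illustration.

```scala
// Hypothetical record mirroring one (ID, DATE) row; None stands for a null DATE.
final case class Rec(id: Int, date: Option[String])

object IndicatorLogic {
  /** Pairs each record with its INDICATOR under a literal reading of the two rules. */
  def indicators(rows: Seq[Rec]): Seq[(Rec, Int)] = {
    // Latest non-null DATE per ID; IDs with only null dates have no entry.
    val latest: Map[Int, String] =
      rows.groupBy(_.id).collect {
        case (id, rs) if rs.exists(_.date.isDefined) => id -> rs.flatMap(_.date).max
      }
    rows.map { r =>
      val ind = r.date match {
        case None    => 1 // rule 2: a null DATE gets 1
        case Some(d) => if (latest.get(r.id).contains(d)) 1 else 0 // rule 1
      }
      (r, ind)
    }
  }

  def main(args: Array[String]): Unit = {
    val rows = Seq(
      Rec(1, None),
      Rec(2, Some("2016-10-25")),
      Rec(2, Some("2016-10-26")),
      Rec(2, Some("2016-09-28")),
      Rec(3, Some("2016-11-10")),
      Rec(3, Some("2016-10-12"))
    )
    // Print tab-separated ID, DATE, INDICATOR in input order
    indicators(rows).foreach { case (r, i) =>
      println(s"${r.id}\t${r.date.getOrElse("null")}\t$i")
    }
  }
}
```

Under this literal reading, a null DATE gets 1 even when the same ID also has real dates.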

Try this:

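// Register the question's DataFrame under the name used in the query
nonFinalExpDF.createOrReplaceTempView("df")
// INDICATOR is 1 when no later row exists for the ID (its latest DATE, or a lone null DATE), else 0
spark.sql("""
  SELECT id, date,
    CAST(LEAD(COALESCE(date, TO_DATE('1900-01-01')), 1)
      OVER (PARTITION BY id ORDER BY date) IS NULL AS INT) AS indicator
  FROM df""").show()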
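The same window logic can also be written with the DataFrame API instead of a temp view. This is a minimal sketch, assuming Spark 2.x or later and a DATE column of date type; `IndicatorDF`, `addIndicator` and `sample` are hypothetical names. Like the SQL, it marks the last row of each ID's date-ordered partition, and because Spark sorts nulls first in ascending order, a null DATE only gets 1 when it is the ID's only row.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{coalesce, col, lead, lit, to_date}

object IndicatorDF {
  // DataFrame-API version of the SQL: 1 when the next row in the
  // ID's date-ordered partition does not exist, 0 otherwise.
  def addIndicator(df: DataFrame): DataFrame = {
    val byIdDate = Window.partitionBy("ID").orderBy(col("DATE"))
    // COALESCE turns a following null DATE into a non-null placeholder,
    // so only the missing "next row" produces null.
    val nextDate = lead(coalesce(col("DATE"), to_date(lit("1900-01-01"))), 1).over(byIdDate)
    df.withColumn("INDICATOR", nextDate.isNull.cast("int"))
  }

  // Hypothetical helper rebuilding the question's sample data
  def sample(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq[(Int, Option[String])](
      (1, None),
      (2, Some("2016-10-25")),
      (2, Some("2016-10-26")),
      (2, Some("2016-09-28")),
      (3, Some("2016-11-10")),
      (3, Some("2016-10-12"))
    ).toDF("ID", "DATE").withColumn("DATE", to_date(col("DATE")))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("indicator").getOrCreate()
    addIndicator(sample(spark)).orderBy("ID", "DATE").show()
    spark.stop()
  }
}
```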
