How to update Spark DataFrame column values of a table from another table based on a condition using Pyspark
How to update rows in a Spark dataframe based on a condition
I am trying to update some rows of a dataframe; below is my code.
dfs_ids1 = dfs_ids1.withColumn("arrival_dt", F.when(F.col("arrival_dt") == '1960-01-01', F.lit(None)))
Basically, I want to set arrival_dt to null for all rows where it is 1960-01-01, and leave the rest of the rows unchanged.
You need to understand filter and when, and what each of them does.
If you only want to keep the matching rows and do not care about the others, try this:
from pyspark.sql.functions import *
dfs_ids1 = dfs_ids1.filter(col("arrival_dt") == '1960-01-01')
If you want to update the remaining rows with a custom value or with another column:
dfs_ids1 = dfs_ids1.withColumn("arrival_dt", when(col("arrival_dt") == "1960-01-01", col("arrival_dt")).otherwise(lit(None)))
# Or equivalently (when without otherwise defaults the non-matching rows to null)
dfs_ids1 = dfs_ids1.withColumn("arrival_dt", when(col("arrival_dt") == "1960-01-01", col("arrival_dt")))
# Sample example
# Input df
+------+-------+-----+
| name| city|state|
+------+-------+-----+
| manoj|gwalior| mp|
| kumar| delhi|delhi|
|dhakad|chennai| tn|
+------+-------+-----+
from pyspark.sql.functions import *
opOneDf=df.withColumn("name",when(col("city")=="delhi",col("city")).otherwise(lit(None)))
opOneDf.show()
# Sample output
+-----+-------+-----+
| name| city|state|
+-----+-------+-----+
| null|gwalior| mp|
|delhi| delhi|delhi|
| null|chennai| tn|
+-----+-------+-----+