
How to update rows in a Spark DataFrame based on a condition

I am trying to update some rows of a dataframe; below is my code.

from pyspark.sql import functions as F

dfs_ids1 = dfs_ids1.withColumn("arrival_dt", F.when(F.col("arrival_dt") == '1960-01-01', F.lit(None)))

Basically, I want to set arrival_dt to null on every row where it is 1960-01-01 and leave the rest of the rows unchanged.

You need to understand the filter and when functions.

If you only want to fetch the matching rows, without caring about the others, try this:

from pyspark.sql.functions import *

# Keep only the rows with the sentinel date
dfs_ids1 = dfs_ids1.filter(col("arrival_dt") == '1960-01-01')
# equivalently, with a SQL expression string: dfs_ids1.filter("arrival_dt = '1960-01-01'")
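
Conversely, if the goal is to drop the sentinel rows rather than select them, the negated condition works the same way (a minimal sketch, assuming the same dfs_ids1 dataframe as above):

from pyspark.sql.functions import col

# Keep every row whose arrival_dt is NOT the sentinel date;
# note: rows where arrival_dt is null are also dropped, since null comparisons never evaluate to true
dfs_ids1 = dfs_ids1.filter(col("arrival_dt") != '1960-01-01')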

If you want to keep the matching rows' value and set the remaining rows to a custom value or another column:

dfs_ids1 = dfs_ids1.withColumn("arrival_dt", when(col("arrival_dt") == "1960-01-01", col("arrival_dt")).otherwise(lit(None)))

# Or rely on when() without otherwise(), which yields null for all non-matching rows:

dfs_ids1 = dfs_ids1.withColumn("arrival_dt", when(col("arrival_dt") == "1960-01-01", col("arrival_dt")))
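
For the question as asked (null out the sentinel date, keep everything else), the branches are simply swapped: a minimal sketch against the asker's dfs_ids1, using the same when/otherwise pattern.

from pyspark.sql.functions import when, col, lit

# Replace the 1960-01-01 sentinel with null; otherwise() preserves every other row unchanged
dfs_ids1 = dfs_ids1.withColumn(
    "arrival_dt",
    when(col("arrival_dt") == "1960-01-01", lit(None)).otherwise(col("arrival_dt"))
)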

# Sample example

# Input df

+------+-------+-----+
|  name|   city|state|
+------+-------+-----+
| manoj|gwalior|   mp|
| kumar|  delhi|delhi|
|dhakad|chennai|   tn|
+------+-------+-----+

from pyspark.sql.functions import *
opOneDf = df.withColumn("name", when(col("city") == "delhi", col("city")).otherwise(lit(None)))
opOneDf.show()

# Sample output

+-----+-------+-----+
| name|   city|state|
+-----+-------+-----+
| null|gwalior|   mp|
|delhi|  delhi|delhi|
| null|chennai|   tn|
+-----+-------+-----+
