pyspark conditions on multiple columns and returning new column
I am using Spark 2.1 and scripting in PySpark. Please help me with this, as I am stuck here.
Problem statement: create a new column based on conditions on multiple columns.
The input dataframe is below:
FLG1 FLG2 FLG3
T F T
F T T
T T F
Now I need to create a new column FLG, and my condition is: if FLG1==T && (FLG2==F || FLG2==T), then my FLG has to be T, else F.
Consider the above dataframe as DF.
Below is the code snippet I tried:
DF.withColumn("FLG",DF.select(when(FLG1=='T' and (FLG2=='F' or FLG2=='T','F').otherwise('T'))).show()
It didn't work; I was getting `name 'when' is not defined`.
Please help me in crossing this hurdle.
Try the following; it should work:
from pyspark.sql.functions import col, when, lit
DF.withColumn("FLG", when((col("FLG1")=='T') & ((col("FLG2")=='F') | (col("FLG2")=='T')),lit('F')).otherwise(lit('T'))).show()