
pyspark conditions on multiple columns and returning new column

I am using Spark 2.1 and scripting in PySpark. Please help me with this, as I am stuck here.

Problem statement: create a new column based on conditions on multiple columns.

The input dataframe is below:

FLG1 FLG2 FLG3
T    F    T
F    T    T
T    T    F

Now I need to create one new column, FLG, and my condition is: if FLG1 == 'T' && (FLG2 == 'F' || FLG2 == 'T'), then FLG has to be 'T', else 'F'.

Consider the above dataframe as DF.

Below is the code snippet I tried:

DF.withColumn("FLG",DF.select(when(FLG1=='T' and (FLG2=='F' or FLG2=='T','F').otherwise('T'))).show()

It didn't work; I was getting "name 'when' is not defined".

Please help me cross this hurdle.

Try the following, it should work:

from pyspark.sql.functions import col, when, lit
DF.withColumn("FLG", when((col("FLG1")=='T') & ((col("FLG2")=='F') | (col("FLG2")=='T')),lit('F')).otherwise(lit('T'))).show()
