
PySpark Dataframe: Column based on existence and Value of another column

What I am trying to do is set a value that flags existence, based on isNotNull, on a potentially nonexistent column.

What I mean is: I have a dataframe A like

A     B     C
-------------------
1     a     "Test"  
2     b     null
3     c     "Test2"

Where C isn't necessarily defined. I want to define another dataframe B:

D     E       F
---------------
1     a      'J'
2     b      'N'
3     c      'J'

where the column F is either 'N' everywhere in case A.C is not defined, or 'N' if A.C's value is null and 'J' if the value is not null.

How would you proceed at this point?

I thought of using a when statement,

DF.withColumn('F', when(A.C.isNotNull(), 'J').otherwise('N'))

but how would you check for the existence of the column in the same statement?

First you check if the column exists. If not, you create it.

from pyspark.sql import functions as F

if "c" not in df.columns:
    df = df.withColumn("c", F.lit(None))

Then you create the column F:

df = df.withColumn('F', F.when(F.col("C").isNotNull(), 'J').otherwise('N'))
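
Putting the two steps together, here is a minimal end-to-end sketch; the sample data is an assumption mirroring the dataframe from the question:

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# sample data mirroring the question; drop "C" from both lists to test the missing-column case
df = spark.createDataFrame(
    [(1, "a", "Test"), (2, "b", None), (3, "c", "Test2")],
    ["A", "B", "C"],
)

# step 1: create C as an all-null column if it does not exist
if "C" not in df.columns:
    df = df.withColumn("C", F.lit(None))

# step 2: derive F from C's null-ness
df = df.withColumn("F", F.when(F.col("C").isNotNull(), "J").otherwise("N"))
df.show()  # F is J, N, J for the three rows above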

You can check the column's presence using 'c' in data_sdf.columns. Here's an example using it.

Let's say the input dataframe has 3 columns - ['a', 'b', 'c']
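
For reference, data_sdf below could be created like this; the sample values are an assumption based on the output shown:

from pyspark.sql import SparkSession, functions as func

spark = SparkSession.builder.getOrCreate()

# two rows, one with a null in column c, matching the output below
data_sdf = spark.createDataFrame([(1, 2, 3), (1, 2, None)], ['a', 'b', 'c'])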

data_sdf. \
    withColumn('d', 
               # if c exists, test it for null; if it does not exist,
               # the condition is always true and every row gets 'N'
               func.when(func.col('c').isNull() if 'c' in data_sdf.columns else func.lit(True), func.lit('N')).
               otherwise(func.lit('J'))
               ). \
    show()

# +---+---+----+---+
# |  a|  b|   c|  d|
# +---+---+----+---+
# |  1|  2|   3|  J|
# |  1|  2|null|  N|
# +---+---+----+---+

Now, let's say there are only 2 columns - ['a', 'b']
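
For instance, dropping c from the sample dataframe above and running the same expression again (a sketch) produces:

data_sdf = data_sdf.drop('c')

data_sdf. \
    withColumn('d', 
               func.when(func.col('c').isNull() if 'c' in data_sdf.columns else func.lit(True), func.lit('N')).
               otherwise(func.lit('J'))
               ). \
    show()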

# +---+---+---+
# |  a|  b|  d|
# +---+---+---+
# |  1|  2|  N|
# |  1|  2|  N|
# +---+---+---+
