Create a new column based on Condition in Spark Dataframe

I am trying to add a new column to a DataFrame, DF, based on a condition.

Here is my DataFrame DF:

+-------------------+-----------+
|     DiffColumnName|   Datatype|
+-------------------+-----------+
|  DEST_COUNTRY_NAME| StringType|
|ORIGIN_COUNTRY_NAME| StringType|
|              COUNT|IntegerType|
+-------------------+-----------+

I also have an Array of String holding column names (its contents are not constant and can change):

val diffcolarray = Array("ORIGIN_COUNTRY_NAME", "COUNT")

I want to add a new column to DF that is yes when the value in DiffColumnName also appears in diffcolarray, and no otherwise.

I have tried the options below, but both give an error:

val newdf = df.filter(when(col("DiffColumnName") === df.columns.filter(diffcolarray.contains(_)), "yes").otherwise("no")).as("issue")

val newdf = valdfe.filter(when(col("DiffColumnName") === df.columns.map(diffcolarray.contains(_)), "yes").otherwise("no")).as("issue")

It looks like there is a datatype mismatch in the comparison. The output should look something like this. Any suggestions would be helpful. Thank you.

+-------------------+-----------+----------+
|     DiffColumnName|   Datatype|   Issue  |
+-------------------+-----------+----------+
|  DEST_COUNTRY_NAME| StringType|   NO     |
|ORIGIN_COUNTRY_NAME| StringType|   YES    |
|              COUNT|IntegerType|   YES    |
+-------------------+-----------+----------+

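This gives the desired output. Use withColumn rather than filter, since filter only selects rows and cannot add a column, and use isin to test each DiffColumnName value against the elements of diffcolarray: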

import org.apache.spark.sql.functions.{col, when}
df.withColumn("Issue", when(col("DiffColumnName").isin(diffcolarray: _*), "YES").otherwise("NO")).show(false)
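
For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same isin approach. The object name IssueColumnExample, the helper withIssue, and the local SparkSession settings are assumptions made for illustration; only the column names and diffcolarray come from the question.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, when}

// Hypothetical wrapper object, not part of the original question or answer.
object IssueColumnExample {

  // Adds an "Issue" column: "YES" when DiffColumnName is one of `names`, otherwise "NO".
  def withIssue(df: DataFrame, names: Seq[String]): DataFrame =
    df.withColumn(
      "Issue",
      when(col("DiffColumnName").isin(names: _*), "YES").otherwise("NO")
    )

  def main(args: Array[String]): Unit = {
    // Local session for experimentation only (assumed settings).
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("issue-column-example")
      .getOrCreate()
    import spark.implicits._

    // Rebuild the DataFrame from the question.
    val df = Seq(
      ("DEST_COUNTRY_NAME", "StringType"),
      ("ORIGIN_COUNTRY_NAME", "StringType"),
      ("COUNT", "IntegerType")
    ).toDF("DiffColumnName", "Datatype")

    val diffcolarray = Array("ORIGIN_COUNTRY_NAME", "COUNT")

    withIssue(df, diffcolarray.toSeq).show(false)

    spark.stop()
  }
}
```

Given the sample array, isin matches both ORIGIN_COUNTRY_NAME and COUNT, so both of those rows get YES and DEST_COUNTRY_NAME gets NO. The comparison is an exact, case-sensitive string match on the cell value.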
