I am trying to create a new column in a DataFrame DF based on a condition.
Here is my DataFrame DF:
+-------------------+-----------+
|     DiffColumnName|   Datatype|
+-------------------+-----------+
|  DEST_COUNTRY_NAME| StringType|
|ORIGIN_COUNTRY_NAME| StringType|
|              COUNT|IntegerType|
+-------------------+-----------+
I also have an Array of String holding column names (it is not constant and can change):
val diffcolarray = Array("ORIGIN_COUNTRY_NAME", "COUNT")
I want to add a new column to DF that is "yes" when the row's value in DiffColumnName is also present in diffcolarray, and "no" otherwise.
I have tried the options below, but both give an error:
val newdf = df.filter(when(col("DiffColumnName") === df.columns.filter(diffcolarray.contains(_)), "yes").otherwise("no")).as("issue")
val newdf = valdfe.filter(when(col("DiffColumnName") === df.columns.map(diffcolarray.contains(_)), "yes").otherwise("no")).as("issue")
It looks like there is a datatype mismatch in the comparison. The output should look something like this. Any suggestions would be helpful. Thank you.
+-------------------+-----------+----------+
|     DiffColumnName|   Datatype|     Issue|
+-------------------+-----------+----------+
|  DEST_COUNTRY_NAME| StringType|        NO|
|ORIGIN_COUNTRY_NAME| StringType|       YES|
|              COUNT|IntegerType|       YES|
+-------------------+-----------+----------+
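Both attempts fail for the same reasons: filter keeps or drops rows and expects a Boolean condition, whereas when(...).otherwise(...) yields a string column; df.columns returns the DataFrame's own column names (DiffColumnName, Datatype) rather than the values stored in DiffColumnName; and === against an Array does not test membership. To add a column, use withColumn with isin, expanding the array into varargs with : _*. This marks every row whose DiffColumnName appears in diffcolarray as YES:
import org.apache.spark.sql.functions.{col, when}
df.withColumn("Issue", when(col("DiffColumnName").isin(diffcolarray: _*), "YES").otherwise("NO")).show(false)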
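For anyone who wants to try this outside spark-shell, here is a minimal self-contained sketch of the same isin approach. The object name IssueColumnExample, the helper addIssueColumn, and the local[*] session are assumptions made for illustration; they are not part of the question.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, when}

object IssueColumnExample {

  // Hypothetical helper: flags rows whose DiffColumnName value is in `names`.
  def addIssueColumn(df: DataFrame, names: Seq[String]): DataFrame =
    df.withColumn(
      "Issue",
      // isin takes varargs, so the collection is expanded with `: _*`.
      when(col("DiffColumnName").isin(names: _*), "YES").otherwise("NO")
    )

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; in spark-shell `spark` already exists.
    val spark = SparkSession.builder()
      .appName("IssueColumnExample")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Rebuild the DataFrame from the question.
    val df = Seq(
      ("DEST_COUNTRY_NAME", "StringType"),
      ("ORIGIN_COUNTRY_NAME", "StringType"),
      ("COUNT", "IntegerType")
    ).toDF("DiffColumnName", "Datatype")

    val diffcolarray = Array("ORIGIN_COUNTRY_NAME", "COUNT")

    addIssueColumn(df, diffcolarray.toSeq).show(false)

    spark.stop()
  }
}
```

With the array from the question, DEST_COUNTRY_NAME gets NO and both ORIGIN_COUNTRY_NAME and COUNT get YES. Because the names are passed in as a parameter, the array can change between runs without touching the column logic.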