I want to add a column to a data frame. Depending on whether a certain field appears in the source JSON, the column should hold the value from the source or null. My code looks like this:
withColumn("STATUS_BIT", expr("case when 'statusBit:' in jsonDF.schema.simpleString() then statusBit else None end"))
When I run this, I get "mismatched input ''statusBit:'' expecting {<EOF>, '-'}". Am I doing something wrong with the quotation marks? When I try
withColumn("STATUS_BIT", expr("case when \'statusBit:\' in jsonDF.schema.simpleString() then statusBit else None end"))
I get the exact same error. Trying the whole thing without expr, as a simple when, triggers the error "condition should be a Column". Running 'statusBit:' in jsonDF.schema.simpleString() by itself returns True with the test data I am using, but somehow I can't integrate it into the data frame transformation. Thanks a lot for your help in advance.
jsonDF.schema.simpleString()
is a plain Python string, so you can evaluate the check in Python instead of inside the SQL expression string:
from pyspark.sql import functions as F
# schema.simpleString() is evaluated in Python on the driver; contains() then
# yields a boolean column indicating whether 'statusBit:' occurs in the schema
df.withColumn("STATUS_BIT", F.lit(df.schema.simpleString()).contains('statusBit:'))