
Pyspark: mismatched input ... expecting EOF

I want to add a column to a DataFrame: if a certain field appears in the source JSON, the column should take that value from the source, otherwise null. My code looks like this:

withColumn("STATUS_BIT", expr("case when 'statusBit:' in jsonDF.schema.simpleString() then statusBit else None end"))

When I run this, I get the error "mismatched input ''statusBit:'' expecting {<EOF>, '-'}". Am I doing something wrong with the quotation marks? When I try

withColumn("STATUS_BIT", expr("case when \'statusBit:\' in jsonDF.schema.simpleString() then statusBit else None end"))

I get the exact same error. Trying the whole thing without expr, as a simple when, triggers the error "condition should be a Column". Running 'statusBit:' in jsonDF.schema.simpleString() by itself returns True with the test data I am using, but somehow I can't integrate it into the DataFrame transformation. Thanks a lot for your help in advance.

jsonDF.schema.simpleString() is a plain Python string, so you can check it with ordinary Python instead of inside a Spark SQL expression:

from pyspark.sql import functions as F

# STATUS_BIT becomes a boolean literal: True if the schema string contains 'statusBit:', otherwise False
df.withColumn("STATUS_BIT", F.lit(df.schema.simpleString()).contains('statusBit:'))
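The original case when expression fails because everything passed to expr is parsed as Spark SQL, so jsonDF.schema.simpleString() cannot be evaluated there. Since the schema check is plain Python that runs on the driver, you can branch in Python and only reference the column when it exists. A minimal sketch, assuming jsonDF is the DataFrame read from the source JSON and the field is named statusBit:

from pyspark.sql import functions as F

# Plain Python check against the schema string; this runs on the driver, not per row
if "statusBit:" in jsonDF.schema.simpleString():
    # Field exists in the source: copy its value into STATUS_BIT
    jsonDF = jsonDF.withColumn("STATUS_BIT", F.col("statusBit"))
else:
    # Field absent in the source: fill STATUS_BIT with nulls
    jsonDF = jsonDF.withColumn("STATUS_BIT", F.lit(None))

This keeps the conditional logic outside the SQL expression, which is why the parser no longer complains about the 'statusBit:' literal.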

