Here is the pseudocode:
def is_XYZ(sub1, sub2):
    if sub1 == lit("XYZ") | sub2 == lit("XYZ"):
        return 1
    else:
        return 0

xyz = F.udf(lambda sub1, sub2: is_XYZ(sub1, sub2), BooleanType())
and I'm trying to create a DataFrame as below:
df=df.withColumn("is_XYZ",xyz(col("sub1"),col("sub2"))).show()
ERROR:
AttributeError: 'NoneType' object has no attribute '_jvm'
at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.handlePythonException(PythonRunner.scala:514)
at org.apache.spark.sql.execution.python.PythonUDFRunner$$anon$1.read(PythonUDFRunner.scala:81)
at org.apache.spark.sql.execution.python.PythonUDFRunner$$anon$1.read(PythonUDFRunner.scala:64)
at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:468) ......
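lit is part of the PySpark API: it builds a Column object through the JVM gateway on the driver. A Python UDF runs on the executors, where there is no active SparkContext, which is why calling lit there fails with "'NoneType' object has no attribute '_jvm'". Inside a UDF the arguments arrive as plain Python values, so lit isn't required and regular Python code is enough, i.e.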
def is_XYZ(sub1, sub2):
    if sub1 == lit("XYZ") | sub2 == lit("XYZ"):
        return 1
    else:
        return 0
can be replaced with
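def is_XYZ(sub1, sub2):
    # sub1 and sub2 are plain Python values here, so compare them directly
    if sub1 == "XYZ" or sub2 == "XYZ":
        return True
    else:
        return False
Returning True/False rather than 1/0 matches the BooleanType() declared for the UDF.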
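For reference, a minimal end-to-end sketch of the same idea, assuming a DataFrame with string columns sub1 and sub2 as in the question. The names is_xyz, add_is_xyz_udf and add_is_xyz_native are made up for illustration. The second helper is an alternative that is not part of the original answer: it skips the UDF and uses a column expression, which is the place where | between Column objects belongs.

```python
def is_xyz(sub1, sub2):
    """Return True when either plain Python value equals "XYZ"."""
    return sub1 == "XYZ" or sub2 == "XYZ"


def add_is_xyz_udf(df):
    """Add an is_XYZ column using a Python UDF (hypothetical helper)."""
    # Imported here so is_xyz itself has no Spark dependency.
    from pyspark.sql import functions as F
    from pyspark.sql.types import BooleanType

    xyz = F.udf(is_xyz, BooleanType())
    return df.withColumn("is_XYZ", xyz(F.col("sub1"), F.col("sub2")))


def add_is_xyz_native(df):
    """Add an is_XYZ column without a UDF (hypothetical helper)."""
    from pyspark.sql import functions as F

    # Column comparisons are combined with |, and each side needs parentheses.
    condition = (F.col("sub1") == "XYZ") | (F.col("sub2") == "XYZ")
    return df.withColumn("is_XYZ", condition)
```

Call show() as a separate step, for example df = add_is_xyz_udf(df) followed by df.show(). show() returns None, so chaining it onto the assignment as in the question leaves df set to None. The two helpers can differ on null input: the UDF version returns False when both values are null, while the column expression follows SQL semantics and yields null.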