Why do I get an AttributeError when displaying a DataFrame that has a column created by a UDF?

Question

Here is the pseudocode:

def is_XYZ(sub1,sub2):
  if sub1==lit("XYZ") | sub2==lit("XYZ"):
    return 1
  else :
    return 0


xyz=F.udf( lambda sub1,sub2:is_XYZ(sub1,sub2),BooleanType())

I then try to create the DataFrame as follows:

df=df.withColumn("is_XYZ",xyz(col("sub1"),col("sub2"))).show()

ERROR:

AttributeError: 'NoneType' object has no attribute '_jvm'

    at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.handlePythonException(PythonRunner.scala:514)
    at org.apache.spark.sql.execution.python.PythonUDFRunner$$anon$1.read(PythonUDFRunner.scala:81)
    at org.apache.spark.sql.execution.python.PythonUDFRunner$$anon$1.read(PythonUDFRunner.scala:64)
    at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:468) ......

Answer 1

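lit is part of the PySpark API: it builds a Column expression on the driver and needs an active SparkContext. A Python UDF runs on the executors, where there is no active SparkContext, which is why calling lit there fails with 'NoneType' object has no attribute '_jvm'. Inside the UDF the arguments are already plain Python values, so lit isn't required and regular Python code is enough, i.e.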

def is_XYZ(sub1,sub2):
  if sub1==lit("XYZ") | sub2==lit("XYZ"):
    return 1
  else :
    return 0

can be replaced with

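def is_XYZ(sub1, sub2):
  # sub1 and sub2 arrive as plain Python values inside the UDF, so no lit() is needed
  if sub1 == "XYZ" or sub2 == "XYZ":
    # return a bool so the value matches the BooleanType() declared for the UDF
    return True
  else:
    return False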
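
For reference, here is a minimal end-to-end sketch of the same idea, not a definitive implementation. It assumes df already has string columns sub1 and sub2 as in the question; the names is_xyz, add_is_xyz_columns and is_XYZ_native are hypothetical and only used for illustration. It also shows a version without any UDF, which is where Column operators such as | belong.

```python
def is_xyz(sub1, sub2):
    """Plain-Python predicate: it runs on the executors, so no pyspark calls here."""
    # `or` is the logical operator for plain Python values; None just compares unequal.
    return sub1 == "XYZ" or sub2 == "XYZ"


def add_is_xyz_columns(df):
    """Hypothetical helper: adds the flag once via a UDF and once via a native expression."""
    # Imported lazily so is_xyz can be unit-tested without a Spark installation.
    from pyspark.sql import functions as F
    from pyspark.sql.types import BooleanType

    # The declared return type matches the bool that is_xyz returns.
    xyz_udf = F.udf(is_xyz, BooleanType())
    return (
        df.withColumn("is_XYZ", xyz_udf(F.col("sub1"), F.col("sub2")))
        # Native alternative: Column comparisons are combined with `|`,
        # and each side is parenthesized because `|` binds tighter than `==`.
        .withColumn(
            "is_XYZ_native",
            (F.col("sub1") == "XYZ") | (F.col("sub2") == "XYZ"),
        )
    )
```

To use it, assign first and display afterwards, e.g. df = add_is_xyz_columns(df) followed by df.show() on its own line. show() only prints and returns None, so chaining it into the assignment as in the question would leave df set to None. Note that the two columns can differ on null input: the UDF returns False, while the native expression follows SQL null semantics and yields null when one side is null and the other does not match.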

solution1
0 2020-06-17 13:14:29