I have created a DataFrame with the schema below. I'm trying to extract the first 10 values of "contents.monid" from each row, for which I created a UDF, 'udfTop'.
>>> df.printSchema()
|-- userid: long (nullable = true)
|-- contents: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- monid: struct (nullable = true)
| | | |-- mon: string (nullable = true)
| | | |-- id: long (nullable = true)
| | |-- count: integer (nullable = true)
>>> def take(n, data):
...     if data is None:
...         return None
...     else:
...         return data.take(n)
>>> udfTop = spark.udf.register("top_n", take)
But when I apply udfTop to the "contents" column's "monid" field (a struct type), it raises TypeError: 'NoneType' object is not callable, even though I handle null values in the UDF definition, and the column actually contains no null values.
>>> new_df = df.withColumn("mon_ids", udfTop(10, "contents.monid"))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'NoneType' object is not callable
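Note that the traceback points at the call site, not at the UDF body: this is the generic Python error for calling a value that is None. A minimal reproduction outside Spark (the variable name mirrors the question; the None assignment is illustrative, standing in for whatever left udfTop unset):

```python
# If udfTop ends up bound to None, calling it fails exactly like this.
udfTop = None  # illustrative: stands in for a registration that returned None

try:
    udfTop(10, "contents.monid")
except TypeError as e:
    print(e)  # 'NoneType' object is not callable
```

So the error occurs before the take function ever runs, which is why the null handling inside it makes no difference.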
I was able to follow a similar approach in the Spark shell using Scala without any errors, but I want this to work in PySpark.
For a row in df whose 'contents' column value is:
[[Art,1111],100],[[Art,1112],110],[[Art,1113],120],[[Art,1114],130].....(100 such values)
After applying the UDF, that row's 'mon_ids' value in new_df should be:
[Art,1111],[Art,1112],[Art,1113],[Art,1114]....(10 values)
The issue turned out to be my spark.udf.register syntax: in the Spark version I was using, spark.udf.register returned None, so calling udfTop raised exactly this TypeError. Switching to the syntax below, and also changing data.take(n) to data[:n] (Python lists have no take method), resolved the issue:
udfTop = udf(take, ArrayType(IntegerType()))
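The corrected function can be sketched and checked on plain Python lists, outside Spark (the names mirror the question): slicing with data[:n] returns the first n elements, and a SQL NULL reaches a Python UDF as None.

```python
def take(n, data):
    # Spark hands SQL NULL to a Python UDF as None
    if data is None:
        return None
    # Python lists have no .take() method; a slice yields the first n items
    return data[:n]

rows = [("Art", 1111), ("Art", 1112), ("Art", 1113)]
print(take(2, rows))   # [('Art', 1111), ('Art', 1112)]
print(take(2, None))   # None
print(take(10, rows))  # shorter input is returned whole
```

Two caveats when wiring this back into Spark (both assumptions based on the schema above, not part of the original answer): the declared return type should match the element type of contents.monid, e.g. ArrayType(StructType([StructField("mon", StringType()), StructField("id", LongType())])) rather than ArrayType(IntegerType()); and a literal argument such as 10 generally needs to be wrapped as lit(10) when the UDF is called inside withColumn.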