Problem when using a UDF to operate on a column

Question

When I use a UDF to process a column, does the UDF process the elements of that column one by one? If so, I cannot understand why the following code fails.

import pyspark.sql.types as typ
from pyspark.sql.functions import udf, pandas_udf, PandasUDFType

def parse_model(v):
    return v.split(' ')

Parse_model = pandas_udf(parse_model, typ.ArrayType(typ.StringType(), True))
sample_data_df.withColumn('Models', Parse_model('Model')).show()

I expected the function to receive each string in the column, not a Series, but I get this error:

AttributeError: 'Series' object has no attribute 'split'

Answer 1

A scalar Pandas user-defined function takes a pandas.Series as input and returns its result as a pandas.Series of the same length.

Because v is a Series rather than a string, calling v.split raises the error. Updating the UDF as below fixes the issue.
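
To see why the error appears, the behaviour can be reproduced with pandas alone, without Spark. This is only a small sketch: the sample values and the names parse_model_scalar and parse_model_batch are made up for illustration, and the Series stands in for one batch of the 'Model' column that a scalar Pandas UDF would receive.

```python
import pandas as pd

def parse_model_scalar(v):
    # Written as if v were a single string (this is what the question does)
    return v.split(' ')

def parse_model_batch(v):
    # Written for a pandas.Series: .str.split applies the split to each element
    return v.str.split(' ')

# Hypothetical batch of values, standing in for part of the 'Model' column
batch = pd.Series(['Ford Focus', 'Audi A4 Avant'])

try:
    parse_model_scalar(batch)
    error_message = None
except AttributeError as exc:
    # A Series has no .split method, so the string-style function fails
    error_message = str(exc)

# One list of tokens per input element, same length as the input batch
result = parse_model_batch(batch)
```

The string-style function fails on the Series, while the .str accessor version returns one list per input value.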

def parse_model(v):
    # v is a pandas.Series (a batch of values), so split each element
    return v.str.split(' ')
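
If the goal is to have the function called once per value, as the question assumed, a regular udf (rather than pandas_udf) should behave that way. The sketch below assumes the same 'Model' column as in the question. The helper add_models_column and the None check are my own additions for illustration, and the pyspark imports are kept inside the helper only so the plain Python function can be tried without a Spark session.

```python
def parse_model(v):
    # v is a single string here (or None for a null cell), not a Series
    return v.split(' ') if v is not None else None

def add_models_column(df):
    # Hypothetical helper: df is assumed to be a Spark DataFrame
    # with a string column named 'Model', as in the question.
    import pyspark.sql.types as typ
    from pyspark.sql.functions import udf

    # udf() wraps a row-at-a-time function: it is called once per value
    parse_model_udf = udf(parse_model, typ.ArrayType(typ.StringType(), True))
    return df.withColumn('Models', parse_model_udf('Model'))
```

Usage would then look like add_models_column(sample_data_df).show(). Row-at-a-time UDFs are generally slower than Pandas UDFs on large data, so this is mainly worth it when the per-value logic is hard to express on a Series.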
