When applying a PySpark UDF that calls an AWS API, I get the error:
PicklingError: Could not serialize object: TypeError: can't pickle SSLContext objects
The code is:

import pyspark.sql.functions as sqlf
import boto3

comprehend = boto3.client('comprehend', region_name='us-east-1')

def detect_sentiment(text):
    response = comprehend.detect_sentiment(Text=text, LanguageCode='pt')
    return response["SentimentScore"]["Positive"]

detect_sentiment_udf = sqlf.udf(detect_sentiment)
test = df.withColumn("Positive", detect_sentiment_udf(df.Conversa))
where df.Conversa contains short, simple strings. How can I solve this, or what would be an alternative approach?
Create the comprehend boto3 client inside the detect_sentiment function definition.
To send your UDF to the executors, Spark serializes (pickles) the function together with every object it references, so all of those objects need to be serializable. The boto3 client is NOT serializable, so you need to create it inside the UDF call.
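To see the failure in isolation, here is a minimal sketch that uses only the standard library. It assumes, based on the error message in the question, that the boto3 client holds an SSLContext; the helper name try_pickle is made up for this illustration, and plain pickle stands in for the serializer Spark uses.

```python
import pickle
import ssl


def try_pickle(obj):
    """Return None if obj pickles cleanly, otherwise the exception raised."""
    try:
        pickle.dumps(obj)
        return None
    except Exception as exc:
        return exc


# A plain value such as a region name serializes without trouble.
print(try_pickle("us-east-1"))

# An SSLContext, the kind of object named in the error message, does not.
ctx_error = try_pickle(ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT))
print(type(ctx_error).__name__, ctx_error)
```

Anything the UDF drags along that contains such an object fails the same way, which is why the client has to be built inside the function rather than captured from outside it.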
If you use an object's method as the UDF, as below, you will get the same error, because the method carries self (and with it self.client) along when it is serialized. To fix it, expose the client through a property instead of storing it on the instance.

import boto3
import pyspark.sql.functions as sqlf

class Foo:
    def __init__(self):
        # Do NOT do this: storing the client on the instance makes `self`
        # unpicklable, so an error is raised when the UDF is called.
        # self.client = boto3.client('comprehend', region_name='us-east-1')
        pass

    # Do this instead: the client is built on access and never stored on `self`.
    @property
    def client(self):
        return boto3.client('comprehend', region_name='us-east-1')

    def my_udf(self, text):
        response = self.client.detect_sentiment(Text=text, LanguageCode='pt')
        return response["SentimentScore"]["Positive"]

    def add_sentiment_column(self, df):
        detect_sentiment_udf = sqlf.udf(self.my_udf)
        return df.withColumn("Positive", detect_sentiment_udf(df.Conversa))
@johnhill2424's answer will fix the problem in your case:
import pyspark.sql.functions as sqlf
import boto3

def detect_sentiment(text):
    # The client is created inside the UDF, so it is never pickled.
    comprehend = boto3.client('comprehend', region_name='us-east-1')
    response = comprehend.detect_sentiment(Text=text, LanguageCode='pt')
    return response["SentimentScore"]["Positive"]

detect_sentiment_udf = sqlf.udf(detect_sentiment)
test = df.withColumn("Positive", detect_sentiment_udf(df.Conversa))
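As for an alternative approach: creating the client inside the UDF builds a new client for every row. One possible variation, sketched below and not taken from the answers above, is to process a whole partition at a time so that one client serves all of its rows. The names make_comprehend_client, score_partition and FakeComprehend are hypothetical; FakeComprehend is only a stand-in so the sketch runs without AWS credentials, and the 0.5 score it returns is dummy data.

```python
def make_comprehend_client():
    # Imported lazily so boto3 is only needed where the client is built.
    import boto3
    return boto3.client("comprehend", region_name="us-east-1")


def score_partition(rows, make_client=make_comprehend_client):
    """Yield (text, positive_score) pairs, using one client per partition."""
    client = make_client()  # built once, then reused for every row below
    for row in rows:
        response = client.detect_sentiment(Text=row["Conversa"], LanguageCode="pt")
        yield row["Conversa"], response["SentimentScore"]["Positive"]


class FakeComprehend:
    """Stand-in for the real client; returns a fixed dummy score."""

    def detect_sentiment(self, Text, LanguageCode):
        return {"SentimentScore": {"Positive": 0.5}}


# Local dry run with plain dicts in place of Spark rows.
rows = [{"Conversa": "bom dia"}, {"Conversa": "obrigado"}]
print(list(score_partition(rows, FakeComprehend)))
```

With Spark, this would be applied as df.rdd.mapPartitions(score_partition).toDF(["Conversa", "Positive"]). Note that the result then contains only these two columns, so any other columns you need would have to be yielded as well.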