AWS Comprehend and PySpark - F.when not working

Question

I have rows of sentences in different languages, with the language code in a separate column. I only want to process certain languages (en, es, fr, or de), since I know AWS Comprehend does not support 'nl' (Dutch). For some reason I keep getting an error that 'nl' is not supported, even though it is not listed in my when condition and should therefore not be sent through the Comprehend UDF. Any ideas on what might be wrong?

Here is my code:

import boto3
import pyspark.sql.functions as F

def detect_sentiment(text,language):
    comprehend = boto3.client(service_name='comprehend', region_name='us-west-2')
    sentiment_analysis = comprehend.detect_sentiment(Text=text, LanguageCode=language)
    return sentiment_analysis


detect_sentiment_udf = F.udf(detect_sentiment)

reviews_4 = reviews_3.withColumn('RAW_SENTIMENT_SCORE', \
        F.when( (F.col('LANGUAGE')=='en') | (F.col('LANGUAGE')=='es') | (F.col('LANGUAGE')=='fr') | (F.col('LANGUAGE')=='de') , \
               detect_sentiment_udf('SENTENCE', 'LANGUAGE')).otherwise(None) )

reviews_4.show(50)

I get this error:

botocore.exceptions.ClientError: An error occurred (ValidationException) when calling the DetectSentiment operation: 
Value 'nl' at 'languageCode'failed to satisfy constraint: Member must satisfy enum value set: [ar, hi, ko, zh-TW, ja, zh, de, pt, en, it, fr, es]

Answer 1

I'm still unsure why my code above doesn't work, but I found the following workaround to be effective.

def detect_sentiment(text, language):
    # Guard inside the UDF so unsupported languages never reach Comprehend.
    if language not in ('en', 'es', 'fr', 'de'):
        return None
    comprehend = boto3.client(service_name='comprehend', region_name='us-west-2')
    return comprehend.detect_sentiment(Text=text, LanguageCode=language)

detect_sentiment_udf = F.udf(detect_sentiment)

reviews_4 = reviews_3.withColumn('RAW_SENTIMENT_SCORE', detect_sentiment_udf('SENTENCE','LANGUAGE'))

reviews_4.show(50)
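
A likely explanation for the original behavior, offered as background rather than something confirmed in this thread: Spark does not guarantee the evaluation order of subexpressions, and Python UDFs are typically evaluated for every input row before a surrounding F.when is applied. That would explain why 'nl' rows still reach Comprehend, and why a guard inside the function is reliable. The sketch below illustrates the guard in plain Python without Spark or AWS. FakeComprehend, guarded_sentiment, and run_rows are hypothetical names invented for this illustration; the fake client only mimics the rejection of 'nl' and is not the real boto3 client.

```python
SUPPORTED_LANGUAGES = frozenset({"en", "es", "fr", "de"})


class FakeComprehend:
    """Hypothetical stand-in for the boto3 Comprehend client (not a real AWS class)."""

    def __init__(self):
        self.calls = []  # language codes that actually reached the "service"

    def detect_sentiment(self, Text, LanguageCode):
        # Mimic the service rejecting Dutch, as in the error shown in the question.
        if LanguageCode == "nl":
            raise ValueError("language code 'nl' is not supported")
        self.calls.append(LanguageCode)
        return {"Sentiment": "NEUTRAL"}


def guarded_sentiment(client, text, language):
    """Call the client only for supported languages; otherwise return None."""
    if language not in SUPPORTED_LANGUAGES:
        return None
    return client.detect_sentiment(Text=text, LanguageCode=language)


def run_rows(client, rows):
    """Apply the function to every row, the way a UDF may run regardless of F.when."""
    return [guarded_sentiment(client, text, lang) for text, lang in rows]


rows = [("Hello", "en"), ("Hallo wereld", "nl"), ("Bonjour", "fr")]
fake = FakeComprehend()
results = run_rows(fake, rows)
print(results)
print(fake.calls)
```

Because the check lives inside the function, it holds even when the function is invoked on every row. In the real job, the same guard sits inside detect_sentiment before it is wrapped with F.udf, as in the workaround above.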

Answer 1 posted 2022-01-13 22:03:14
