Unsure how to resolve language error message from Google's natural language api: "The language sq is not supported for document_sentiment analysis."
I have an app that's been working for months and is now giving me an error.
The app takes tweets from the Twitter API and runs them through Google's Sentiment Analysis API, returning sentiment analysis on each tweet.
Without changing the code, I'm suddenly getting an error that hasn't happened before.
Error message
_InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.INVALID_ARGUMENT
details = "The language sq is not supported for document_sentiment analysis."
debug_error_string = "UNKNOWN:Error received from peer ipv4:74.125.69.95:443 {grpc_message:"The language sq is not supported for document_sentiment analysis.", grpc_status:3, created_time:"2022-12-24T04:26:31.031735656+00:00"}"
>
Interpretation
Even though I'm requesting only English-language tweets in my Twitter API query (-is:retweet lang:en), my understanding of the error message is that the NL API thinks the text is in some language referred to as sq. My research says that's Albanian.
So my assumption is that the NL API is interpreting some block(s) of text in the tweets as being in Albanian, or maybe it's just a portion of an otherwise English tweet that contains some Albanian text.
Solution
Is there a way to ignore or skip a text if the API can't process the language the text is in?
This is the language_v1 call:
def get_single_sentiment(text):
    '''gets non-entity sentiment of text using GCP's api'''
    # Instantiates a client
    client = language_v1.LanguageServiceClient()
    # The text to analyze
    document = language_v1.Document(content=text, type_=language_v1.types.Document.Type.PLAIN_TEXT)
    # Detects the sentiment of the text
    sentiment = client.analyze_sentiment(request={"document": document}).document_sentiment
    return sentiment
Below is the full error message returned when trying to run the sentiment analysis:
---------------------------------------------------------------------------
_InactiveRpcError Traceback (most recent call last)
/opt/conda/lib/python3.7/site-packages/google/api_core/grpc_helpers.py in error_remapped_callable(*args, **kwargs)
56 try:
---> 57 return callable_(*args, **kwargs)
58 except grpc.RpcError as exc:
/opt/conda/lib/python3.7/site-packages/grpc/_channel.py in __call__(self, request, timeout, metadata, credentials, wait_for_ready, compression)
945 wait_for_ready, compression)
--> 946 return _end_unary_response_blocking(state, call, False, None)
947
/opt/conda/lib/python3.7/site-packages/grpc/_channel.py in _end_unary_response_blocking(state, call, with_call, deadline)
848 else:
--> 849 raise _InactiveRpcError(state)
850
_InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.INVALID_ARGUMENT
details = "The language sq is not supported for document_sentiment analysis."
debug_error_string = "UNKNOWN:Error received from peer ipv4:74.125.69.95:443 {grpc_message:"The language sq is not supported for document_sentiment analysis.", grpc_status:3, created_time:"2022-12-24T04:26:31.031735656+00:00"}"
>
The above exception was the direct cause of the following exception:
InvalidArgument Traceback (most recent call last)
/tmp/ipykernel_1/1103340548.py in <module>
1 twitter_stage(QUERY_TW, N_HOURS_AGO
----> 2 , TWITTER_BQ_TABLE, ENTITY)
/tmp/ipykernel_1/2800777156.py in twitter_stage(QUERY, N_HOURS_AGO, TWITTER_BQ_TABLE, ENTITY)
39
40 # get sentiment analysis
---> 41 twitapi_df = get_column_sentiment(twitapi_df, text_col='text', entity=ENTITY, query=QUERY)
42
43 # Dropping columns that can't be saved to big query because they are not compatible
/tmp/ipykernel_1/2183820933.py in get_column_sentiment(df, text_col, entity, query)
110
111 # for each entry in text_col, get a single sentiment result
--> 112 sentiment_column = df[text_col].apply(f)
113
114 # for each entry in sentiment_column, fix null values (replace nulls will two values)
/opt/conda/lib/python3.7/site-packages/pandas/core/series.py in apply(self, func, convert_dtype, args, **kwargs)
4355 dtype: float64
4356 """
-> 4357 return SeriesApply(self, func, convert_dtype, args, kwargs).apply()
4358
4359 def _reduce(
/opt/conda/lib/python3.7/site-packages/pandas/core/apply.py in apply(self)
1041 return self.apply_str()
1042
-> 1043 return self.apply_standard()
1044
1045 def agg(self):
/opt/conda/lib/python3.7/site-packages/pandas/core/apply.py in apply_standard(self)
1099 values,
1100 f, # type: ignore[arg-type]
-> 1101 convert=self.convert_dtype,
1102 )
1103
/opt/conda/lib/python3.7/site-packages/pandas/_libs/lib.pyx in pandas._libs.lib.map_infer()
/tmp/ipykernel_1/2183820933.py in get_single_sentiment(text)
16
17 # Detects the sentiment of the text
---> 18 sentiment = client.analyze_sentiment(request={"document": document}).document_sentiment
19
20 return sentiment
/opt/conda/lib/python3.7/site-packages/google/cloud/language_v1/services/language_service/client.py in analyze_sentiment(self, request, document, encoding_type, retry, timeout, metadata)
509 retry=retry,
510 timeout=timeout,
--> 511 metadata=metadata,
512 )
513
/opt/conda/lib/python3.7/site-packages/google/api_core/gapic_v1/method.py in __call__(self, timeout, retry, *args, **kwargs)
152 kwargs["metadata"] = metadata
153
--> 154 return wrapped_func(*args, **kwargs)
155
156
/opt/conda/lib/python3.7/site-packages/google/api_core/retry.py in retry_wrapped_func(*args, **kwargs)
286 sleep_generator,
287 self._deadline,
--> 288 on_error=on_error,
289 )
290
/opt/conda/lib/python3.7/site-packages/google/api_core/retry.py in retry_target(target, predicate, sleep_generator, deadline, on_error)
188 for sleep in sleep_generator:
189 try:
--> 190 return target()
191
192 # pylint: disable=broad-except
/opt/conda/lib/python3.7/site-packages/google/api_core/grpc_helpers.py in error_remapped_callable(*args, **kwargs)
57 return callable_(*args, **kwargs)
58 except grpc.RpcError as exc:
---> 59 raise exceptions.from_grpc_error(exc) from exc
60
61 return error_remapped_callable
InvalidArgument: 400 The language sq is not supported for document_sentiment analysis.
Proposed Solution
I'm thinking the best possible solution must be to ignore any non-English text, and I'm wondering if that's a reasonable approach, and whether anyone has input on how to implement it.
Greatly appreciate any input on resolving this. Thanks.
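One lightweight pre-filter along those lines (a sketch; the function name is mine, and the `lang` field assumes the Twitter API v2 tweet object shape) is to drop any tweet whose Twitter-assigned `lang` isn't `en` before calling the NL API at all, since Twitter's own language tagging can disagree with what the `lang:en` query operator actually returns:

```python
def filter_english_tweets(tweets):
    """Keep only tweets Twitter itself tagged as English.

    `tweets` is a list of dicts with a 'lang' key, as returned by the
    Twitter API v2 when the 'lang' tweet field is requested.
    """
    return [t for t in tweets if t.get("lang") == "en"]


tweets = [
    {"text": "hello world", "lang": "en"},
    {"text": "Update| #shqip #shqiperi ...", "lang": "sq"},
]
print(filter_english_tweets(tweets))
# → [{'text': 'hello world', 'lang': 'en'}]
```

This avoids wasted NL API calls, but tweets the language detector misclassifies could still slip through, so it's a complement to, not a replacement for, handling the API error itself.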
Tweet Content Causing Problem
Update| #shqip #shqiperi #kosova #albania #kosovo #shqiptar #shqiptare #lajme #shqiperia #tirana #prishtina #visitalbania #albanian #tirane #albaniangirl #shqipe…
The issue can be resolved by explicitly specifying the document language in the code, i.e. specify language en, define the type_, then declare both on the document. For example:
type_ = language_v1.Document.Type.PLAIN_TEXT
language = "en"
document = {"type_": type_, "content": content, "language": language}
Sample code:
from google.cloud import language_v1
import six

def sample_analyze_sentiment(content):
    client = language_v1.LanguageServiceClient()
    if isinstance(content, six.binary_type):
        content = content.decode("utf-8")
    type_ = language_v1.Document.Type.PLAIN_TEXT
    language = "en"
    document = {"type_": type_, "content": content, "language": language}
    response = client.analyze_sentiment(request={"document": document})
    sentiment = response.document_sentiment
    print("Score: {}".format(sentiment.score))
    print("Magnitude: {}".format(sentiment.magnitude))