Unsure how to resolve language error message from Google's natural language api: "The language sq is not supported for document_sentiment analysis."
I have an app that's been working for months and is now giving me an error.
The app takes tweets from the Twitter API and runs them through Google's Sentiment Analysis API, returning sentiment analysis on each tweet.
Without changing the code, I'm suddenly getting an error that hasn't happened before.
Error message
_InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.INVALID_ARGUMENT
details = "The language sq is not supported for document_sentiment analysis."
debug_error_string = "UNKNOWN:Error received from peer ipv4:74.125.69.95:443 {grpc_message:"The language sq is not supported for document_sentiment analysis.", grpc_status:3, created_time:"2022-12-24T04:26:31.031735656+00:00"}"
>
Interpretation
Even though I'm requesting only English-language tweets in my Twitter API query (-is:retweet lang:en), my understanding of the error message is that the NL API thinks the text is in some language referred to as sq. My research says that's Albanian.
So my assumption is that the NL API is interpreting some block(s) of text in the tweets as being in Albanian, or maybe it's just a portion of an otherwise English tweet that contains some Albanian text.
Solution
Is there a way to ignore or skip a text if the API can't process the language the text is in?
This is the language_v1 call:
def get_single_sentiment(text):
    '''gets non-entity sentiment of text using GCP's api'''
    # Instantiates a client
    client = language_v1.LanguageServiceClient()
    # The text to analyze
    document = language_v1.Document(content=text, type_=language_v1.types.Document.Type.PLAIN_TEXT)
    # Detects the sentiment of the text
    sentiment = client.analyze_sentiment(request={"document": document}).document_sentiment
    return sentiment
Below is the full error message returned when trying to run the sentiment analysis:
---------------------------------------------------------------------------
_InactiveRpcError Traceback (most recent call last)
/opt/conda/lib/python3.7/site-packages/google/api_core/grpc_helpers.py in error_remapped_callable(*args, **kwargs)
56 try:
---> 57 return callable_(*args, **kwargs)
58 except grpc.RpcError as exc:
/opt/conda/lib/python3.7/site-packages/grpc/_channel.py in __call__(self, request, timeout, metadata, credentials, wait_for_ready, compression)
945 wait_for_ready, compression)
--> 946 return _end_unary_response_blocking(state, call, False, None)
947
/opt/conda/lib/python3.7/site-packages/grpc/_channel.py in _end_unary_response_blocking(state, call, with_call, deadline)
848 else:
--> 849 raise _InactiveRpcError(state)
850
_InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.INVALID_ARGUMENT
details = "The language sq is not supported for document_sentiment analysis."
debug_error_string = "UNKNOWN:Error received from peer ipv4:74.125.69.95:443 {grpc_message:"The language sq is not supported for document_sentiment analysis.", grpc_status:3, created_time:"2022-12-24T04:26:31.031735656+00:00"}"
>
The above exception was the direct cause of the following exception:
InvalidArgument Traceback (most recent call last)
/tmp/ipykernel_1/1103340548.py in <module>
1 twitter_stage(QUERY_TW, N_HOURS_AGO
----> 2 , TWITTER_BQ_TABLE, ENTITY)
/tmp/ipykernel_1/2800777156.py in twitter_stage(QUERY, N_HOURS_AGO, TWITTER_BQ_TABLE, ENTITY)
39
40 # get sentiment analysis
---> 41 twitapi_df = get_column_sentiment(twitapi_df, text_col='text', entity=ENTITY, query=QUERY)
42
43 # Dropping columns that can't be saved to big query because they are not compatible
/tmp/ipykernel_1/2183820933.py in get_column_sentiment(df, text_col, entity, query)
110
111 # for each entry in text_col, get a single sentiment result
--> 112 sentiment_column = df[text_col].apply(f)
113
114 # for each entry in sentiment_column, fix null values (replace nulls will two values)
/opt/conda/lib/python3.7/site-packages/pandas/core/series.py in apply(self, func, convert_dtype, args, **kwargs)
4355 dtype: float64
4356 """
-> 4357 return SeriesApply(self, func, convert_dtype, args, kwargs).apply()
4358
4359 def _reduce(
/opt/conda/lib/python3.7/site-packages/pandas/core/apply.py in apply(self)
1041 return self.apply_str()
1042
-> 1043 return self.apply_standard()
1044
1045 def agg(self):
/opt/conda/lib/python3.7/site-packages/pandas/core/apply.py in apply_standard(self)
1099 values,
1100 f, # type: ignore[arg-type]
-> 1101 convert=self.convert_dtype,
1102 )
1103
/opt/conda/lib/python3.7/site-packages/pandas/_libs/lib.pyx in pandas._libs.lib.map_infer()
/tmp/ipykernel_1/2183820933.py in get_single_sentiment(text)
16
17 # Detects the sentiment of the text
---> 18 sentiment = client.analyze_sentiment(request={"document": document}).document_sentiment
19
20 return sentiment
/opt/conda/lib/python3.7/site-packages/google/cloud/language_v1/services/language_service/client.py in analyze_sentiment(self, request, document, encoding_type, retry, timeout, metadata)
509 retry=retry,
510 timeout=timeout,
--> 511 metadata=metadata,
512 )
513
/opt/conda/lib/python3.7/site-packages/google/api_core/gapic_v1/method.py in __call__(self, timeout, retry, *args, **kwargs)
152 kwargs["metadata"] = metadata
153
--> 154 return wrapped_func(*args, **kwargs)
155
156
/opt/conda/lib/python3.7/site-packages/google/api_core/retry.py in retry_wrapped_func(*args, **kwargs)
286 sleep_generator,
287 self._deadline,
--> 288 on_error=on_error,
289 )
290
/opt/conda/lib/python3.7/site-packages/google/api_core/retry.py in retry_target(target, predicate, sleep_generator, deadline, on_error)
188 for sleep in sleep_generator:
189 try:
--> 190 return target()
191
192 # pylint: disable=broad-except
/opt/conda/lib/python3.7/site-packages/google/api_core/grpc_helpers.py in error_remapped_callable(*args, **kwargs)
57 return callable_(*args, **kwargs)
58 except grpc.RpcError as exc:
---> 59 raise exceptions.from_grpc_error(exc) from exc
60
61 return error_remapped_callable
InvalidArgument: 400 The language sq is not supported for document_sentiment analysis.
Proposed Solution
I'm thinking the best possible solution must be to ignore any non-English text, and I'm wondering if that's a reasonable approach, and whether anyone has input on how to implement it.
Greatly appreciate any input on resolving this. Thanks.
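One lightweight pre-filter along those lines (a sketch; the function name is mine, and the `lang` field assumes the Twitter API v2 tweet object shape) is to drop any tweet whose Twitter-assigned `lang` isn't `en` before calling the NL API at all, since Twitter's own language tagging can disagree with what the `lang:en` query operator actually returns:

```python
def filter_english_tweets(tweets):
    """Keep only tweets Twitter itself tagged as English.

    `tweets` is a list of dicts with a 'lang' key, as returned by the
    Twitter API v2 when the 'lang' tweet field is requested.
    """
    return [t for t in tweets if t.get("lang") == "en"]


tweets = [
    {"text": "hello world", "lang": "en"},
    {"text": "Update| #shqip #shqiperi ...", "lang": "sq"},
]
print(filter_english_tweets(tweets))
# → [{'text': 'hello world', 'lang': 'en'}]
```

This avoids wasted NL API calls, but tweets the language detector misclassifies could still slip through, so it's a complement to, not a replacement for, handling the API error itself.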
Tweet Content Causing Problem
Update| #shqip #shqiperi #kosova #albania #kosovo #shqiptar #shqiptare #lajme #shqiperia #tirana #prishtina #visitalbania #albanian #tirane #albaniangirl #shqipe…
The issue can be resolved by explicitly specifying the document language in the code, i.e. specify language en, define the type_, then declare both on the document. For example:
type_ = language_v1.Document.Type.PLAIN_TEXT
language = "en"
document = {"type_": type_, "content": content, "language": language}
Sample code:
from google.cloud import language_v1
import six

def sample_analyze_sentiment(content):
    client = language_v1.LanguageServiceClient()
    if isinstance(content, six.binary_type):
        content = content.decode("utf-8")
    type_ = language_v1.Document.Type.PLAIN_TEXT
    language = "en"
    document = {"type_": type_, "content": content, "language": language}
    response = client.analyze_sentiment(request={"document": document})
    sentiment = response.document_sentiment
    print("Score: {}".format(sentiment.score))
    print("Magnitude: {}".format(sentiment.magnitude))