How Do You Filter Documents by Size Before Sending to AWS Comprehend via boto3?

Original post: 2019-10-11 15:11:36 · Tags: python, amazon-web-services, boto3, amazon-comprehend

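I am currently trying to run batch sentiment analysis on a set of documents with AWS Comprehend, using the boto3 library. The service places limits on document size (a document cannot exceed 5000 bytes), so I am trying to pre-filter the documents before calling the boto3 API. See the code snippet below: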

import sys
...
batch = []
for doc in docs:
    # Intended to keep only non-empty strings under the 5000-byte limit
    if isinstance(doc, str) and len(doc) > 0 and sys.getsizeof(doc) < 5000:
        batch.append(doc)

data = self.client.batch_detect_sentiment(TextList=batch, LanguageCode=language)
...

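My assumption was that filtering documents with sys.getsizeof would remove any string that exceeds the service's 5000-byte limit. However, even with this filtering in place, I still get the following exception: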

botocore.errorfactory.TextSizeLimitExceededException: An error occurred (TextSizeLimitExceededException) when calling the BatchDetectSentiment operation: Input text size exceeds limit. Max length of request text allowed is 5000 bytes while in this request the text size is 5523 bytes
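A quick local check shows how a document can pass the sys.getsizeof test and still be too large. This is a sketch assuming CPython 3.3 or later, where str objects use the compact internal representation from PEP 393; the sample text is made up for illustration.

```python
import sys

# "é" (U+00E9) is in the Latin-1 range, so CPython stores it in 1 byte per
# character internally, while UTF-8 needs 2 bytes per character.
doc = "é" * 3000

in_memory = sys.getsizeof(doc)         # in-memory object size, including interpreter overhead
utf8_bytes = len(doc.encode("utf-8"))  # size of the text once UTF-8 encoded

print(in_memory < 5000)  # True: the sys.getsizeof filter would accept this document
print(utf8_bytes)        # 6000: over the 5000-byte limit
```

Comprehend's documentation describes its input as UTF-8 text, so the second number is the one the limit is checked against.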

Is there a more reliable way to calculate the size of the documents sent to Comprehend, so that I avoid hitting the maximum document size limit?

1 Answer

There are two approaches here:

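1. As Daniel mentioned, you can use len(doc.encode('utf-8')) to determine the actual byte size of the string, because it accounts for the encoding rather than just how much memory the Python string occupies. (sys.getsizeof also counts interpreter overhead, and CPython stores characters internally with 1, 2, or 4 bytes each, which does not always match their UTF-8 byte count for non-ASCII text.)
2. You can handle the exception when it occurs, like this: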

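try:
    # Send the pre-filtered batch to Comprehend
    data = self.client.batch_detect_sentiment(TextList=batch, LanguageCode=language)
except self.client.exceptions.TextSizeLimitExceededException:
    # Comprehend rejected the request because a document was too large
    print('The batch was too long')
else:
    print(data)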
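The first approach can be wrapped in a small reusable helper, sketched below. The names utf8_size, filter_by_size and MAX_DOC_BYTES are hypothetical, and the strict `< 5000` comparison mirrors the check in the question; confirm the current per-document limit in the Comprehend documentation before relying on it.

```python
from typing import Iterable, List

# Hypothetical constant: the per-document limit from the error message,
# measured in bytes of UTF-8 encoded text.
MAX_DOC_BYTES = 5000


def utf8_size(text: str) -> int:
    """Return the number of bytes `text` occupies once encoded as UTF-8."""
    return len(text.encode("utf-8"))


def filter_by_size(docs: Iterable[object], max_bytes: int = MAX_DOC_BYTES) -> List[str]:
    """Keep only non-empty strings whose UTF-8 encoding is smaller than max_bytes."""
    return [d for d in docs if isinstance(d, str) and d and utf8_size(d) < max_bytes]


if __name__ == "__main__":
    sample = ["short review", "", None, "é" * 3000, "a" * 4999]
    kept = filter_by_size(sample)
    print([utf8_size(d) for d in kept])  # [12, 4999]
```

In the question's code, batch = filter_by_size(docs) would replace the sys.getsizeof loop, ahead of the existing batch_detect_sentiment call; the try/except above can still wrap that call as a safety net.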

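Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you republish this content, please credit this site's URL or the original URL. For any questions, contact: yoyou2525@163.com.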

Guangdong ICP registration No. 18138465 © 2020-2024 STACKOOM.COM