
Why do I get ThrottlingException - Rate Exceeded status:400 when making AWS Athena API call from API server?

We have an S3 data lake in AWS (with Lake Formation, Glue, etc.). The end goal is to query the S3 data sources using SQL in Athena.

  • When making the query in the AWS Athena console, everything works fine and results are returned; see screenshot: https://share.getcloudapp.com/NQuNBr5g
  • When making the query through the official API application domain (a Symfony 5 RESTful API that uses the aws-sdk-php vendor package), the query doesn't even get to Athena; the error returned is 400: https://share.getcloudapp.com/xQuqQLrq
    • In CloudTrail events, I can only see errorcode=ThrottlingException and errormessage='Rate exceeded'; there's no query execution ID.
  • The weird thing I don't get is that when making the same call from my localhost setup of the API app, the call is again successful: https://share.getcloudapp.com/jkuv8ZGy

The call made is StartQueryExecution on the Athena API; the error as shown on the API app's side:

Error executing \"GetQueryExecution\" on \"https://athena.us-west-2.amazonaws.com\"; AWS HTTP error: Client error: `POST https://athena.us-west-2.amazonaws.com` resulted in a `400 Bad Request` response:\n{\"__type\":\"ThrottlingException\",\"message\":\"Rate exceeded\"}\n ThrottlingException (client): Rate exceeded - {\"__type\":\"ThrottlingException\",\"message\":\"Rate exceeded\"}", "class": "Aws\\Athena\\Exception\\AthenaException"

The API app server, the data lake, etc. are in the same VPC, and I created a VPC endpoint from the server's VPC to the Athena us-west-2 endpoint, but it didn't help. I don't think it's an Athena quota issue, since on localhost the query works just fine. Any insight would be very helpful, thank you!

The solution was a combination of actions. Athena just doesn't work like that: you can't expect data back from an Athena query over an S3 data lake as if you were querying a relational database. What helped get results consistently, without this error, was:

  1. Update the PHP SDK AthenaClient constructor to also pass a config for retries:

    // ... other AthenaClient constructor params ...
    'retries' => [
        'mode' => 'standard',
        'max_attempts' => 3,
    ],
  • Athena and other elastic services (e.g. DynamoDB) work asynchronously. You issue the query, but the result is not delivered synchronously. For example, in my early tests I always received the initial ThrottlingException, yet in the Athena query console the result of that exact same query arrived slightly later, successfully. The AWS SDK for PHP appears to be built with this in mind, and retries with exponential backoff are also what AWS recommends: https://docs.aws.amazon.com/general/latest/gr/api-retries.html
  2. Partition your data in a way that matches your queries, so that each query scans as little data as possible. This gives faster and more consistent results. See https://docs.aws.amazon.com/athena/latest/ug/partitions.html (partitioning can be set either on the Glue table directly, or via a Glue ETL job where partitioning keys are specified). If your Athena query filters on country={country}, partitioning per country is a good scheme.

  3. Avoid 'select *': always name exactly the columns you need, and add a limit. Queries over Athena should be relatively simple select queries; if you need joins or other more complex query types, Redshift is better suited.
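Since the error text in the question names GetQueryExecution, it may be the status polling, not the start call, that gets throttled. The sketch below is one way to poll with exponential backoff instead of a tight loop. It is not the original poster's code: `waitForQuery`, its parameters and the default delays are assumptions made for illustration, and the state fetch is passed in as a callable so the function runs without the SDK.

```php
<?php
/**
 * Poll a query's state until it reaches a terminal state, waiting longer
 * between polls each time (exponential backoff with a cap).
 *
 * $fetchState: callable returning the current state string, one of
 *              QUEUED, RUNNING, SUCCEEDED, FAILED, CANCELLED.
 * $sleep:      callable taking milliseconds; injectable so tests can skip
 *              the real delay. Defaults to usleep().
 */
function waitForQuery(
    callable $fetchState,
    int $maxPolls = 20,
    int $baseDelayMs = 200,
    int $maxDelayMs = 5000,
    ?callable $sleep = null
): string {
    $sleep = $sleep ?? static function (int $ms): void {
        usleep($ms * 1000);
    };
    $terminal = ['SUCCEEDED', 'FAILED', 'CANCELLED'];

    for ($poll = 0; $poll < $maxPolls; $poll++) {
        $state = $fetchState();
        if (in_array($state, $terminal, true)) {
            return $state;
        }
        // Delay doubles on every poll and is capped at $maxDelayMs.
        $sleep(min($maxDelayMs, $baseDelayMs * (2 ** $poll)));
    }

    throw new RuntimeException("Query still running after {$maxPolls} polls");
}

// Hypothetical wiring to aws-sdk-php ($athena and $queryExecutionId are
// assumed to exist; this part needs the SDK and credentials to run):
//
// $state = waitForQuery(function () use ($athena, $queryExecutionId) {
//     $result = $athena->getQueryExecution([
//         'QueryExecutionId' => $queryExecutionId,
//     ]);
//     return $result['QueryExecution']['Status']['State'];
// });
```

The SDK's own 'retries' config still handles an individual throttled request; a loop like this only spaces out the polling calls so they are less likely to hit the rate limit in the first place.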

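The partitioning and "no select *" advice can be sketched as a small query builder. Everything specific here is an assumption for illustration: the `events` table, the `country` partition key, the column names in the usage comment and the `buildCountryQuery` helper are hypothetical and not taken from the original setup.

```php
<?php
/**
 * Build a narrow Athena query: explicitly named columns, a filter on the
 * partition key, and a LIMIT. The table name "events" and the partition
 * key "country" are illustrative only.
 */
function buildCountryQuery(string $country, array $columns, int $limit = 100): string
{
    // Accept only two-letter upper-case codes so the value is safe to inline.
    if (preg_match('/^[A-Z]{2}\z/', $country) !== 1) {
        throw new InvalidArgumentException('country must be a 2-letter code');
    }
    if ($columns === [] || $limit < 1) {
        throw new InvalidArgumentException('need at least one column and a positive limit');
    }
    // Accept only plain lower-case identifiers as column names.
    foreach ($columns as $column) {
        if (preg_match('/^[a-z_][a-z0-9_]*\z/', $column) !== 1) {
            throw new InvalidArgumentException("bad column name: {$column}");
        }
    }

    return sprintf(
        "SELECT %s FROM events WHERE country = '%s' LIMIT %d",
        implode(', ', $columns),
        $country,
        $limit
    );
}

// Example: buildCountryQuery('US', ['user_id', 'amount'], 10)
```

Filtering on the partition key lets Athena skip the other partitions, and naming the columns avoids reading data the caller never uses.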

