Why do I get ThrottlingException - Rate Exceeded status:400 when calling the AWS Athena API from an API server?

Question

We have an S3 data lake in AWS (with Lake Formation, Glue etc.). The end goal is to query the S3 data sources using SQL in Athena.

When making the query in the AWS Athena console, everything works fine and results are returned; see screenshot: https://share.getcloudapp.com/NQuNBr5g
When making the query through the official API application domain (a Symfony 5 RESTful API that uses the aws-sdk-php vendor package), the query doesn't even reach Athena; the error returned is 400: https://share.getcloudapp.com/xQuqQLrq
- In the CloudTrail events, I can only see errorcode=ThrottlingException and errormessage='Rate exceeded'; there is no query execution ID.
The weird thing I don't get is that when I make the same call from my localhost setup of the API app, the call succeeds again: https://share.getcloudapp.com/jkuv8ZGy

The call made is StartQueryExecution on the Athena API. The error as shown on the API app's side:

Error executing \"GetQueryExecution\" on \"https://athena.us-west-2.amazonaws.com\"; AWS HTTP error: Client error: `POST https://athena.us-west-2.amazonaws.com` resulted in a `400 Bad Request` response:\n{\"__type\":\"ThrottlingException\",\"message\":\"Rate exceeded\"}\n ThrottlingException (client): Rate exceeded - {\"__type\":\"ThrottlingException\",\"message\":\"Rate exceeded\"}", "class": "Aws\\Athena\\Exception\\AthenaException"

The API app server, the data lake, etc. are in the same VPC, and I created a VPC endpoint from the server's VPC to the Athena us-west-2 endpoint, but it didn't help. I don't think it's an Athena quota issue, since the query works just fine on localhost. Any insight would be very helpful, thank you!

Answer 1

The solution was a combination of actions. Athena just doesn't work like a relational database, so you can't expect an Athena query over an S3 data lake to return data the way a relational database query would. What helped me get results consistently, without this error, was:

Update the PHP SDK AthenaClient constructor so that it also passes a config for retries:

    $athena = new \Aws\Athena\AthenaClient([
        // ... other AthenaClient constructor params ...
        'retries' => [
            'mode' => 'standard',   // SDK retry mode
            'max_attempts' => 3,    // attempts per API call
        ],
    ]);

Athena and other elastic services (e.g. DynamoDB) work asynchronously. You issue the query, but the result is not delivered synchronously. For example, in my early tests I always received the initial "ThrottlingException", yet in the Athena query console the result of that exact same query arrived slightly later, successfully. The AWS SDK for PHP appears to be built with this in mind, and retries with exponential backoff are also what AWS recommends: https://docs.aws.amazon.com/general/latest/gr/api-retries.html
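
The asynchronous flow described above can be sketched roughly as follows. Note that the error in the question names GetQueryExecution, which suggests the status check may have been called in a tight loop, so this sketch spaces the polls out with exponential backoff. It is only an illustration under assumptions: the helper names (backoffDelayMs, waitForQuery, runAthenaQuery), the delay values, and the poll budget are hypothetical and not from the original answer; only the AthenaClient operations startQueryExecution, getQueryExecution, and getQueryResults come from the SDK.

```php
<?php
/** Delay before the next status poll: doubles each attempt, capped at $capMs. */
function backoffDelayMs(int $attempt, int $baseMs = 200, int $capMs = 5000): int
{
    return (int) min($capMs, $baseMs * (2 ** $attempt));
}

/**
 * Call $fetchState until it returns a terminal Athena state.
 * $sleep is injectable so the loop can be exercised without real waiting.
 */
function waitForQuery(callable $fetchState, int $maxPolls = 20, ?callable $sleep = null): string
{
    $sleep = $sleep ?? static function (int $ms): void {
        time_nanosleep(intdiv($ms, 1000), ($ms % 1000) * 1000000);
    };
    for ($attempt = 0; $attempt < $maxPolls; $attempt++) {
        $state = $fetchState();
        if (in_array($state, ['SUCCEEDED', 'FAILED', 'CANCELLED'], true)) {
            return $state;
        }
        $sleep(backoffDelayMs($attempt));
    }
    throw new RuntimeException('Query did not finish within the polling budget');
}

/** Start a query, wait for it to finish, and return the first page of rows. */
function runAthenaQuery(\Aws\Athena\AthenaClient $athena, string $sql, string $outputLocation): array
{
    // Submitting the query only returns an execution ID, not the data.
    $id = $athena->startQueryExecution([
        'QueryString' => $sql,
        'ResultConfiguration' => ['OutputLocation' => $outputLocation],
    ])['QueryExecutionId'];

    // Check the status with growing pauses instead of in a tight loop.
    $state = waitForQuery(static function () use ($athena, $id): string {
        $result = $athena->getQueryExecution(['QueryExecutionId' => $id]);
        return $result['QueryExecution']['Status']['State'];
    });
    if ($state !== 'SUCCEEDED') {
        throw new RuntimeException("Athena query $id ended in state $state");
    }
    return $athena->getQueryResults(['QueryExecutionId' => $id])['ResultSet']['Rows'];
}
```

The SDK-level 'retries' config covers individual API calls that get throttled, while a loop like this one keeps the number of status calls low in the first place; the two are meant to complement each other.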

Partition your data in a way that is relevant to your queries, so that each query scans as little data as possible. This helps with more consistent and faster results; see https://docs.aws.amazon.com/athena/latest/ug/partitions.html (either on the Glue table directly, or via a Glue ETL job where partitioning keys are specified). If your Athena query filters on country={country}, a good partitioning scheme is per country.
Avoid 'select *': always name exactly the columns needed, and add a limit. Queries over Athena should be relatively simple select queries; if you need joins or other more complex query types, Redshift is better suited for that.
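
As a rough illustration of the last two points, a query builder along these lines names its columns, filters on the partition column, and always adds a limit. The table name (orders), the column names, and the function name are hypothetical and only serve the example; country is assumed to be the partition key, as in the scheme above.

```php
<?php
/**
 * Build a narrow Athena query that filters on the partition column.
 * Table and column names are hypothetical placeholders.
 */
function buildOrdersByCountryQuery(string $country, int $limit = 100): string
{
    // Accept only two-letter uppercase codes so the value is safe to embed in SQL.
    if (preg_match('/\A[A-Z]{2}\z/', $country) !== 1) {
        throw new InvalidArgumentException('Expected a two-letter country code');
    }
    if ($limit < 1) {
        throw new InvalidArgumentException('Limit must be at least 1');
    }
    // Named columns, a partition predicate, and a LIMIT keep the scan small.
    return sprintf(
        "SELECT order_id, total, created_at FROM orders WHERE country = '%s' LIMIT %d",
        $country,
        $limit
    );
}
```

With country as a partition key, the WHERE clause should let Athena skip the S3 prefixes of all other countries instead of scanning the whole table.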

Solution 1
1 accepted 2022-05-03 13:23:50
