Why do I get ThrottlingException - Rate Exceeded status:400 when making an AWS Athena API call from an API server?
We have an S3 data lake in AWS (with Lake Formation, Glue, etc.). The end goal is to query the S3 data sources using SQL in Athena.
The call made is StartQueryExecution on the Athena API, and the error as shown on the API app's side is:
Error executing \"GetQueryExecution\" on \"https://athena.us-west-2.amazonaws.com\"; AWS HTTP error: Client error: `POST https://athena.us-west-2.amazonaws.com` resulted in a `400 Bad Request` response:\n{\"__type\":\"ThrottlingException\",\"message\":\"Rate exceeded\"}\n ThrottlingException (client): Rate exceeded - {\"__type\":\"ThrottlingException\",\"message\":\"Rate exceeded\"}", "class": "Aws\\Athena\\Exception\\AthenaException"
The API app server and the data lake etc. are in the same VPC, and I created a VPC endpoint from the server's VPC to the Athena us-west-2 endpoint, but it didn't help. I don't think it's an Athena quota issue, since on localhost the query works just fine.
Any insight would be very helpful, thank you!
The solution was a combination of actions. Athena just doesn't work like that.
So it's not reasonable to expect data from an Athena query over an S3 data lake as if you were querying a relational database.
What helped get results consistently and avoid this error was:
// ... other AthenaClient constructor params ...
'retries' => [
    'mode' => 'standard',
    'max_attempts' => 3
],
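The error in the question names GetQueryExecution, which is usually the status-polling call, so the polling loop itself may be what hits the rate limit. As a rough illustration of what a retry setting like the one above does, here is a minimal sketch of retrying a throttled call with exponential backoff and jitter. It is written in Python rather than PHP and does not use any AWS SDK: `ThrottledError`, `call_with_backoff` and all of its parameters are hypothetical names chosen for this example, not part of any library.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for the SDK's ThrottlingException."""


def call_with_backoff(fn, max_attempts=3, base_delay=0.5, max_delay=8.0,
                      sleep=time.sleep, rng=random.random):
    """Call fn(), retrying on ThrottledError with exponential backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except ThrottledError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error to the caller
            # Delay ceiling doubles on each attempt, capped at max_delay.
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Jitter: wait a random fraction of the ceiling so that
            # concurrent clients do not all retry at the same moment.
            sleep(ceiling * rng())
```

In an application, `fn` would wrap the actual status check (for example a closure around the SDK's GetQueryExecution call that raises `ThrottledError` when the service answers "Rate exceeded"). Waiting between status checks, instead of polling in a tight loop, also lowers the request rate in the first place.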
Partition your data in a way that matches your queries, so that each query scans as little data as possible. This helps produce more consistent and faster results.
- https://docs.aws.amazon.com/athena/latest/ug/partitions.html // either on the Glue table directly, or via a Glue ETL job where partitioning keys are specified.
If your query on Athena is looking for something where country={country}, a good partitioning scheme is per country.
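To make the per-country scheme concrete: Athena can work with Hive-style partitions, where each partition lives under a `key=value` folder in S3. The sketch below only builds such a prefix as a string. The function name `partition_prefix` and the bucket and table path in the usage note are made up for illustration, and the code assumes partition values contain no characters that would need escaping.

```python
def partition_prefix(table_location, **partition_keys):
    """Build a Hive-style partition prefix such as .../country=US/."""
    base = table_location.rstrip("/")
    # One "key=value" path segment per partition key, in the order given.
    segments = [f"{key}={value}" for key, value in partition_keys.items()]
    return "/".join([base, *segments]) + "/"
```

For example, `partition_prefix("s3://example-bucket/events/", country="US")` gives `s3://example-bucket/events/country=US/`, so a query that filters on `country = 'US'` only needs to read objects under that prefix.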
Avoid 'select *': always name exactly the columns needed, and add a limit. Queries over Athena should be relatively simple select queries; if you need joins or other more complex query types, Redshift is better suited for that.
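A small sketch of a query builder that follows this advice: it names columns explicitly, filters on the partition column, and always adds a limit. The function `build_query`, the table, and the column names are hypothetical, and the manual quoting is a simplification for the example rather than a recommended way to handle untrusted input.

```python
import re

# Only plain identifiers are accepted for table and column names.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_query(table, columns, partition_column, partition_value, limit=100):
    """Build a simple SELECT with explicit columns, a partition filter and a LIMIT."""
    if not columns:
        raise ValueError("name at least one column instead of using SELECT *")
    for name in [table, partition_column, *columns]:
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    if isinstance(limit, bool) or not isinstance(limit, int) or limit <= 0:
        raise ValueError("limit must be a positive integer")
    # Double any single quotes so the value stays inside the string literal.
    escaped = str(partition_value).replace("'", "''")
    return (f"SELECT {', '.join(columns)} FROM {table} "
            f"WHERE {partition_column} = '{escaped}' LIMIT {limit}")
```

Calling `build_query("events", ["user_id", "event_time"], "country", "US", limit=50)` produces `SELECT user_id, event_time FROM events WHERE country = 'US' LIMIT 50`.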