How to partition a table in AWS Athena?

Question

I'm trying to build an application that uses Athena to present CloudTrail logs. A simple query takes around 4 minutes and scans around 200 GB. Every query I need filters on at least one of these fields: [ Event name, Event time, User name, Event source, Resource type, Resource name ]

For example:

SELECT eventid,
    eventname,
    eventsource,
    resources[1].arn,
    resources[1].type,
    useridentity.username
FROM My_Table
WHERE useridentity.username = 'username'
    AND eventtime BETWEEN '2022-07-10T13:14' AND '2022-07-27T13:14'

How can I optimize the query time? I read Top 10 Performance Tuning Tips for Amazon Athena. I'm trying to partition the data, but the articles and examples I found haven't been much help.

Can someone please provide me with a way to partition my data, or another way to improve performance? (I already have a table, so I want to ALTER it. My S3 bucket URI is something like this: XXXXXXXXXXXXXXX/us-east-1/2022/07/25/)

Answer 1

I would recommend you first create a table in Snappy-compressed Parquet format.

You can create a new table from the existing table and convert formats.

From Examples of CTAS queries - Amazon Athena:

CREATE TABLE new_table
WITH (
      format = 'Parquet',
      write_compression = 'SNAPPY',
      external_location ='s3://my-bucket/tables/parquet_table/')
AS SELECT *
FROM old_table;

Note that this will create new data files in the location specified. Keep in mind that deleting the table in Amazon Athena will not delete the data in the Amazon S3 bucket.

You can then compare performance using new_table.
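As a rough sketch of that comparison, the query from the question could be pointed at new_table instead of the original table. This assumes the CTAS statement above was run with the CloudTrail table as old_table, so the column names carry over unchanged. Athena reports the run time and the amount of data scanned for each query, so the two runs can be compared directly.

```sql
-- Same query as in the question, run against the Parquet copy.
-- Parquet is a columnar format, so only the columns referenced here are read.
SELECT eventid,
    eventname,
    eventsource,
    resources[1].arn,
    resources[1].type,
    useridentity.username
FROM new_table
WHERE useridentity.username = 'username'
    AND eventtime BETWEEN '2022-07-10T13:14' AND '2022-07-27T13:14';
```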

If you then want to add partitioning, run the same command with partitioned_by.
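A minimal sketch of what that might look like for the CloudTrail data in the question. The table name new_table_partitioned, the partition column event_date, and the S3 path are hypothetical. It also assumes eventtime is an ISO 8601 string, as the question's query suggests, so its first 10 characters are the date.

```sql
-- Sketch: convert to Parquet and partition by day in one CTAS statement.
-- new_table_partitioned, event_date, and the S3 location are made-up names.
CREATE TABLE new_table_partitioned
WITH (
      format = 'PARQUET',
      write_compression = 'SNAPPY',
      external_location = 's3://my-bucket/tables/parquet_partitioned/',
      partitioned_by = ARRAY['event_date'])
AS SELECT eventid,
    eventname,
    eventsource,
    resources,
    useridentity,
    eventtime,
    -- Partition columns must come last in the SELECT list.
    substr(eventtime, 1, 10) AS event_date
FROM old_table
-- A single CTAS query can write at most 100 partitions,
-- so this sketch restricts the load to one month.
WHERE eventtime BETWEEN '2022-07-01' AND '2022-07-31T23:59:59Z';

-- Filtering on the partition column lets Athena skip partitions outside the range.
SELECT eventid,
    eventname,
    useridentity.username
FROM new_table_partitioned
WHERE event_date BETWEEN '2022-07-10' AND '2022-07-27'
    AND useridentity.username = 'username';
```

Daily partitions fit the eventtime range filter in the question. Data outside the loaded month would need further INSERT INTO statements, each also limited to 100 partitions.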

For details, see: CREATE TABLE AS - Amazon Athena

Answer 1 (2 votes, accepted, 2022-07-31 22:28:23)
