AWS Athena: output result.json to S3 with CREATE TABLE AS / INSERT INTO SELECT?

Question

Is it at all possible to write the results of an AWS Athena query to a results.json file in an S3 bucket?

My first idea was to use INSERT INTO SELECT ID, COUNT(*) ... or INSERT OVERWRITE, but neither seems to be supported according to the Amazon Athena DDL Statements documentation and tdhopper's blog post.

Is it at all possible to CREATE TABLE with new data in AWS Athena?
Is there any workaround with AWS Glue?
Is it at all possible to trigger a Lambda function with the results of Athena? (I'm aware of S3 hooks.)

I don't mind overwriting the whole JSON file/table and always creating a new JSON file, since I only aggregate very limited statistics.

I do know that AWS Athena automatically writes the results to an S3 bucket as CSV. However, I'd like to do simple aggregations and write the outputs directly to a public S3 bucket so that an Angular SPA in the browser can read them. Thus the JSON format and a specific path are important to me.
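If you stay with Athena's default CSV output, the conversion to JSON is small enough to do yourself, for example in a Lambda function fired by an S3 event notification on the query-results prefix (the "S3 hooks" route mentioned above). A minimal sketch of just the conversion step, assuming the result file starts with a header row of column names (as Athena's CSV results do) and that keeping every value as a string is acceptable; the function name `athena_csv_to_json` is hypothetical:

```python
import csv
import io
import json


def athena_csv_to_json(csv_text: str) -> str:
    """Convert an Athena CSV result (header row + data rows) into a JSON array string.

    Hypothetical helper: every value stays a string, because CSV carries no type info.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [dict(row) for row in reader]
    return json.dumps(rows)


if __name__ == "__main__":
    sample = '"id","cnt"\n"a","3"\n"b","5"\n'
    print(athena_csv_to_json(sample))
    # prints [{"id": "a", "cnt": "3"}, {"id": "b", "cnt": "5"}]
```

Reading the CSV object and writing results.json back to S3 would be the Lambda's remaining work; that wiring is not shown here.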

Answer 1

My workaround uses Glue: run the query through the Athena JDBC driver and load the result into a DataFrame, then save the DataFrame in the required format at the specified S3 location.

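from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()  # reuses the job's existing Spark session if there is one
# Run the Athena query through the Simba Athena JDBC driver and load the result as a DataFrame
df = spark.read.format('jdbc').options(url='jdbc:awsathena://AwsRegion=region;UID=your-access-key;PWD=your-secret-access-key;Schema=database name;S3OutputLocation=s3 location where jdbc drivers stores athena query results',
      driver='com.simba.athena.jdbc42.Driver',
      dbtable='(your athena query)').load()
# One partition -> one part-*.json file (one JSON object per line); overwrite replaces the previous run's output
df.repartition(1).write.mode("overwrite").format("json").save("s3 location")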
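Note that Spark treats "s3 location" as a directory: even with repartition(1) it writes a file named like part-00000-....json (plus a _SUCCESS marker), not a fixed results.json. One way to get the fixed public path the question asks for is to copy that single part file afterwards. A minimal sketch, assuming an object that behaves like a boto3 S3 client (`list_objects_v2`, `copy_object`); the helper name `publish_single_part` and all bucket/key names are hypothetical:

```python
def publish_single_part(s3, bucket: str, prefix: str, target_key: str) -> str:
    """Copy the one JSON part file Spark wrote under `prefix` to `target_key`.

    `s3` is expected to behave like a boto3 S3 client. Returns the copied part file's key.
    """
    response = s3.list_objects_v2(Bucket=bucket, Prefix=prefix)
    # Keep only Spark's data files; skip markers such as _SUCCESS
    part_keys = [
        obj["Key"]
        for obj in response.get("Contents", [])
        if obj["Key"].rsplit("/", 1)[-1].startswith("part-")
    ]
    if len(part_keys) != 1:
        raise ValueError(f"expected exactly one part file, found {len(part_keys)}")
    # Replace the metadata so the browser receives the object as application/json
    s3.copy_object(
        Bucket=bucket,
        CopySource={"Bucket": bucket, "Key": part_keys[0]},
        Key=target_key,
        ContentType="application/json",
        MetadataDirective="REPLACE",
    )
    return part_keys[0]
```

With a real client this would be called after the Spark write, e.g. `publish_single_part(boto3.client("s3"), "my-bucket", "tmp/stats/", "public/results.json")`. The copied file is still one JSON object per line, so a front end expecting a single JSON document would also need the rows wrapped in an array.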

Specify the query in the format dbtable='(select * from foo)'.
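The two strings passed to the reader above can be assembled from their parts. A minimal sketch, assuming the same key=value;... URL shape and parenthesized-query convention shown in this answer; the helper names `athena_jdbc_url` and `as_dbtable` are hypothetical:

```python
def athena_jdbc_url(region: str, schema: str, output_location: str,
                    user: str, password: str) -> str:
    """Assemble an Athena JDBC URL using the same keys as the answer's example URL."""
    params = {
        "AwsRegion": region,
        "UID": user,
        "PWD": password,
        "Schema": schema,
        "S3OutputLocation": output_location,
    }
    # Dicts keep insertion order, so the keys appear in the order listed above
    return "jdbc:awsathena://" + ";".join(f"{key}={value}" for key, value in params.items())


def as_dbtable(query: str) -> str:
    """Wrap a SELECT in parentheses, dropping a trailing semicolon, for the dbtable option."""
    return f"({query.strip().rstrip(';')})"
```

For example, `as_dbtable("select * from foo;")` yields the `(select * from foo)` form described above.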

Download the JAR from here and store it in S3. When configuring the ETL job in Glue, specify the S3 location of the JAR in the Jar lib path.
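If the job is defined through the API instead of the console, the JAR location goes into the `--extra-jars` job parameter, which is what the console's JAR path field sets. A minimal sketch of the keyword arguments for boto3's `glue.create_job`, assuming a Python Spark ("glueetl") job; the helper name and all names and paths are hypothetical placeholders:

```python
def glue_job_definition(name: str, role_arn: str, script_s3: str, driver_jar_s3: str) -> dict:
    """Build keyword arguments for boto3's glue.create_job(**kwargs).

    Hypothetical helper: `driver_jar_s3` is the S3 path of the uploaded Athena JDBC JAR.
    """
    return {
        "Name": name,
        "Role": role_arn,
        "Command": {"Name": "glueetl", "ScriptLocation": script_s3, "PythonVersion": "3"},
        # Extra JARs that Glue adds to the job's Java classpath
        "DefaultArguments": {"--extra-jars": driver_jar_s3},
    }
```

Usage would be `boto3.client("glue").create_job(**glue_job_definition(...))`; options such as the Glue version or worker type are left at their defaults here.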

Answer 2

You can get Athena to create data in S3 by using a CREATE TABLE AS SELECT (CTAS) query. In that query you can specify where, and in what format, the created table stores its data; see https://docs.aws.amazon.com/athena/latest/ug/ctas-examples.html. For JSON, the example you are looking for is:

CREATE TABLE ctas_json_unpartitioned 
WITH (
     format = 'JSON',  
     external_location = 's3://my_athena_results/ctas_json_unpartitioned/') 
AS SELECT key1, name1, address1, comment1
FROM table1;

This would produce newline-delimited JSON: one JSON object per line, one line per row.
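A file with one JSON object per line is not a single JSON document, so a browser app calling JSON.parse on it directly would fail as soon as there is more than one row. CTAS may also split the output into several files with Athena-generated names. A minimal sketch of merging such content into one JSON array suitable for a results.json, assuming the file contents have already been read from S3 (and decompressed, if the table's write_compression setting compressed them); the helper name `json_lines_to_array` is hypothetical:

```python
import json


def json_lines_to_array(text: str) -> str:
    """Turn newline-delimited JSON (one object per line) into a single JSON array string."""
    # Skip blank lines, e.g. a trailing newline at the end of the file
    records = [json.loads(line) for line in text.splitlines() if line.strip()]
    return json.dumps(records)


if __name__ == "__main__":
    print(json_lines_to_array('{"key1": 1}\n{"key1": 2}\n'))
    # prints [{"key1": 1}, {"key1": 2}]
```

When CTAS writes several files, their contents can be concatenated (each ends with a newline) before calling the function.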

Answer 1: score 2, 2018-01-13 19:31:44

Answer 2: score 0, 2019-03-01 13:36:03