
AWS Athena output result.json to s3 - CREATE TABLE AS / INSERT INTO SELECT?

Is it possible to write the results of an AWS Athena query to a results.json file in an S3 bucket?

My first idea was to use INSERT INTO SELECT ID, COUNT(*) ... or INSERT OVERWRITE, but this does not seem to be supported according to the Amazon Athena DDL Statements documentation and tdhopper's blog post.

  1. Is it possible to CREATE TABLE with new data in AWS Athena?
  2. Is there a workaround with AWS Glue?
  3. Is it possible to trigger a Lambda function with the results of Athena? (I'm aware of S3 hooks.)

I would not mind overwriting the whole JSON file / table and always creating a new JSON file, since I only aggregate a very limited set of statistics.

I know AWS Athena automatically writes query results to an S3 bucket as CSV. However, I would like to run simple aggregations and write the output directly to a public S3 bucket so that an Angular SPA in the browser can read it. The JSON format and a specific path are therefore important to me.

My workaround uses Glue: run the query through the Athena JDBC driver and load the result into a DataFrame, then save the DataFrame in the required format to the specified S3 location.

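from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()  # reuses the session the Glue job already provides
df = spark.read.format('jdbc').options(
    url='jdbc:awsathena://AwsRegion=region;UID=your-access-key;PWD=your-secret-access-key;Schema=database name;S3OutputLocation=s3 location where jdbc drivers stores athena query results',
    driver='com.simba.athena.jdbc42.Driver',
    dbtable='(your athena query)').load()
# repartition(1) forces a single output part file under the target prefix
df.repartition(1).write.format("json").save("s3 location")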

Specify the query in the format dbtable='(select * from foo)'.

Download the driver jar from here and store it in S3. When configuring the ETL job in Glue, specify the S3 location of the jar in the Jar lib path.
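The dbtable convention and the connection URL used above can be sketched as two small helpers. This is only an illustration: as_dbtable and athena_jdbc_url are hypothetical names, not part of Spark or the Athena driver, and the URL keys are simply the ones that appear in the snippet above.

```python
def as_dbtable(query: str) -> str:
    """Wrap a SQL query in parentheses so it can be passed as the JDBC `dbtable` option."""
    # Drop surrounding whitespace and a trailing semicolon, which is not valid inside a subquery.
    cleaned = query.strip().rstrip(";").strip()
    if not cleaned:
        raise ValueError("query must not be empty")
    return f"({cleaned})"


def athena_jdbc_url(region: str, uid: str, pwd: str, schema: str, s3_output: str) -> str:
    """Assemble a JDBC URL with the same keys as the snippet above (hypothetical helper)."""
    return (
        f"jdbc:awsathena://AwsRegion={region};UID={uid};PWD={pwd};"
        f"Schema={schema};S3OutputLocation={s3_output}"
    )
```

The returned strings would then be passed as the dbtable and url options of spark.read.format('jdbc').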

You can get Athena to write data to S3 by using a "create table as select" (CTAS) query. In that query you can specify where and in what format the created table stores its data (see https://docs.aws.amazon.com/athena/latest/ug/ctas-examples.html). For JSON, the example you are looking for is:

CREATE TABLE ctas_json_unpartitioned 
WITH (
     format = 'JSON',  
     external_location = 's3://my_athena_results/ctas_json_unpartitioned/') 
AS SELECT key1, name1, address1, comment1
FROM table1;

This produces output in single-line JSON format, i.e. one JSON object per line.
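If the browser application expects a single JSON array rather than one object per line, the output could be converted afterwards. The following is a minimal sketch, assuming the file content has already been downloaded (and decompressed, if necessary) into a string; json_lines_to_array is a hypothetical helper name and the sample records are made up for illustration.

```python
import json


def json_lines_to_array(text: str) -> str:
    """Convert newline-delimited JSON (one object per line) into one JSON array string."""
    # Skip blank lines so a trailing newline does not cause a parse error.
    records = [json.loads(line) for line in text.splitlines() if line.strip()]
    return json.dumps(records)


if __name__ == "__main__":
    # Made-up sample in the single-line format described above.
    sample = '{"id": 1, "cnt": 10}\n{"id": 2, "cnt": 7}\n'
    print(json_lines_to_array(sample))
```

The resulting string could then be uploaded as results.json to whatever path the application reads from.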
