
AWS Athena output result.json to s3 - CREATE TABLE AS / INSERT INTO SELECT?

Is it possible to write the results of an AWS Athena query to a results.json within an S3 bucket?

My first idea was to use INSERT INTO SELECT ID, COUNT(*) ... or INSERT OVERWRITE, but this does not seem to be supported, according to the Amazon Athena DDL statements and tdhopper's blog post.

  1. Is it possible to CREATE TABLE with new data in AWS Athena?
  2. Is there any workaround with AWS Glue?
  3. Is it possible to trigger a Lambda function with the results of Athena? (I'm aware of S3 hooks.)

It would not matter to me to overwrite the whole JSON file/table and always create a new JSON, since the statistics I aggregate are very limited.

I do know AWS Athena automatically writes query results to an S3 bucket as CSV. However, I want to do simple aggregations and write the output directly to a public S3 bucket so that an Angular SPA in the browser can read it. Thus the JSON format and a specific path are important to me.
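One way to get JSON at a fixed S3 path (addressing question 3) is to run the query from a Lambda with boto3, convert the result rows to JSON, and upload the file yourself. This is a minimal sketch, not a definitive implementation: the bucket names, keys, and helper names are my own, and `get_query_results` is only read for its first page (up to 1000 rows), which is enough for "very limited statistics".

```python
import json
import time


def rows_to_dicts(result_set):
    """Convert an Athena GetQueryResults ResultSet into a list of dicts.

    Athena returns the column headers as the first row; each cell arrives
    as {"VarCharValue": "..."} and the key is absent for NULL values.
    """
    rows = result_set["Rows"]
    header = [cell.get("VarCharValue") for cell in rows[0]["Data"]]
    records = []
    for row in rows[1:]:
        values = [cell.get("VarCharValue") for cell in row["Data"]]
        records.append(dict(zip(header, values)))
    return records


def query_to_json(sql, database, staging_location, target_bucket, target_key):
    """Run an Athena query, wait for it, and upload the rows as JSON to S3."""
    import boto3  # imported here so rows_to_dicts stays dependency-free

    athena = boto3.client("athena")
    s3 = boto3.client("s3")

    qid = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": staging_location},
    )["QueryExecutionId"]

    # Poll until the query finishes (Step Functions would be more robust).
    while True:
        state = athena.get_query_execution(QueryExecutionId=qid)[
            "QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(1)
    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query ended in state {state}")

    # First page only; paginate with NextToken for larger result sets.
    records = rows_to_dicts(
        athena.get_query_results(QueryExecutionId=qid)["ResultSet"])
    s3.put_object(
        Bucket=target_bucket,
        Key=target_key,  # e.g. "stats/results.json"
        Body=json.dumps(records).encode("utf-8"),
        ContentType="application/json",
    )
```

Note that all values come back as strings in the `VarCharValue` cells, so you may want to cast counts to integers before serializing.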

My workaround uses Glue: use the Athena JDBC driver to run the query and load the result into a DataFrame, then save the DataFrame in the required format at the specified S3 location.

df = spark.read.format('jdbc').options(
        url='jdbc:awsathena://AwsRegion=region;UID=your-access-key;'
            'PWD=your-secret-access-key;Schema=database-name;'
            'S3OutputLocation=s3-location-where-the-jdbc-driver-stages-athena-query-results',
        driver='com.simba.athena.jdbc42.Driver',
        dbtable='(your athena query)').load()
df.repartition(1).write.format("json").save("s3 location")

Specify the query in the format dbtable='(select * from foo)'.

Download the jar from here and store it in S3. While configuring the ETL job on Glue, specify the S3 location of the jar in the "Jar lib path".

You can get Athena to create data in S3 by using a "create table as select" (CTAS) query. In that query you can specify where, and in what format, the created table should store its data. See https://docs.aws.amazon.com/athena/latest/ug/ctas-examples.html For JSON, the example you are looking for is:

CREATE TABLE ctas_json_unpartitioned 
WITH (
     format = 'JSON',  
     external_location = 's3://my_athena_results/ctas_json_unpartitioned/') 
AS SELECT key1, name1, address1, comment1
FROM table1;

This results in newline-delimited JSON, i.e. one JSON object per line.
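One caveat when repeating a CTAS query: it fails if the table already exists or the `external_location` prefix is not empty, so for the "always overwrite" use case you need to drop the table and clear the prefix between runs. A small sketch of building the statement and clearing the old output, assuming boto3 and hypothetical bucket/prefix names:

```python
def build_ctas(table, select_sql, s3_location, fmt="JSON"):
    """Build a CTAS statement that writes the SELECT result as `fmt` to S3."""
    return (
        f"CREATE TABLE {table} "
        f"WITH (format = '{fmt}', external_location = '{s3_location}') "
        f"AS {select_sql}"
    )


def clear_prefix(bucket, prefix):
    """Delete previous CTAS output; CTAS fails if the location is not empty."""
    import boto3  # imported here so build_ctas stays dependency-free

    s3 = boto3.resource("s3")
    s3.Bucket(bucket).objects.filter(Prefix=prefix).delete()
```

Before each run you would also issue `DROP TABLE IF EXISTS ctas_json_unpartitioned` through Athena, then call `clear_prefix("my_athena_results", "ctas_json_unpartitioned/")` and execute the statement returned by `build_ctas`.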

