
AWS Athena - Create table from an array of JSON objects

Can I get help creating a table on AWS Athena? Here is a sample of the data:

[{"lts": 150}]

AWS Glue generates the schema as:

 array (array<struct<lts:int>>)

When I try to preview the table created by AWS Glue, I get this error:

HIVE_BAD_DATA: Error parsing field value for field 0: org.openx.data.jsonserde.json.JSONObject cannot be cast to org.openx.data.jsonserde.json.JSONArray

The error message is clear, but I can't find the source of the problem!

AWS Athena uses Hive-JSON-Serde (the org.openx.data.jsonserde classes named in the error) to serialize/deserialize JSON. For some reason, it doesn't support just any standard JSON. It expects one record per line, without an enclosing array. In their words:

The following example will work.

{ "key" : 10 }
{ "key" : 20 }

But this won't:

{
  "key" : 20,
}

Nor this:

[{"key" : 20}]
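If you control how the files are written, one way to satisfy the one-record-per-line requirement is to rewrite each array file before uploading it. The following is a minimal sketch using only the Python standard library; the function name `array_to_json_lines` is made up for illustration, and it assumes each file is small enough to load into memory.

```python
import json


def array_to_json_lines(text: str) -> str:
    """Turn a JSON array of objects into one JSON object per line."""
    records = json.loads(text)
    if not isinstance(records, list):
        # A single top-level object is treated as one record.
        records = [records]
    # json.dumps without `indent` keeps each record on a single line.
    return "".join(json.dumps(record) + "\n" for record in records)


if __name__ == "__main__":
    sample = '[{"lts": 150}, {"lts": 250}, {"lts": 350}]'
    print(array_to_json_lines(sample), end="")
```

Each output line is then a standalone record that the SerDe can parse on its own.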

You should create a JSON classifier so that the array is read as a list of objects instead of a single array object. Use the JSON path $[*] in your classifier and then set up the crawler to use it:

  • Edit the crawler
  • Expand 'Description and classifiers'
  • Click 'Add' on the left pane to associate your classifier with the crawler
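The same setup can probably be scripted instead of clicked through. The sketch below assumes boto3 is installed and configured with credentials; it uses the Glue client's `create_classifier` and `update_crawler` calls. The helper names and the crawler and classifier names in the comment are hypothetical.

```python
def classifier_request(name: str) -> dict:
    """Build the arguments for Glue's create_classifier call."""
    # "$[*]" tells the classifier to treat each array element as a record.
    return {"JsonClassifier": {"Name": name, "JsonPath": "$[*]"}}


def attach_classifier(glue, crawler_name: str, classifier_name: str) -> None:
    """Create the JSON classifier and associate it with an existing crawler.

    `glue` is expected to be a boto3 Glue client, e.g. boto3.client("glue").
    """
    glue.create_classifier(**classifier_request(classifier_name))
    glue.update_crawler(Name=crawler_name, Classifiers=[classifier_name])


# Example call (hypothetical names, needs AWS credentials):
#   import boto3
#   attach_classifier(boto3.client("glue"), "my-crawler", "json-array-classifier")
```

Note that `update_crawler` replaces the crawler's classifier list, so include any classifiers that are already attached.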

After that, remove the previously created table and re-run the crawler. It will create a table with the proper schema, but I think Athena will still complain when you try to query it. However, you can now read from that table using a Glue ETL job and process single record objects instead of array objects.

This JSON, [{"lts": 150}], works like a charm with the query below:

select n.lts from table_name
cross join UNNEST(table_name.array) as t (n) 

The output would be as below: [image: query result]

But I ran into a problem with JSON like [{"lts": 150},{"lts": 250},{"lts": 350}]. Even though the array has 3 elements, the query returns only the first one. This may be because of the limitation described by @artikas. We can, of course, change the JSON as below to make it work:

{"lts": 150}
{"lts": 250}
{"lts": 350}
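For data laid out one record per line like this, the table can also be declared by hand rather than crawled. The following is a rough sketch that only builds the DDL string; the table name and S3 location are placeholders, and the single `lts int` column is taken from the sample data. The SerDe class is the one named in the error message from the question.

```python
def create_table_ddl(table: str, location: str) -> str:
    """Build an Athena DDL statement for one-JSON-object-per-line data."""
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (lts int)\n"
        "ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'\n"
        f"LOCATION '{location}'"
    )


if __name__ == "__main__":
    # Placeholder names; replace with your own table and bucket prefix.
    print(create_table_ddl("lts_events", "s3://my-bucket/lts/"))
```

The printed statement can be pasted into the Athena query editor, after which a plain `select lts from lts_events` should return one row per line of the file.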

Please post if anyone has a better solution.
