AWS Athena query on JSON-like data

I have an S3 bucket with some data stored in it, and I want to query that data using Athena tables.

Structure of S3 file:

{
OuterKey1:"OuterValue1",
OuterKey2:"{
   InnerKey1:4.0,
   InnerKey2:"someString"
}",
OuterKey3:1625833855741
}

This structure looks like JSON, but it's not exactly JSON because the keys don't have quotes.

I used a Glue crawler to create a table from the S3 folder. The crawler recognises this structure as JSON, but for any key that holds a nested structure it classifies the whole value as a string.

I want to filter on the nested structure, say InnerKey1 = 5.0, but since the whole structure is a string I am unable to do so.

Things tried:

  1. I tried using json_extract in the query, but since the value is a string it returns an empty result.
  2. I tried converting the value to JSON with cast(col as json), but that just adds quotes at the start and end of the nested value (ignoring the inner structure).
  3. I tried manually changing the column type to STRUCT instead of string, but that gives an error.

Is there any way to query such a structure? The S3 file is generated from ddb entries and stored as a txt file.

Further observations:

When I removed the quotes around the inner nested structure in the S3 file and uploaded it to a test bucket, the crawler identified the column as STRUCT instead of string, and I was able to query the inner nested structure. But I don't have control over the source, so I can't change the structure in the source S3 folder.

Another possible solution I identified is to use ETL jobs to parse and clean the data. But that would require storing the cleaned data, which I don't want because it would be redundant.

Is there any solution that can be achieved purely through an Athena query?

Athena has two JSON serdes but both require the data to be valid JSON. The JSON functions also require the data to be valid JSON. You can use regular expression functions to extract things, but that's probably as good as it will get.
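As a rough sketch of the regular-expression route, assuming the crawler-created table is called `my_table` (a hypothetical name) and the nested text ended up in a string column `outerkey2` (Athena lowercases column names), something like the following might work. The pattern is only a guess based on the sample in the question and would need adjusting if the real data differs:

```sql
-- Hypothetical table and column names; adjust to the actual schema.
SELECT *
FROM my_table
WHERE TRY_CAST(
        -- Capture the number that follows InnerKey1
        -- (the key may or may not be wrapped in quotes).
        regexp_extract(outerkey2, 'InnerKey1"?\s*:\s*(-?[0-9]+(\.[0-9]+)?)', 1)
        AS double
      ) = 5.0;
```

`regexp_extract` returns NULL when the pattern does not match, and `TRY_CAST` returns NULL when the captured text is not a number, so rows with unexpected values are filtered out rather than failing the query.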

Glue's crawlers tend to misidentify things and don't always create tables that work with Athena, unfortunately. In most cases you're better off setting up tables manually and using partition projection.
