
How to deal with JSON with special characters in Column Names in AWS ATHENA

I'm new to Athena, although I have some brief experience with Hive.

I'm trying to create a table from JSON files that are exports from MongoDB. My problem is that MongoDB uses $oid, $numberInt, $numberDouble and others as internal references, but '$' is not accepted in a column name in Athena.

This is a one-line JSON file that I created to test:

{"_id":{"$oid":"61f87ebdf655d153709c9e19"}}

and this is the table that refers to it:

CREATE EXTERNAL TABLE landing.json_table (
`_id` struct<`$oid`:string>
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
LOCATION 's3://bucket-name/test/';

When I run a simple SELECT * it returns this error:

HIVE_METASTORE_ERROR: Error: name expected at the position 7 of 'struct<$oid:string>' but '$' is found. (Service: null; Status Code: 0; Error Code: null; Request ID: null; Proxy: null)

The error is caused by the $ in the JSON field name.

Any idea how to handle this situation? My only solution for now is to write a script that cleans the unaccepted characters out of the JSON file, but I would really prefer to handle it directly in Athena if possible.

If you switch to the OpenX SerDe, you can create a SerDe mapping for JSON fields with special characters like $ in the name.

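See the AWS blog post "Create Tables in Amazon Athena from Nested JSON and Mappings Using JSONSerDe", section "Walkthrough: Handling forbidden characters with mappings".

A mapping that would work for your example (the struct field is declared as oid, and the mapping tells the SerDe to read it from the JSON key $oid):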

CREATE EXTERNAL TABLE landing.json_table (
`_id` struct<`oid`:string>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
"mapping.oid"="$oid"
)
LOCATION 's3://bucket-name/test/';
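
With the mapping in place, the nested field should be addressable under its renamed key. A minimal sketch of such a query, assuming the table definition above was created successfully and the test file sits under the LOCATION prefix (the object_id alias is only an illustrative name, not part of the original example):

```sql
-- Read the mapped struct field with dot notation.
-- "oid" is the Athena-side field name; "$oid" appears only in the JSON data.
SELECT _id.oid AS object_id
FROM landing.json_table;
```

Other $-prefixed keys in the export, such as $numberInt, would presumably need their own mapping entries in the same style. That is an assumption and has not been tested here.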


 