
How to deal with JSON with special characters in column names in AWS Athena

I'm new to Athena, although I have some brief experience with Hive.

I'm trying to create a table from JSON files that are exports from MongoDB. My problem is that MongoDB uses $oid, $numberInt, $numberDouble and others as internal references, but '$' is not accepted in a column name in Athena.

This is a one-line JSON file that I created to test:

{"_id":{"$oid":"61f87ebdf655d153709c9e19"}}

and this is the table that refers to it:

CREATE EXTERNAL TABLE landing.json_table (
`_id` struct<`$oid`:string>
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
LOCATION 's3://bucket-name/test/';

When I run a simple SELECT *, it returns this error:

HIVE_METASTORE_ERROR: Error: name expected at the position 7 of 'struct<$oid:string>' but '$' is found. (Service: null; Status Code: 0; Error Code: null; Request ID: null; Proxy: null)

This is related to the fact that the JSON field name contains a $.

Any idea on how to handle this situation? My only solution for now is to write a script that "cleans" the JSON file of the unaccepted characters, but I would really prefer to handle it directly in Athena if possible.

If you switch to the OpenX SerDe, you can create a SerDe mapping for JSON fields with special characters like $ in the name.

See the AWS blog post "Create Tables in Amazon Athena from Nested JSON and Mappings Using JSONSerDe", section "Walkthrough: Handling forbidden characters with mappings".

A mapping that would work for your example:

CREATE EXTERNAL TABLE landing.json_table (
`_id` struct<`oid`:string>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
"mapping.oid"="$oid"
)
LOCATION 's3://bucket-name/test/';
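
If the mapping approach does not fit (for example, when there are many distinct $-prefixed keys), the clean-up script mentioned in the question remains a fallback. The following is only a minimal sketch of that idea, assuming the export is newline-delimited JSON and that simply dropping the leading $ from every key is acceptable. The function names strip_dollar_keys and clean_line are hypothetical, not part of any AWS or MongoDB tooling.

```python
import json


def strip_dollar_keys(value):
    """Recursively remove leading '$' characters from every object key."""
    if isinstance(value, dict):
        return {key.lstrip("$"): strip_dollar_keys(item) for key, item in value.items()}
    if isinstance(value, list):
        return [strip_dollar_keys(item) for item in value]
    # Scalars (strings, numbers, booleans, null) are returned unchanged.
    return value


def clean_line(line):
    """Clean one line of newline-delimited JSON and re-serialize it compactly."""
    return json.dumps(strip_dollar_keys(json.loads(line)), separators=(",", ":"))


if __name__ == "__main__":
    # The one-line test document from the question.
    sample = '{"_id":{"$oid":"61f87ebdf655d153709c9e19"}}'
    print(clean_line(sample))
```

Note that this sketch silently overwrites one value if an object contains both a "$name" key and a plain "name" key, so it should be checked against the real data before use.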


 