Hive JSON SerDe - keys with white spaces not populating

I have an external table that is built off of a JSON file. All of the JSON keys are columns and are populated as expected, except for one key that contains a space.

Here is the DDL:

CREATE EXTERNAL TABLE foo.bar (
  event ARRAY<STRUCT<value:STRING, info:STRUCT<id:STRING, event_source:STRING>>>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ("mapping.event_source" = "event source")
STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 'foo/bar'

All of the values show up as expected except for event_source, which shows up as NULL. In the JSON file, the key for event_source is written as 'event source' (without the single quotes). Is there something I need to do differently in the WITH SERDEPROPERTIES setting to get this key to work properly?
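Conceptually, a `mapping.<column>` property tells the SerDe which JSON key to read for a given Hive column. The Python sketch below models that lookup on a record shaped like the DDL above. The function `resolve_column`, the `MAPPING` dictionary and the sample record are all hypothetical; this is not the openx JsonSerDe's actual code and makes no claim about how it handles nested structs or letter case.

```python
import json

# Hypothetical model of a column-to-JSON-key mapping, analogous in spirit to
# WITH SERDEPROPERTIES ("mapping.event_source" = "event source").
# This is an illustration only, NOT the openx JsonSerDe implementation.
MAPPING = {"event_source": "event source"}


def resolve_column(obj: dict, column: str, mapping: dict) -> object:
    """Return the value for a Hive column name, honoring an optional key mapping."""
    json_key = mapping.get(column, column)  # fall back to the column name itself
    return obj.get(json_key)  # None plays the role of Hive's NULL


# Hypothetical record shaped like the DDL: event is an array of structs.
line = '{"event": [{"value": "v1", "info": {"id": "myid", "event source": "web"}}]}'
record = json.loads(line)
info = record["event"][0]["info"]

print(resolve_column(info, "id", MAPPING))            # myid
print(resolve_column(info, "event_source", MAPPING))  # web
print(resolve_column(info, "event_source", {}))       # None (no mapping -> NULL)
```

Without a mapping, the lookup uses the column name `event_source` verbatim, which never matches the JSON key `event source`, so the value comes back empty.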

Thanks

Do you mean that the JSON contains data like this?

{ id: "myid", event source: "eventsource" } 

If so, there's not much that can be done, since it's simply broken JSON. If not, can you post a sample of the JSON you're trying to read?
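The distinction matters because JSON requires every object key to be a double-quoted string, but places no restriction on the characters inside it, so a key containing a space is valid as long as it is quoted. A quick check with Python's standard `json` module (the helper `is_valid_json` is just for illustration):

```python
import json

# The example from the comment: keys are not quoted, so this is not valid JSON.
broken = '{ id: "myid", event source: "eventsource" }'
# The same data with quoted keys: a key containing a space is legal JSON.
valid = '{ "id": "myid", "event source": "eventsource" }'


def is_valid_json(text: str) -> bool:
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


print(is_valid_json(broken))              # False
print(is_valid_json(valid))               # True
print(json.loads(valid)["event source"])  # eventsource
```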

I have encountered a similar problem to the one above, with a slight variation: the input data is valid JSON.

I have an external table that is built off of a JSON file. All of the JSON keys are populated except one, msrp_currency. Here is the DDL:

CREATE EXTERNAL TABLE foo.bar
( id string,
variants array<struct<pid:string, msrp_currency:string>>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ( "ignore.malformed.json" = "true" ,
'mapping.variants.msrp_currency' = 'variants.msrpcurrency')
LOCATION 'foo/bar'

All of the values show up as expected except for msrp_currency, which shows up as NULL. The reason I need to introduce the underscore is that later I need to extract the same field value as msrpCurrency using the Brickhouse to_json UDF.

Sample values:

{ "pid": "mypid", "msrpCurrency": "USD" } 
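Here the Hive column name `msrp_currency` and the JSON key `msrpCurrency` differ both in the underscore and in letter case, so whatever maps one to the other has to bridge both differences. The sketch below shows that naming conversion in isolation on the sample value; the helpers `snake_to_camel` and `camel_to_snake` are hypothetical and are not the SerDe's or Brickhouse's code.

```python
import json
import re


def snake_to_camel(name: str) -> str:
    """Convert snake_case to camelCase, e.g. msrp_currency -> msrpCurrency."""
    head, *rest = name.split("_")
    return head + "".join(part.capitalize() for part in rest)


def camel_to_snake(name: str) -> str:
    """Convert camelCase to snake_case, e.g. msrpCurrency -> msrp_currency."""
    # Insert "_" before each uppercase letter preceded by a lowercase letter or digit.
    return re.sub(r"(?<=[a-z0-9])([A-Z])", r"_\1", name).lower()


variant = json.loads('{ "pid": "mypid", "msrpCurrency": "USD" }')

# An exact-match lookup with the Hive column name finds nothing (surfaces as NULL).
print(variant.get("msrp_currency"))                  # None
# Converting the column name back to camelCase finds the value.
print(variant.get(snake_to_camel("msrp_currency")))  # USD
print(camel_to_snake("msrpCurrency"))                # msrp_currency
```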
