Create Hive table from AVSC that contains a reference to a previously defined schema as a type

I am looking for a way to take the following AVSC file content and externalize the nested schema "RENTALRECORDTYPE", so that the schema can be reused from Hive.

{
    "type": "record",
    "name": "EMPLOYEE",
    "namespace": "",
    "doc": "EMPLOYEE is a person that works here",
    "fields": [
        {
            "name": "RENTALRECORD",
            "type": {
                "type": "record",
                "name": "RENTALRECORDTYPE",
                "namespace": "",
                "doc": "Rental record is a record that is kept on every item rented",
                "fields": [
                    {
                        "name": "due_date",
                        "doc": "The date when item is due",
                        "type": "int"
                    } 
                ]
            }
        },
        {
            "name": "hire_date",
            "doc": "Employee date of hire",
            "type": "int"
        }
    ]
}

Defining the schema this way works fine. I can issue the following HiveQL statement and the table is created successfully.

CREATE EXTERNAL TABLE employee
STORED AS AVRO
LOCATION '/user/dtom/store/data/employee'
TBLPROPERTIES ('avro.schema.url'='/user/dtom/store/schema/employee.avsc');

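However, I want to be able to reference an existing schema rather than duplicate the record definition in multiple schemas. For example, instead of a single schema file there would be two AVSC files: rentalrecord.avsc and employee.avsc.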

rentalrecord.avsc

{
    "type": "record",
    "name": "RENTALRECORD",
    "namespace": "",
    "doc": "A record that is kept for every rental",
    "fields": [
        {
            "name": "due_date",
            "doc": "The date on which the rental is due back to the store",
            "type": "int"
        }
    ]
}

employee.avsc

{
    "type": "record",
    "name": "EMPLOYEE",
    "namespace": "",
    "doc": "EMPLOYEE is a person that works for the VIDEO STORE",
    "fields": [
        {
            "name": "rentalrecord",
            "doc": "A rental record is a record on every rental",
            "type": "RENTALRECORD"
        },
        {
            "name": "hire_date",
            "doc": "Employee date of hire",
            "type": "int"
        }
    ]
}

In the case above, we want to externalize the RENTALRECORD schema definition and reuse it in employee.avsc and elsewhere.

When I try to import the schemas with the following two HiveQL statements, the second one fails…

CREATE EXTERNAL TABLE rentalrecord
STORED AS AVRO
LOCATION '/user/dtom/store/data/rentalrecord'
TBLPROPERTIES ('avro.schema.url'='/user/dtom/store/schema/rentalrecord.avsc');

CREATE EXTERNAL TABLE employee
STORED AS AVRO
LOCATION '/user/dtom/store/data/employee'
TBLPROPERTIES ('avro.schema.url'='/user/dtom/store/schema/employee.avsc');

rentalrecord.avsc is imported successfully, but employee.avsc fails on the first field definition, the field of type "RENTALRECORD". Hive outputs the following error…

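FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: MetaException(message:org.apache.hadoop.hive.serde2.SerDeException Encountered exception determining schema. Returning signal schema to indicate problem: "RENTALRECORD" is not a defined name. The type of the "rentalrecord" field must be a defined name or a {"type": ...} expression.)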

My research tells me that Avro does support this form of schema reuse. So either I am missing something, or this is something Hive does not support.

Any help would be greatly appreciated.
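One plausible reading of the error above is that employee.avsc is parsed on its own, so the name "RENTALRECORD" has no definition in scope, even though another table was created from rentalrecord.avsc. The sketch below is not Hive's or Avro's code; it is a small standard-library illustration of that rule. The function name `undefined_names` is hypothetical, and namespaces are ignored because both schemas in the question use an empty namespace.

```python
# Avro primitive type names; anything else used as a type string must be a
# named type (record, enum, fixed) defined earlier in the same schema.
PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}

def undefined_names(schema, defined=None):
    """Return type names referenced before being defined in this one schema."""
    defined = set() if defined is None else defined
    missing = []
    if isinstance(schema, str):
        if schema not in PRIMITIVES and schema not in defined:
            missing.append(schema)
    elif isinstance(schema, list):  # a union: check every branch
        for branch in schema:
            missing += undefined_names(branch, defined)
    elif isinstance(schema, dict):
        kind = schema["type"]
        if kind in ("record", "enum", "fixed"):
            defined.add(schema["name"])  # the name is usable from here on
        if kind == "record":
            for field in schema["fields"]:
                missing += undefined_names(field["type"], defined)
        elif kind == "array":
            missing += undefined_names(schema["items"], defined)
        elif kind == "map":
            missing += undefined_names(schema["values"], defined)
        elif kind not in ("enum", "fixed"):
            # e.g. {"type": "int"} or a nested {"type": {...}} wrapper
            missing += undefined_names(kind, defined)
    return missing

# employee.avsc from the question, reduced to the parts that matter here.
employee = {
    "type": "record",
    "name": "EMPLOYEE",
    "fields": [
        {"name": "rentalrecord", "type": "RENTALRECORD"},
        {"name": "hire_date", "type": "int"},
    ],
}

print(undefined_names(employee))  # ['RENTALRECORD']
```

Run against the first, nested version of the schema, the same function returns an empty list, which matches the observation that the nested form imports without error.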

I defined an AVDL with all the references and then used the avro-tools jar with the idl2schemata option to generate the avsc. The generated avsc works with Hive like a charm!
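The practical effect of that approach is that Hive receives a self-contained avsc in which the referenced record is defined in place. As a rough illustration of the same end result without Avro IDL, the sketch below inlines a shared record definition into the schema that references it. This is not how idl2schemata works internally; `inline_named_types` and the in-memory dictionaries are hypothetical stand-ins for the two avsc files in the question.

```python
import json

def inline_named_types(schema, definitions, seen=None):
    """Replace the first reference to each known name with its full definition.

    Later references stay as plain names, because Avro does not allow the
    same name to be defined twice within one schema.
    """
    seen = set() if seen is None else seen
    if isinstance(schema, str):
        if schema in definitions and schema not in seen:
            seen.add(schema)
            return inline_named_types(definitions[schema], definitions, seen)
        return schema
    if isinstance(schema, list):  # union
        return [inline_named_types(s, definitions, seen) for s in schema]
    if isinstance(schema, dict):
        out = dict(schema)  # shallow copy so the input is not modified
        kind = out["type"]
        if kind in ("record", "enum", "fixed"):
            seen.add(out["name"])
        if kind == "record":
            out["fields"] = [
                dict(f, type=inline_named_types(f["type"], definitions, seen))
                for f in out["fields"]
            ]
        elif kind == "array":
            out["items"] = inline_named_types(out["items"], definitions, seen)
        elif kind == "map":
            out["values"] = inline_named_types(out["values"], definitions, seen)
        else:
            out["type"] = inline_named_types(kind, definitions, seen)
        return out
    return schema

# Stand-ins for rentalrecord.avsc and employee.avsc from the question.
rentalrecord = {
    "type": "record",
    "name": "RENTALRECORD",
    "fields": [{"name": "due_date", "type": "int"}],
}
employee = {
    "type": "record",
    "name": "EMPLOYEE",
    "fields": [
        {"name": "rentalrecord", "type": "RENTALRECORD"},
        {"name": "hire_date", "type": "int"},
    ],
}

merged = inline_named_types(employee, {"RENTALRECORD": rentalrecord})
print(json.dumps(merged, indent=2))
```

The printed schema has the same shape as the nested definition at the top of the question, so it could be saved as employee.avsc and referenced through 'avro.schema.url' as before, while rentalrecord.avsc remains the single place where the record is maintained.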
