
Create hive table from table schema stored in .avsc file

I have a Hive table schema stored in an HDFS file, schema.avsc. I want to create a Hive table with the same schema and then load data into it from another HDFS path where the data is stored.

1. How can I create the table? 2. How can I load the data stored in the HDFS files into the created table?

How can I create the table?

The Apache Hive documentation on the AvroSerDe shows the syntax for creating a table based on an Avro schema stored in a file. For convenience, I'll repeat one of the examples here:

CREATE TABLE kst
  PARTITIONED BY (ds string)
  ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
  STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
  OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
  TBLPROPERTIES (
    'avro.schema.url'='http://schema_provider/kst.avsc');

This example pulls the schema file from a web server. The documentation also shows other options, such as pointing avro.schema.url at a file in HDFS or embedding the schema inline with avro.schema.literal, depending on your specific needs.
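Since the schema in the question already lives in HDFS, the same statement can point at that file instead of a web server. The following is a minimal sketch, not a tested recipe: the table name my_avro_table and the placeholder path are assumptions, so substitute your own NameNode and schema location.

```sql
-- Assumption: my_avro_table is a hypothetical name, and the URL below is a
-- placeholder for wherever schema.avsc actually lives in HDFS.
CREATE TABLE my_avro_table
  ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
  STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
  OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
  TBLPROPERTIES (
    'avro.schema.url'='hdfs://<namenode>/<path>/schema.avsc');

-- Inspect the columns Hive derived from the Avro schema.
DESCRIBE my_avro_table;
```

No column list is given because the AvroSerDe derives the columns from the schema file. On Hive 0.14 and later, the SERDE, INPUTFORMAT and OUTPUTFORMAT clauses can be shortened to STORED AS AVRO.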

I recommend reading the entire AvroSerDe documentation page. There is a lot of useful information there about getting the most out of using Hive with Avro.

How can I load the data stored in the HDFS files into the created table?

You can define an external table that references the existing HDFS files. The documentation page for External Tables shows the syntax. Repeating an example:

CREATE EXTERNAL TABLE page_view(viewTime INT, userid BIGINT,
     page_url STRING, referrer_url STRING,
     ip STRING COMMENT 'IP Address of the User',
     country STRING COMMENT 'country of origination')
 COMMENT 'This is the staging page view table'
 ROW FORMAT DELIMITED FIELDS TERMINATED BY '\054'
 STORED AS TEXTFILE
 LOCATION '<hdfs_location>';

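After defining the external table, you can then use an INSERT-SELECT query that reads from the external table and writes to the Avro table. The documentation on Inserting data into Hive Tables from queries describes the INSERT-SELECT syntax. For example (copied from the documentation, where page_view_stg is the staging table and page_view is a partitioned destination table, so the names and columns will differ from your own tables):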

FROM page_view_stg pvs
INSERT OVERWRITE TABLE page_view PARTITION(dt='2008-06-08', country)
       SELECT pvs.viewTime, pvs.userid, pvs.page_url, pvs.referrer_url, null, null, pvs.ip, pvs.cnt;
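Putting the two steps together for the case in the question might look like the sketch below. Everything specific in it is an assumption made for illustration: the table names staging_ext and my_avro_table, the two columns, and the guess that the existing HDFS files are comma-delimited text. Adjust the column list and row format to match the real data and the fields in schema.avsc.

```sql
-- Assumption: the existing files under <hdfs_location> are comma-delimited
-- text with two fields; staging_ext and its columns are hypothetical.
CREATE EXTERNAL TABLE staging_ext (id BIGINT, name STRING)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
  STORED AS TEXTFILE
  LOCATION '<hdfs_location>';

-- Assumption: my_avro_table is an Avro-backed table whose schema has
-- matching fields, in the same order, as the SELECT list.
INSERT OVERWRITE TABLE my_avro_table
SELECT id, name
FROM staging_ext;
```

Dropping the external table afterwards removes only its metadata; the files under the LOCATION path are left in place.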
