I'm working with Hive tables and have the following problem: I have more than 1 billion XML files in HDFS. Each XML file contains four different sections, and I want to split each file and load each section into its own table.
Example:
<?xml version='1.0' encoding='iso-8859-1'?>
<xml>
<section1>
<id>1233222</id>
<!-- many more XML tags -->
</section1>
<section2>
<!-- many more XML tags -->
</section2>
<section3>
<!-- many more XML tags -->
</section3>
<section4>
<!-- many more XML tags -->
</section4>
</xml>
And I have four tables:
section1Table: id, section1 // fields
section2Table: id, section2
section3Table: id, section3
section4Table: id, section4
Now I want to split the data and load it into each table.
How can I achieve this? Can anyone help me?
Thanks
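(For reference, the split described above can be sketched in Python. The section names and the id layout follow the example XML; the rest -- the sample values and the in-memory table layout -- is hypothetical.)

```python
import xml.etree.ElementTree as ET

# A minimal sketch of the desired split for one file, assuming each file
# has a single root element wrapping the four sections.
doc = """<?xml version='1.0'?>
<xml>
<section1><id>1233222</id><other>tags</other></section1>
<section2><b>y</b></section2>
<section3><c>z</c></section3>
<section4><d>w</d></section4>
</xml>"""

root = ET.fromstring(doc)
record_id = root.findtext('section1/id')

# One (id, raw section XML) row per target table
rows = {f'section{i}Table': (record_id,
                             ET.tostring(root.find(f'section{i}'),
                                         encoding='unicode').strip())
        for i in range(1, 5)}
print(rows['section1Table'])
```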
UPDATE
I have tried the following:
CREATE EXTERNAL TABLE test(name STRING) LOCATION '/user/sornalingam/zipped/output/Tagged/t1';
SELECT xpath(name, '//section1') FROM test LIMIT 1;
but I got the following error:
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"name":"<?xml version='1.0' encoding='iso-8859-1'?>"}
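The error is consistent with Hive's default text SerDe splitting the file on newlines: each row then holds a single line, so xpath() only ever sees a fragment such as the declaration line, never a complete document. A small Python sketch of the same failure mode (the strings below are illustrative):

```python
import xml.etree.ElementTree as ET

# With a line-oriented SerDe, the first row contains only the XML
# declaration -- which is not a well-formed document on its own.
first_row = b"<?xml version='1.0' encoding='iso-8859-1'?>"
try:
    ET.fromstring(first_row)
    parses = True
except ET.ParseError:
    parses = False
print(parses)  # False: a declaration alone has no root element

# Parsing the whole file as one string works.
whole_file = first_row + b"<xml><section1><id>1233222</id></section1></xml>"
print(ET.fromstring(whole_file).findtext('section1/id'))  # -> 1233222
```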
You have several options:
1. Load each whole XML file into a single-column table, e.g. CREATE TABLE xmlfiles (id INT, xmlfile STRING). Then use an XPath UDF, such as xpath(xmlfile, '//section1'), to do work on the XML.
2. Alternatively, follow the instructions in the second half of this tutorial to ingest the sections directly into Hive via XPath.
It depends on your level of experience and comfort with these approaches.
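To illustrate option 1: with the whole document stored in one string column, extraction reduces to an XPath lookup over that string. A rough Python equivalent, assuming the example document structure (xml.etree supports only a limited XPath subset, so this is a sketch, not the Hive UDF itself):

```python
import xml.etree.ElementTree as ET

# The xmlfile column would hold the entire document as one string;
# an XPath-style query then pulls out the wanted section.
xmlfile = "<xml><section1><id>7</id></section1><section2/></xml>"
root = ET.fromstring(xmlfile)
section1 = ET.tostring(root.find('.//section1'), encoding='unicode')
print(section1)  # -> <section1><id>7</id></section1>
```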
Use this:
CREATE EXTERNAL TABLE test(name STRING) LOCATION '/user/sornalingam/zipped/output/Tagged/t1'
tblproperties ("skip.header.line.count"="1", "skip.footer.line.count"="1");
And then use the xpath function.
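Note that skip.header.line.count and skip.footer.line.count only drop the first and last physical lines; xpath() still runs per row, so this approach works when each section sits on a line of its own. A Python sketch of that assumption (the line layout is hypothetical):

```python
import xml.etree.ElementTree as ET

# skip.header.line.count=1 / skip.footer.line.count=1 drop the declaration
# and the closing </xml>; each remaining line becomes one Hive row.
lines = [
    "<?xml version='1.0' encoding='iso-8859-1'?>",  # skipped header
    "<section1><id>1233222</id></section1>",
    "<section2><b/></section2>",
    "<section3><c/></section3>",
    "<section4><d/></section4>",
    "</xml>",                                       # skipped footer
]
rows = lines[1:-1]  # what the `name` column would contain, row by row

# Each surviving row is well-formed on its own, so per-row XPath works.
ids = [ET.fromstring(r).findtext('id')
       for r in rows if r.startswith('<section1')]
print(ids)  # -> ['1233222']
```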
You can automate the whole process of converting complex XML to Hive. For example, the Flexter XML converter can generate Parquet or Avro files that can be queried by Hive.
Here is a blog post that shows how to automate the conversion of MISMO XML to Hive and Parquet.
Disclaimer: I work for Sonra