
Sqoop Import Failing for Bucketed Hive ORC Tables

I have created a bucketed ORC table in Hive using the following DDL:

create table Employee( EmpID STRING , EmpName STRING) 
clustered by (EmpID) into 10 buckets 
stored as orc 
TBLPROPERTIES('transactional'='true');

Then I ran the Sqoop import:

sqoop import --verbose \
--connect 'RDBMS_JDBC_URL' \
--driver JDBC_DRIVER \
--table Employee  \
--null-string '\\N' \
--null-non-string '\\N' \
--username USER \
--password PASSWORD \
--hcatalog-database hive_test_trans \
--hcatalog-table Employee  \
--hcatalog-storage-stanza "stored as orc" \
-m 1

The import failed with the following exception:

 22/12/17 03:28:59 ERROR tool.ImportTool: Encountered IOException running import job:
 org.apache.hive.hcatalog.common.HCatException : 2016 : Error operation not supported :
 Store into a partition with bucket definition from Pig/Mapreduce is not supported
          at org.apache.hive.hcatalog.mapreduce.HCatOutputFormat.setOutput(HCatOutputFormat.java:109)
          at org.apache.hive.hcatalog.mapreduce.HCatOutputFormat.setOutput(HCatOutputFormat.java:70)
          at org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities.configureHCat(SqoopHCatUtilities.java:339)
          at org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities.configureImportOutputFormat(SqoopHCatUtilities.java:753)
          at org.apache.sqoop.mapreduce.ImportJobBase.configureOutputFormat(ImportJobBase.java:98)
          at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:240)
          at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:665)
          at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:497)
          at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:601)

We can solve this problem by creating a temporary table, but I do not want to add one more step.

Can I import the data from Oracle directly into the bucketed ORC table without using a temporary table?

Importing data directly into transactional (bucketed) Hive tables is still not supported, so you have to use a workaround.

There is an open JIRA ticket for getting this fixed. Until then, you have to perform some intermediate step to write the data into Hive. The temporary-table approach you mentioned in your question is a good option to begin with; a sketch of that two-step flow is shown below.
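As a rough illustration (not a verified recipe), the workaround could look like the following. The staging table name employee_staging is a made-up placeholder, and the JDBC URL, driver, user, and password placeholders are the same ones used in your command; adjust them to your environment. The key point is that the staging table has no bucket definition, so HCatOutputFormat accepts the write, and the final INSERT lets Hive do the bucketing.

# 1) Create a plain (non-bucketed, non-transactional) ORC staging table.
#    "employee_staging" is an illustrative name.
hive -e "
CREATE TABLE hive_test_trans.employee_staging (EmpID STRING, EmpName STRING)
STORED AS ORC;
"

# 2) Sqoop-import into the staging table via HCatalog.
#    No bucket definition here, so the import is allowed.
sqoop import \
  --connect 'RDBMS_JDBC_URL' \
  --driver JDBC_DRIVER \
  --username USER \
  --password PASSWORD \
  --table Employee \
  --null-string '\\N' \
  --null-non-string '\\N' \
  --hcatalog-database hive_test_trans \
  --hcatalog-table employee_staging \
  --hcatalog-storage-stanza "stored as orc" \
  -m 1

# 3) Copy from the staging table into the bucketed transactional table.
#    Hive applies the bucketing during this INSERT.
hive -e "
INSERT INTO TABLE hive_test_trans.Employee
SELECT EmpID, EmpName FROM hive_test_trans.employee_staging;
"

After the INSERT succeeds, the staging table can be dropped (or truncated and reused for the next load).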
