
Sqoop integration with Hadoop for Oracle data import with OraOop

I have been trying to import data from Oracle Express Edition 11g R2 into Hadoop using Sqoop with OraOop.

I installed CDH Sqoop and tried to integrate it with an already running Apache Hadoop installation.

I can see that OraOop is being used correctly, but the import fails with the exception below. I also tried Apache Sqoop with Apache Hadoop and hit the same exception. A web search suggested using CDH Hadoop instead of Apache Hadoop as well.

Exception in thread "main" java.lang.IncompatibleClassChangeError: Found class org.apache.hadoop.mapreduce.JobContext, but interface was expected
    at com.quest.oraoop.OraOopDataDrivenDBInputFormat.getDesiredNumberOfMappers(OraOopDataDrivenDBInputFormat.java:201)
    at com.quest.oraoop.OraOopDataDrivenDBInputFormat.getSplits(OraOopDataDrivenDBInputFormat.java:51)

To summarize,

CDH Sqoop + Apache Hadoop - Data import failed with the above exception

Apache Sqoop + Apache Hadoop - Data import failed with the above exception

CDH Sqoop + CDH Hadoop - Is this the right combination?

Any suggestions? I am not sure if I am going about this the right way. Please help.

Hadoop went through a major code refactoring between Hadoop 1.0 and Hadoop 2.0 (correspondingly, between CDH3 and CDH4). One side effect is that code compiled against Hadoop 1.0 (CDH3) is not binary compatible with Hadoop 2.0 (CDH4), and vice versa. The source code is compatible, however, so you only need to recompile the code against your target Hadoop distribution.

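The exception "Found class X, but interface was expected" is very common when you run code compiled for Hadoop 1.0 (CDH3) on Hadoop 2.0 (CDH4), or vice versa. In your case X is org.apache.hadoop.mapreduce.JobContext, which is a class in Hadoop 1.0 and an interface in Hadoop 2.0, so a connector built against one cannot link against the other.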
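If you are unsure which side of that change your installation is on, a small probe can tell you whether the JobContext type on a given classpath is a class or an interface. This is only a sketch: the class name JobContextKind and the method kindOf are made up for illustration, and it assumes the Hadoop jars are on the classpath when you run it.

```java
// Reports whether a type visible on the classpath is a class or an interface.
// JobContextKind and kindOf are hypothetical names, not part of Hadoop or Sqoop.
final class JobContextKind {

    static String kindOf(String typeName) {
        try {
            // initialize=false: only inspect the type, do not run static initializers
            Class<?> type = Class.forName(typeName, false, JobContextKind.class.getClassLoader());
            return type.isInterface() ? "interface" : "class";
        } catch (ClassNotFoundException | LinkageError e) {
            // The type (or something it depends on) is not on this classpath
            return "missing";
        }
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "org.apache.hadoop.mapreduce.JobContext";
        System.out.println(name + " -> " + kindOf(name));
    }
}
```

Assuming your hadoop script supports the classpath subcommand, you could run it as java -cp "$(hadoop classpath):." JobContextKind. A result of "class" points to a Hadoop 1.0 style API, while "interface" points to a Hadoop 2.0 style API, and the Sqoop and OraOop builds need to match whichever one you get.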

The solution is simple: you need to synchronize the versions. Using CDH3 Hadoop + CDH3 Sqoop or CDH4 Hadoop + CDH4 Sqoop is the simplest way to go. If you prefer to use an upstream Sqoop release, you have to make sure that you are using the binary artifact compiled for your Hadoop distribution. Sqoop makes this easy because the target Hadoop distribution is encoded in the artifact name. For example, sqoop-1.4.2.bin__hadoop-1.0.0.tar.gz is meant to be used on Hadoop 1.0 [1].
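To make the naming convention concrete, here is a minimal sketch that pulls the target Hadoop version out of an artifact name. The pattern is inferred only from the single example name above, so treat it as an assumption rather than an official Sqoop naming rule. SqoopArtifactName and targetHadoopVersion are hypothetical names.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Extracts the Hadoop version a Sqoop binary tarball was built for, based on its file name.
// Hypothetical helper; the "__hadoop-<version>.tar.gz" pattern is inferred from one example.
final class SqoopArtifactName {

    private static final Pattern TARGET = Pattern.compile("__hadoop-(.+?)\\.tar\\.gz$");

    static Optional<String> targetHadoopVersion(String artifactName) {
        Matcher m = TARGET.matcher(artifactName);
        // find() rather than matches(): the marker sits at the end of a longer name
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "sqoop-1.4.2.bin__hadoop-1.0.0.tar.gz";
        System.out.println(targetHadoopVersion(name).orElse("no Hadoop target found in name"));
    }
}
```

Compare the extracted value with what hadoop version reports on your cluster before unpacking the tarball. A name without the __hadoop- marker yields an empty result, in which case the target has to be confirmed some other way.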

Exactly the same constraint applies to connectors. You must download the connector built for the Hadoop version that you are running. In the case of OraOop, there are also separate artifacts for CDH3 and CDH4 [2].

Jarcec

Links:

1: http://www.apache.org/dist/sqoop/1.4.2/

2: https://ccp.cloudera.com/display/con/Quest+Data+Connectors

