
Spark 0.9.1 on Hadoop 2.2.0 Maven dependency

I set up the Apache Spark Maven dependency in pom.xml as follows:

    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.10</artifactId>
        <version>0.9.1</version>
    </dependency>

But I found that this dependency uses "hadoop-client-1.0.4.jar" and "hadoop-core-1.0.4.jar", and when I run my program I get the error "org.apache.hadoop.ipc.RemoteException: Server IPC version 9 cannot communicate with client version 4", which shows that I need to switch the Hadoop version from 1.0.4 to 2.2.0.
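A quick way to confirm which Hadoop artifacts the Spark dependency pulls in transitively is to inspect the Maven dependency tree (standard Maven, nothing Spark-specific; the filter is optional):

    mvn dependency:tree -Dincludes=org.apache.hadoop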

Update:

Is the following a correct way to solve this problem?

    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.10</artifactId>
        <version>0.9.1</version>
        <exclusions>
            <exclusion> 
                <groupId>org.apache.hadoop</groupId>
                <artifactId>hadoop-core</artifactId>
            </exclusion>
            <exclusion> 
                <groupId>org.apache.hadoop</groupId>
                <artifactId>hadoop-client</artifactId>
            </exclusion>
        </exclusions> 
    </dependency> 
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>2.2.0</version>
    </dependency> 

Many thanks for your help.

Recompile Spark for your Hadoop version; see "A Note About Hadoop Versions" here: http://spark.apache.org/docs/0.9.1/. They conveniently give an example for 2.2.0:

    SPARK_HADOOP_VERSION=2.2.0 sbt/sbt assembly

This will create a new jar, $SPARK_HOME/assembly/target/scala-2.10/spark-assembly-*.jar, which you need to include in your pom.xml (instead of excluding Hadoop from the published jar).

If you're already hosting your own repository (e.g. on Nexus), then upload it there (this is what I do and it works great). If for some reason you can't upload to any repository, use Maven's install:install-file, as sketched below, or one of the answers here: Maven: add a dependency to a jar by relative path.
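For example (a sketch only; the coordinates and the exact assembly jar file name below are placeholders I chose, not anything Spark publishes), installing the freshly built assembly into your local repository and then depending on it might look like:

    mvn install:install-file \
        -Dfile=$SPARK_HOME/assembly/target/scala-2.10/spark-assembly-0.9.1-hadoop2.2.0.jar \
        -DgroupId=org.apache.spark \
        -DartifactId=spark-assembly_2.10 \
        -Dversion=0.9.1-hadoop2.2.0 \
        -Dpackaging=jar

and then in your pom.xml:

    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-assembly_2.10</artifactId>
        <version>0.9.1-hadoop2.2.0</version>
    </dependency>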

Spark 1.2.0 depends on Hadoop 2.2.0 by default. If you can update your Spark dependency to 1.2.0 (or newer), that will solve the problem.
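A minimal sketch of that change, just bumping the version of the existing artifact:

    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.10</artifactId>
        <version>1.2.0</version>
    </dependency>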

