
Spark 0.9.1 on Hadoop 2.2.0 Maven dependency

I set up the Apache Spark Maven dependency in pom.xml as follows:

    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.10</artifactId>
        <version>0.9.1</version>
    </dependency>

But I found that this dependency uses "hadoop-client-1.0.4.jar" and "hadoop-core-1.0.4.jar", and when I run my program I get the error "org.apache.hadoop.ipc.RemoteException: Server IPC version 9 cannot communicate with client version 4", which shows that I need to switch the Hadoop version from 1.0.4 to 2.2.0.
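A quick way to confirm which Hadoop artifacts the Spark dependency pulls in transitively is to inspect the Maven dependency tree (standard Maven, nothing Spark-specific; the filter is optional):

    mvn dependency:tree -Dincludes=org.apache.hadoop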

Update:

Is the following a correct way to solve this problem?

    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.10</artifactId>
        <version>0.9.1</version>
        <exclusions>
            <exclusion> 
                <groupId>org.apache.hadoop</groupId>
                <artifactId>hadoop-core</artifactId>
            </exclusion>
            <exclusion> 
                <groupId>org.apache.hadoop</groupId>
                <artifactId>hadoop-client</artifactId>
            </exclusion>
        </exclusions> 
    </dependency> 
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>2.2.0</version>
    </dependency> 

Many thanks for your help.

Recompile Spark for your Hadoop version; see "A Note About Hadoop Versions" here: http://spark.apache.org/docs/0.9.1/. They conveniently give an example for 2.2.0:

    SPARK_HADOOP_VERSION=2.2.0 sbt/sbt assembly

This will create a new jar, $SPARK_HOME/assembly/target/scala-2.10/spark-assembly-*.jar, which you need to include in your pom.xml (instead of excluding Hadoop from the published jar).

If you're already hosting your own repository (e.g. on Nexus), then upload it there (this is what I do and it works great). If for some reason you can't upload to any repository, use Maven's install:install-file, as sketched below, or one of the answers here: Maven: add a dependency to a jar by relative path.
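For example (a sketch only; the coordinates and the exact assembly jar file name below are placeholders I chose, not anything Spark publishes), installing the freshly built assembly into your local repository and then depending on it might look like:

    mvn install:install-file \
        -Dfile=$SPARK_HOME/assembly/target/scala-2.10/spark-assembly-0.9.1-hadoop2.2.0.jar \
        -DgroupId=org.apache.spark \
        -DartifactId=spark-assembly_2.10 \
        -Dversion=0.9.1-hadoop2.2.0 \
        -Dpackaging=jar

and then in your pom.xml:

    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-assembly_2.10</artifactId>
        <version>0.9.1-hadoop2.2.0</version>
    </dependency>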

Spark 1.2.0 depends on Hadoop 2.2.0 by default. If you can update your Spark dependency to 1.2.0 (or newer), that will solve the problem.
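A minimal sketch of that change, just bumping the version of the existing artifact:

    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.10</artifactId>
        <version>1.2.0</version>
    </dependency>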

