
How to set specific Hadoop version for Spark, Python

I need help setting a specific Hadoop version in my Spark config. I read somewhere that you can use the hadoop.version property, but it doesn't say where to set it.

http://spark.apache.org/docs/latest/building-spark.html#specifying-the-hadoop-version

I need to change it from the current/default version to 2.8.0. I'm coding in PyCharm. Please help, preferably with a step-by-step guide.

Thanks!

You can set it while compiling Spark. Please refer to the Building Spark documentation.

To build with Hadoop 2.8, use the hadoop-2.7 profile and override hadoop.version:

./build/mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.8.0 -DskipTests clean package

The hadoop-2.7 profile covers Hadoop 2.7.x and later.
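
Since the question mentions coding in PyCharm, here is a minimal Python sketch of how you could point a script at the freshly built distribution and confirm which Hadoop version it was compiled against. The install path /opt/spark-hadoop2.8 and the findspark helper (pip install findspark) are assumptions for illustration, not part of the build docs.

import os

# Hypothetical path to the Spark distribution built with -Dhadoop.version=2.8.0
os.environ["SPARK_HOME"] = "/opt/spark-hadoop2.8"

import findspark   # third-party helper that adds SPARK_HOME's python/ dir to sys.path
findspark.init()

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hadoop-version-check").getOrCreate()

# Ask the JVM which Hadoop version this Spark build was compiled against
print(spark.sparkContext._jvm.org.apache.hadoop.util.VersionInfo.getVersion())

spark.stop()

If this prints 2.8.0, the build picked up the version you specified.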

You can build like that for Apache Hadoop 2.7.x and later, so the above answer is correct:

./build/mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.8.0 -DskipTests clean package

Alternatively, you could modify the pom.xml of the Spark source distribution you downloaded before running the Maven build, so that the build is done with the Hadoop version you want:

<profile>
    <id>hadoop2.8</id>
    <properties>
        <hadoop.version>2.8.0</hadoop.version>
    ...
    </properties>
</profile>

Take a look at this post for step-by-step guidance.
