
How to set specific Hadoop version for Spark, Python

I need help setting a specific Hadoop version in my Spark config. I read somewhere that you can use the hadoop.version property, but it doesn't say where to set it.

http://spark.apache.org/docs/latest/building-spark.html#specifying-the-hadoop-version

I need to change it from the current/default version to 2.8.0. I'm coding in PyCharm. Please help, preferably with a step-by-step guide.

Thanks!

You can set it while compiling Spark. Please refer to the Building Spark documentation.

To build with Hadoop 2.8, use the hadoop-2.7 profile and override hadoop.version:

./build/mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.8.0 -DskipTests clean package

The hadoop-2.7 profile covers Hadoop 2.7.x and later.
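
Since the question mentions coding in PyCharm, here is a minimal Python sketch of how you could point a script at the freshly built distribution and confirm which Hadoop version it was compiled against. The install path /opt/spark-hadoop2.8 and the findspark helper (pip install findspark) are assumptions for illustration, not part of the build docs.

import os

# Hypothetical path to the Spark distribution built with -Dhadoop.version=2.8.0
os.environ["SPARK_HOME"] = "/opt/spark-hadoop2.8"

import findspark   # third-party helper that adds SPARK_HOME's python/ dir to sys.path
findspark.init()

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hadoop-version-check").getOrCreate()

# Ask the JVM which Hadoop version this Spark build was compiled against
print(spark.sparkContext._jvm.org.apache.hadoop.util.VersionInfo.getVersion())

spark.stop()

If this prints 2.8.0, the build picked up the version you specified.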

You can build like that for Apache Hadoop 2.7.x and later, so the above answer is correct:

./build/mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.8.0 -DskipTests clean package

Alternatively, you could modify the pom.xml of the Spark source distribution you downloaded before running the Maven build, so that the build is done with the Hadoop version you want:

<profile>
    <id>hadoop2.8</id>
    <properties>
        <hadoop.version>2.8.0</hadoop.version>
    ...
    </properties>
</profile>

Take a look at this post for step-by-step guidance.
