I need help setting a specific Hadoop version in my Spark config. I read that you can use the hadoop.version property, but the documentation doesn't say where to set it.
http://spark.apache.org/docs/latest/building-spark.html#specifying-the-hadoop-version
I need to change it from the current default to 2.8.0. I'm coding in PyCharm. Please help, preferably with a step-by-step guide.
Thanks!
You can set it while compiling Spark from source. Please refer to the building Spark documentation.
To build against Hadoop 2.8.0, run
./build/mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.8.0 -DskipTests clean package
The hadoop-2.7 profile covers Hadoop 2.7.X and later, so it also applies to 2.8.0; hadoop.version selects the exact release.
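To make the profile/version relationship concrete, here is a minimal sketch of how a target Hadoop version maps to Spark's build profiles. The hadoop_profile helper is hypothetical (it is not part of Spark's build), but the profile names hadoop-2.6 and hadoop-2.7 are the ones Spark 2.x actually defines:

```shell
# Hypothetical helper: pick the Maven profile for a target Hadoop version.
# Spark 2.x ships a hadoop-2.6 profile for 2.6.x and a hadoop-2.7 profile
# for 2.7.x and later 2.x releases; -Dhadoop.version pins the exact release.
hadoop_profile() {
  case "$1" in
    2.[789].*) echo "hadoop-2.7" ;;
    2.6.*)     echo "hadoop-2.6" ;;
    *)         echo "unsupported" ;;
  esac
}

hadoop_profile 2.8.0   # prints hadoop-2.7
```

So for 2.8.0 you keep -Phadoop-2.7 and only change -Dhadoop.version.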
You can build like that for Apache Hadoop 2.7.X and later, so the above answer is correct:
./build/mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.0 -DskipTests clean package
(Swap in -Dhadoop.version=2.8.0 to target 2.8.0.)
Or you could modify the pom.xml of your downloaded Spark distribution before running the Maven build, so that the build is done against the version you want:
<profile>
  <id>hadoop2.8</id>
  <properties>
    <hadoop.version>2.8.0</hadoop.version>
    ...
  </properties>
</profile>
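Whichever route you take, you can confirm which Hadoop release the finished build bundled by inspecting the jar names under the assembly output. A small sketch; the jar_hadoop_version helper is hypothetical, and the assembly path assumes a Spark 2.x source checkout:

```shell
# Hypothetical check: extract the Hadoop version from a bundled jar name.
# After a successful build, a Spark 2.x source tree places the jars under
#   assembly/target/scala-*/jars/
# and the hadoop-common jar there carries the version Spark was built with.
jar_hadoop_version() {
  local name="$1"             # e.g. "hadoop-common-2.8.0.jar"
  name="${name#hadoop-common-}"  # strip the artifact prefix -> "2.8.0.jar"
  echo "${name%.jar}"            # strip the extension        -> "2.8.0"
}

jar_hadoop_version hadoop-common-2.8.0.jar   # prints 2.8.0
```

If this prints 2.8.0, the build picked up your hadoop.version override.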
Take a look at this post for step-by-step guidance.