
How to set specific Hadoop version for Spark, Python

I need help with setting a specific Hadoop version in my Spark config. I read somewhere that you can use the hadoop.version property, but it doesn't say where to find it.

http://spark.apache.org/docs/latest/building-spark.html#specifying-the-hadoop-version

I need to set it from the current/default to 2.8.0. I'm coding in PyCharm. Please help, preferably with a step-by-step guide.
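For reference, this is how I'm checking which Hadoop version my current PySpark build uses (a minimal sketch, assuming a working PySpark installation; it reads Hadoop's org.apache.hadoop.util.VersionInfo class through the py4j gateway):

    # Print the Hadoop version bundled with the running Spark build.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[1]").appName("hadoop-version-check").getOrCreate()
    print(spark.sparkContext._jvm.org.apache.hadoop.util.VersionInfo.getVersion())
    spark.stop()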

Thanks!

You can do it while compiling. Please refer to the Building Spark docs.

To build with Hadoop 2.8, run:

./build/mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.8.0 -DskipTests clean package

The hadoop-2.7 profile is for Hadoop 2.7.X and later, which is why it is kept even when targeting 2.8.0.
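Once the build finishes, the rebuilt distribution has to be the one PyCharm actually uses. A minimal sketch of pointing a Python script at it (the path is a placeholder, and findspark is an optional third-party helper, not part of Spark itself):

    import os

    # Point SPARK_HOME at the distribution built above (placeholder path).
    os.environ["SPARK_HOME"] = "/path/to/spark-built-with-hadoop-2.8"

    # findspark puts that SPARK_HOME's pyspark on sys.path; install with `pip install findspark`.
    import findspark
    findspark.init()

    from pyspark.sql import SparkSession
    spark = SparkSession.builder.master("local[1]").getOrCreate()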

You can build like that for Apache Hadoop 2.7.X and later, so the above answer is correct:

    ./build/mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.8.0 -DskipTests clean package

Or you could modify the pom.xml of your downloaded Spark distribution before performing the Maven build, so that the build is done with the version you want:

<profile>
    <id>hadoop2.8</id>
    <properties>
        <!-- Hadoop version to compile against -->
        <hadoop.version>2.8.0</hadoop.version>
        ...
    </properties>
</profile>
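With a profile like that in place, you would activate it by passing its id to Maven, i.e. -Phadoop2.8 in place of -Phadoop-2.7 in the build command above (assuming the profile id from the snippet).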

Take a look at this post for step-by-step guidance.
