Info on building Spark with "(CDH 4.2.0), yarn (Hadoop 2.4.0)" with Hive?
I'm planning to build Spark to spin up on EC2. The default spark_ec2.py downloads a prebuilt package (1 for Hadoop 1.0.4 and 2 for CDH 4.2.0, yarn (Hadoop 2.4.0)), but it is built without the '-Phive -Phive-thriftserver' options. Mostly I need to use Hive UDFs, so Spark has to be built from source. (I'd need YARN too, as 'Hive on Spark supports Spark on YARN mode as default.')
The 'Building Spark' page illustrates a number of examples, and what I need seems to be a mix of
mvn -Dhadoop.version=2.0.0-mr1-cdh4.2.0 -Phadoop-1 -DskipTests clean package
and
mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -Phive -Phive-thriftserver -DskipTests clean package
(source: http://spark.apache.org/docs/latest/building-spark.html)
At the moment, the following is what I can think of:
mvn -Pyarn -Dhadoop.version=2.4.0-mr1-cdh4.2.0 -Phadoop-1 -Phive -Phive-thriftserver -DskipTests clean package
Can anyone tell me whether the above is correct, or point me to any other resource I can learn from? Thank you.
I had misunderstood --hadoop-major-version, which has 3 options.
I used spark.ami.hvm.v14 (ami-35b1885c) and was able to build successfully with the following:
./make-distribution.sh --name spark-1.6.0-bin-hadoop2.4-hive-yarn --tgz -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -Psparkr -Phive -Phive-thriftserver -DskipTests
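After the build finishes, one quick sanity check that the -Phive profile actually took effect is to look for the DataNucleus jars in the distribution tarball: in Spark 1.x, make-distribution.sh copies them into lib/ only when Hive support is enabled. The sketch below demonstrates the check against a mock tarball, since the file names and the exact layout here are assumptions, not the real Spark artifact:

```shell
# Build a mock distribution tarball with the assumed Hive-enabled layout
# (a real build would produce spark-1.6.0-bin-hadoop2.4-hive-yarn.tgz instead).
mkdir -p dist/lib
touch dist/lib/datanucleus-core-3.2.10.jar
tar czf spark-dist.tgz dist

# The actual check: a Hive-enabled distribution should list datanucleus jars.
if tar tzf spark-dist.tgz | grep -q datanucleus; then
  echo "Hive support likely included"
fi
```

Running the same `tar tzf ... | grep datanucleus` against the tarball produced by make-distribution.sh should tell you whether the Hive profiles were picked up before you push the package to EC2.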