
Info on building Spark with “(CDH 4.2.0), yarn (Hadoop 2.4.0)” with Hive?

I'm planning to build Spark to launch on EC2. The default spark_ec2.py downloads a prebuilt package (1 for Hadoop 1.0.4, and 2 for CDH 4.2.0, yarn (Hadoop 2.4.0)), but that package is built without the '-Phive -Phive-thriftserver' options. I mainly need to use Hive UDFs, so Spark has to be built from source. (I'd need YARN too, as 'Hive on Spark supports Spark on YARN mode as default.')

The 'Building Spark' page gives a number of examples, and what I need seems to be a mix of

Cloudera CDH 4.2.0 with MapReduce v1

mvn -Dhadoop.version=2.0.0-mr1-cdh4.2.0 -Phadoop-1 -DskipTests clean package

and

Apache Hadoop 2.4.X with Hive 13 support

mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -Phive -Phive-thriftserver -DskipTests clean package

(source: http://spark.apache.org/docs/latest/building-spark.html )

At the moment, the following is what I can think of:

mvn -Pyarn -Dhadoop.version=2.4.0-mr1-cdh4.2.0 -Phadoop-1 -Phive -Phive-thriftserver -DskipTests clean package

Can anyone tell me whether the above is correct, or point me to other resources I can learn from?

Thank you.

I had misunderstood the options. --hadoop-major-version has 3 options:

  • "1" for Hadoop 1.0.4
  • "2" for CDH 4.2.0 (mr1)
  • "yarn" for Hadoop 2.4.0
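To make the three values concrete, here is a minimal dry-run sketch that only prints a spark-ec2 launch command for a chosen value and rejects anything else. The key pair name, identity file path, cluster name, slave count and the helper name launch_cmd are all hypothetical placeholders, not values from my setup.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print a spark-ec2 launch command for a given Hadoop flavour.
# KEY_PAIR, KEY_FILE and CLUSTER are hypothetical placeholders.
KEY_PAIR="my-keypair"
KEY_FILE="$HOME/.ssh/my-keypair.pem"
CLUSTER="spark-hive-test"

launch_cmd() {
  local hadoop_major="$1"   # one of: 1, 2, yarn
  case "$hadoop_major" in
    1|2|yarn) ;;
    *) echo "unsupported --hadoop-major-version: $hadoop_major" >&2; return 1 ;;
  esac
  # Only echo the command; nothing is launched here.
  echo "./spark-ec2 -k $KEY_PAIR -i $KEY_FILE -s 2" \
       "--hadoop-major-version=$hadoop_major launch $CLUSTER"
}

launch_cmd yarn
```

Printing rather than executing keeps the sketch safe to run anywhere; copy the printed line into a shell in the ec2 directory once the placeholders are replaced.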

I used spark.ami.hvm.v14 (ami-35b1885c) and was able to build successfully with the following.

./make-distribution.sh --name spark-1.6.0-bin-hadoop2.4-hive-yarn --tgz -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -Psparkr -Phive -Phive-thriftserver -DskipTests
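If you need to rebuild for a different Hadoop release, the same command can be assembled from a few variables. This is only a sketch: the variable names and the helper build_args are my own, and it assumes the -Phadoop-X.Y profile you pick actually exists in your Spark source tree.

```shell
#!/usr/bin/env bash
# Sketch: assemble the make-distribution.sh arguments from variables.
# SPARK_VERSION, HADOOP_PROFILE, HADOOP_VERSION and build_args are illustrative names.
SPARK_VERSION="1.6.0"
HADOOP_PROFILE="2.4"      # selects -Phadoop-2.4
HADOOP_VERSION="2.4.0"    # passed as -Dhadoop.version

build_args() {
  echo "--name spark-${SPARK_VERSION}-bin-hadoop${HADOOP_PROFILE}-hive-yarn --tgz" \
       "-Pyarn -Phadoop-${HADOOP_PROFILE} -Dhadoop.version=${HADOOP_VERSION}" \
       "-Psparkr -Phive -Phive-thriftserver -DskipTests"
}

# Print the command for review; run it from the Spark source root when ready.
echo "./make-distribution.sh $(build_args)"
```

With the defaults shown, the printed line is the same command I ran above.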
