
Info on building Spark with "CDH 4.2.0" or "yarn (Hadoop 2.4.0)" with Hive?

I'm planning to build Spark to spin up on EC2. The default spark_ec2.py downloads a prebuilt package ("1" for Hadoop 1.0.4, "2" for CDH 4.2.0, "yarn" for Hadoop 2.4.0), but those packages are built without the '-Phive -Phive-thriftserver' options. I mainly need to use Hive UDFs, so Spark has to be built from source. (I'd need YARN too, as 'Hive on Spark supports Spark on YARN mode as default.')
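
For context, the kind of Hive UDF usage I'm after looks roughly like the sketch below; it only works in a build that includes '-Phive -Phive-thriftserver', and the jar path, function name, and UDF class here are just placeholders.

# Sketch only: bin/spark-sql is only present in a Hive-enabled build;
# the jar, function name, and class below are hypothetical.
./bin/spark-sql -e "
  ADD JAR /path/to/my-udfs.jar;
  CREATE TEMPORARY FUNCTION my_upper AS 'com.example.MyUpperUDF';
  SELECT my_upper(name) FROM some_table LIMIT 10;
"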

The 'Building Spark' page shows a number of examples, and what I need seems to be a mix of

Cloudera CDH 4.2.0 with MapReduce v1

mvn -Dhadoop.version=2.0.0-mr1-cdh4.2.0 -Phadoop-1 -DskipTests clean package

and

Apache Hadoop 2.4.X with Hive 13 support

mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -Phive -Phive-thriftserver -DskipTests clean package

(source: http://spark.apache.org/docs/latest/building-spark.html )

At the moment, the following is the best I can come up with:

mvn -Pyarn -Dhadoop.version=2.4.0-mr1-cdh4.2.0 -Phadoop-1 -Phive -Phive-thriftserver -DskipTests clean package

Can anyone tell me whether the above is correct, or point me to another resource I can learn from?

Thank you.

It turns out I had misunderstood this: --hadoop-major-version has 3 options (see the launch sketch after this list):

  • "1" for Hadoop 1.0.4
  • "2" for CDH 4.2.0 (mr1)
  • "yarn" for Hadoop 2.4.0

I used spark.ami.hvm.v14 (ami-35b1885c) and was able to build successfully with the following:

./make-distribution.sh --name spark-1.6.0-bin-hadoop2.4-hive-yarn --tgz -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -Psparkr -Phive -Phive-thriftserver -DskipTests
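
As a rough sanity check afterwards (the exact archive and directory names produced by make-distribution.sh may differ from the --name value above):

# Sketch: confirm the archive exists and that the Hive classes made it into the assembly jar.
ls spark-*-bin-*hive-yarn*.tgz
tar -xzf spark-*-bin-*hive-yarn*.tgz
jar tf spark-*hive-yarn*/lib/spark-assembly-*.jar | grep -m 1 'org/apache/spark/sql/hive/HiveContext'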
