
What are the differences between pre-built and user-provided Hadoop on the Spark download page?

These questions have been puzzling me for a long time:

There are five package types in the second selector when the first one is set to version 2.4.4, and I am confused about 3 of them: Pre-built for Apache Hadoop 2.7, Pre-built with user-provided Apache Hadoop, and Pre-built with Scala 2.12 and user-provided Apache Hadoop. Let me list my questions one by one.

  1. What is the difference between Pre-built for Apache Hadoop 2.7 and Pre-built with user-provided Apache Hadoop? Does this mean there are two different situations: I already have a Hadoop cluster, or I don't have a Hadoop cluster? If the former, should I choose Pre-built with user-provided Apache Hadoop, and if the latter, will that package install a Hadoop cluster for me?
  2. What is the difference between Pre-built with user-provided Apache Hadoop and Pre-built with Scala 2.12 and user-provided Apache Hadoop? As far as I know, Spark already comes with Scala when I run spark-shell following the tutorial, and that package does not seem to be Pre-built with Scala 2.12 and user-provided Apache Hadoop, but just Pre-built with user-provided Apache Hadoop. (Am I right?) I think so because the command line shows something using Scala:

    scala> val a = 1;

So why is there still another package that emphasizes it is pre-built with Scala 2.12?

No option will install Hadoop for you. In all cases, Hadoop must either already exist or be bundled in the Spark download, and if you want to run Spark against HDFS and YARN, you must first set up that environment yourself.

You can choose the user-provided Hadoop option if you already have a running cluster and want to add or upgrade Spark, or if you are running Spark Standalone, on Mesos, or on Kubernetes instead. In that case, Hadoop scripts are not included in the download, although Spark still relies on core Hadoop libraries internally to function.
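That internal reliance shows up even without any cluster: a plain local read in spark-shell still goes through Hadoop's FileSystem and input-format classes. A small illustration (README.md here stands in for any local file):

    scala> spark.read.textFile("README.md").count()   // local path, still resolved through Hadoop's FileSystem API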

Spark also does not install Scala (or Java) for you. It is simply compiled against Scala 2.12, so trying to run against any other Scala version will result in classpath issues.
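You can confirm this from inside spark-shell: the REPL runs on the Scala version bundled with the download, regardless of any Scala you may have installed locally. A minimal check (the exact versions printed depend on your build):

    scala> util.Properties.versionString   // Scala bundled with spark-shell, e.g. "version 2.12.10" for a 2.12 build
    scala> spark.version                   // the Spark version of this download, e.g. "2.4.4"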

Summary:

  • we will need to install Hadoop separately in all three cases (1., 2., and 3. below) if we want to support HDFS and YARN
  • if we don't want to install Hadoop at all, we can use Spark pre-built with Hadoop (1. or 2.) and run Spark in Standalone mode, as in the sketch after this list
  • if we want to use an arbitrary version of Hadoop with Spark, then 3. should be used together with a separate installation of Hadoop
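For example, Standalone (or local) mode needs no Hadoop installation at all. A minimal sketch, assuming a hypothetical standalone master at spark://master-host:7077:

    import org.apache.spark.sql.SparkSession

    // Connect to a standalone master (hypothetical host); no HDFS or YARN involved.
    val spark = SparkSession.builder()
      .appName("standalone-example")
      .master("spark://master-host:7077")   // or "local[*]" to run on a single machine
      .getOrCreate()

    println(spark.range(100).count())       // a job that touches no Hadoop filesystem
    spark.stop()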

For Spark 3.1.1, the following package types exist for download:

  1. Pre-built for Apache Hadoop 2.7

This version of Spark runs with Hadoop 2.7.

  2. Pre-built for Apache Hadoop 3.2 and later

This version of Spark runs with Hadoop 3.2 and later.

  3. Pre-built with user-provided Apache Hadoop

This version of Spark runs with any user-provided version of Hadoop.

From the name of the last package (spark-3.1.1-bin-without-hadoop.tgz), it appears that we will need Hadoop for this version (i.e., 3.) and not for the others (i.e., 1. and 2.). However, the naming is ambiguous: we will need Hadoop only if we want to support HDFS and YARN. In Standalone mode, Spark can run in a truly distributed setting (or with its daemons on a single machine) without Hadoop.

For 1. and 2., you can run Spark without a Hadoop installation because some of the core Hadoop libraries come bundled with the Spark pre-built binary, so spark-shell works without throwing any exceptions; for 3., Spark will not work unless a Hadoop installation is provided (as 3. ships without the Hadoop runtime).
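One way to see which Hadoop runtime a with-hadoop build (1. or 2.) bundles is to ask from spark-shell; the exact version string depends on the download you picked:

    scala> org.apache.hadoop.util.VersionInfo.getVersion   // e.g. "2.7.3" for a Hadoop 2.7 build
    scala> sc.hadoopConfiguration.get("fs.defaultFS")      // "file:///" unless an HDFS is configured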

For more information, refer to this excerpt from the docs:

There are two variants of Spark binary distributions you can download. One is pre-built with a certain version of Apache Hadoop; this Spark distribution contains built-in Hadoop runtime, so we call it with-hadoop Spark distribution. The other one is pre-built with user-provided Hadoop; since this Spark distribution doesn't contain a built-in Hadoop runtime, it's smaller, but users have to provide a Hadoop installation separately. We call this variant no-hadoop Spark distribution. For with-hadoop Spark distribution, since it contains a built-in Hadoop runtime already, by default, when a job is submitted to Hadoop Yarn cluster, to prevent jar conflict, it will not populate Yarn's classpath into Spark ...

Hope this clears a bit of the confusion!
