
Is it possible to run ANY application or program with HADOOP YARN?

I've been studying distributed computing recently and found out that Hadoop YARN is one of those systems. So I thought that if I just set up a Hadoop YARN cluster, every application would run in a distributed way.

But now someone has told me that Hadoop YARN cannot do anything by itself and needs other components such as MapReduce, Spark, and HBase.

If this is correct, does that mean only a limited set of tasks can be run with YARN? Or can I apply YARN's distributed computing to any application I want?

Hadoop is the name that refers to the entire system.

HDFS is the actual storage system. Think of it as S3 or a distributed Linux filesystem.

YARN is a framework for scheduling jobs and allocating resources. It handles these things for you, but you don't interact with it very much.
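To make "scheduling jobs and allocating resources" concrete, here is a minimal toy sketch of the kind of bookkeeping a resource manager performs. All names here (`Node`, `ask_for_container`) are invented for illustration and are not YARN's real API: applications ask for containers of a given memory/CPU size, and the scheduler grants each request on a node that still has capacity.

```python
# Toy sketch of resource-manager bookkeeping (illustrative only, not YARN's API).

class Node:
    def __init__(self, name, memory_mb, vcores):
        self.name = name
        self.memory_mb = memory_mb  # free memory on this node
        self.vcores = vcores        # free virtual cores on this node

def ask_for_container(nodes, memory_mb, vcores):
    """Place one container request on the first node with enough free capacity,
    roughly what a scheduler does for each resource request it receives."""
    for node in nodes:
        if node.memory_mb >= memory_mb and node.vcores >= vcores:
            node.memory_mb -= memory_mb
            node.vcores -= vcores
            return node.name  # container granted on this node
    return None  # request stays pending until resources free up

cluster = [Node("worker1", 8192, 4), Node("worker2", 4096, 2)]
print(ask_for_container(cluster, 6144, 2))  # worker1
print(ask_for_container(cluster, 4096, 4))  # None: no single node has 4 cores free
```

The point of the sketch is that the resource manager only tracks and hands out capacity; it doesn't know or care what the code inside each container actually does.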

Spark and MapReduce are managed by YARN. With these two, you can actually write your code/applications and give work to the cluster.
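To show what "writing your code and giving work to the cluster" looks like, here is a minimal single-process sketch of the MapReduce programming model (the classic word count). On a real cluster, the framework, with resources granted by YARN, would run the map and reduce tasks in containers spread across nodes; here the three phases are just simulated in one process.

```python
# Single-process sketch of the MapReduce model (word count), for illustration.
from collections import defaultdict

def map_phase(lines):
    """Map: emit a (word, 1) pair for every word, like a word-count mapper."""
    for line in lines:
        for word in line.split():
            yield (word, 1)

def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework does
    between the map and reduce phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: sum the counts for each word."""
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["hadoop yarn", "yarn runs spark", "yarn runs mapreduce"]
print(reduce_phase(shuffle(map_phase(lines))))
# {'hadoop': 1, 'yarn': 3, 'runs': 2, 'spark': 1, 'mapreduce': 1}
```

Your job as the application author is only the map and reduce functions; the framework handles splitting the input, shuffling, and retrying failed tasks.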

HBase uses HDFS storage (which is file-based) and provides NoSQL storage on top of it.

Theoretically you can run more than just Spark and MapReduce on YARN, and you can use something other than YARN (Kubernetes support is in the works or already available, depending on the version). You can even write your own processing tool, queue/resource-management system, or storage layer. Hadoop has many pieces which you may or may not use, depending on your case, but the majority of Hadoop systems use YARN and Spark.

If you want to deploy Docker containers, for example, just a Kubernetes cluster would be a better choice. If you need batch/real-time processing with Spark, use Hadoop.

YARN itself is a resource manager. You will need to write code that can be deployed onto those resources, and then that code can do anything, provided the nodes running the tasks are themselves capable of running the job. For example, you cannot distribute a Python script without first installing its dependencies on those nodes. Mesos is a bit more generalized/accessible than YARN, if you want more flexibility for the same effect.

YARN mostly supports running JAR files. Shell scripts (at least, via Oozie) and Docker containers can be deployed to it as well (refer to the Apache docs).

You may also refer to the Apache Slider or Twill projects for more information.

