
Can Apache YARN be used without HDFS?

Original post 2017-03-02 08:06:28 · tags: apache, hadoop, yarn, hadoop2

I want to use Apache YARN as a cluster and resource manager for running a framework where resources would be shared across different tasks of the same framework. I want to use my own distributed off-heap file system.

Is it possible to use a distributed file system other than HDFS with YARN?
If yes, what HDFS APIs need to be implemented?
What Hadoop components are required to run YARN?

5 answers

There are several different questions here.

Can you use YARN to deploy apps using something like S3 to propagate the binaries?

Yes: it's how LinkedIn have deployed Samza in the past, using http:// downloads. Samza does not need a cluster filesystem, so there is no HDFS running in the cluster, just local file:// filesystems, one per host.

Applications which need a cluster filesystem wouldn't work in such a cluster.

Can you bring up a YARN cluster with an alternative filesystem?

Yes.

For what a "filesystem" is, look at the Filesystem Specification. You need a consistent view across the filesystem: newly created files show up in list(), deleted ones aren't found, and updates are immediately visible. And rename() of files and directories must be an atomic operation, ideally O(1). It's used for atomic commits of work, checkpoints, ... Oh, and for HBase, append() is needed.
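To make the rename-as-commit idea concrete, here is a minimal Python sketch that uses the local filesystem rather than Hadoop's FileSystem API. It assumes POSIX rename semantics within a single disk; the `_temporary` and `part-NNNNN` names only mimic Hadoop's output layout, and `commit_output` and `run_job` are hypothetical helpers.

```python
import os
import tempfile


def commit_output(work_dir: str, final_dir: str) -> None:
    """Publish a job's output by renaming its temporary directory into place.

    The rename is the commit point: readers of final_dir see either nothing
    or the complete output, never a half-written directory.
    """
    if os.path.exists(final_dir):
        # This check is not atomic with the rename below; a real committer
        # relies on the filesystem itself rejecting a clashing rename.
        raise FileExistsError(f"{final_dir} is already committed")
    # On POSIX, rename() within one filesystem is atomic and cheap:
    # only the directory entry moves, not the file contents.
    os.rename(work_dir, final_dir)


def run_job(base: str, parts: int = 3) -> str:
    """Write task output under a temporary directory, then commit it."""
    work = os.path.join(base, "_temporary")
    final = os.path.join(base, "output")
    os.makedirs(work)
    for i in range(parts):
        with open(os.path.join(work, f"part-{i:05d}"), "w") as f:
            f.write(f"record {i}\n")
    commit_output(work, final)
    return final


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        out = run_job(base)
        print(sorted(os.listdir(out)))
```

On a store that emulates rename() by copying and then deleting each object, the same commit is neither atomic nor O(1), which is why this guarantee matters.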

MapR does this, as do Red Hat with GlusterFS and IBM and EMC with their own filesystems. Do bear in mind that pretty much everything is tested on HDFS; you'd better hope the other cluster FS has done the testing (or someone has done it for them, such as Hortonworks or Cloudera).

Can you bring up a YARN cluster using an object store as the underlying FS?

It depends on whether or not the FS offers a consistent filesystem view, rather than an eventually consistent one. HBase is the real test here.

Microsoft Azure Storage is consistent, has leases for obtaining exclusive access to bits of the FS, and its rename() is really fast. In Azure it completely replaces HDFS.
Google Cloud Storage announced on Mar 1 2017 that GCS offers consistency. Maybe it can be used as a replacement now; I have no experience there.
Amazon EMR does offer S3 as a replacement, using (a) DynamoDB for consistent metadata and (b) doing horrible things to get HBase to work.
The ASF's own S3 client, S3A, can't be used as a replacement. Those of us on the team working on it have been focusing on read and write performance with S3 as a source and final destination of data; in S3Guard, on adding the DynamoDB layer; and in the S3Guard committer, on being able to use S3 as a high-performance destination of work (resilient to failures while avoiding rename()).
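The listing requirement can be illustrated with a toy model that does not reproduce any real store's behaviour: the hypothetical `ToyObjectStore` hides the most recent `lag` writes from listings, and a hypothetical `collect_task_outputs` step that discovers task output by listing silently misses files whenever `lag` is non-zero.

```python
class ToyObjectStore:
    """Toy store whose listings lag behind writes (illustration only).

    lag=0 models a consistent store; lag>0 models eventual consistency,
    where list_keys() does not yet show the most recent `lag` writes.
    """

    def __init__(self, lag: int = 0) -> None:
        self.lag = lag
        self._objects = {}      # key -> data, readable immediately via get()
        self._write_order = []  # keys in the order they were written

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data
        self._write_order.append(key)

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def list_keys(self, prefix: str) -> list:
        # Only writes older than the last `lag` ones are visible to listings.
        cutoff = max(0, len(self._write_order) - self.lag)
        visible = self._write_order[:cutoff]
        return sorted({k for k in visible if k.startswith(prefix)})


def collect_task_outputs(store: ToyObjectStore, tasks: int) -> list:
    """Hypothetical commit step: write task output, then find it by listing."""
    for i in range(tasks):
        store.put(f"job/out/part-{i:05d}", b"data")
    return store.list_keys("job/out/")


print(len(collect_task_outputs(ToyObjectStore(lag=0), 4)))
print(len(collect_task_outputs(ToyObjectStore(lag=2), 4)))
```

With `lag=0` all four parts are found; with `lag=2` only two are, and a job committed this way would lose data without raising any error.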

Can the new distributed Filesystem you are writing be used as a replacement for HDFS?

Well, you can certainly try!

First get all the filesystem contract tests to work, which measure basic API compliance. Then look at all the Apache Bigtop tests, which do system integration. I recommend you avoid HBase and Accumulo initially and focus on MapReduce, Hive, Spark and Flink.

Don't be afraid to get on the Hadoop common-dev and Bigtop lists and ask questions.

Here's the interface you have to implement; keep an eye on the guarantees you have to support. There's a utility to test the contracts. If you need an example, there is a plethora of filesystem implementations within Hadoop, e.g. for S3, Azure Blobs and FTP, that serve as a good starting point.
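Hadoop's own contract tests are Java test suites. Purely as a language-neutral illustration, the Python sketch below checks the guarantees described earlier (immediate listing, update visibility, deletes, atomic directory rename) against a hypothetical in-memory filesystem. `InMemoryFS` and `check_contract` are invented names, not Hadoop APIs.

```python
class InMemoryFS:
    """Hypothetical filesystem offering the consistency and rename guarantees.

    Paths are absolute strings without a trailing slash, e.g. "/jobs/out".
    A directory exists implicitly whenever some file lives under it.
    """

    def __init__(self) -> None:
        self.files = {}  # path -> bytes

    def _under(self, path: str) -> list:
        prefix = path + "/"
        return [p for p in self.files if p == path or p.startswith(prefix)]

    def create(self, path: str, data: bytes = b"") -> None:
        self.files[path] = data  # overwrites are visible immediately

    def read(self, path: str) -> bytes:
        return self.files[path]

    def exists(self, path: str) -> bool:
        return bool(self._under(path))

    def list_files(self, directory: str) -> list:
        return sorted(p for p in self._under(directory) if p != directory)

    def delete(self, path: str) -> bool:
        doomed = self._under(path)
        for p in doomed:
            del self.files[p]
        return bool(doomed)

    def rename(self, src: str, dst: str) -> bool:
        moving = self._under(src)
        if not moving or self.exists(dst) or dst.startswith(src + "/"):
            return False
        updated = dict(self.files)
        for old in moving:
            updated[dst + old[len(src):]] = updated.pop(old)
        # A single assignment swaps in the new namespace: all or nothing.
        self.files = updated
        return True


def check_contract(fs) -> None:
    """Miniature checks for the consistency and rename guarantees."""
    fs.create("/t/a", b"1")
    assert "/t/a" in fs.list_files("/t"), "new files must be listed at once"
    fs.create("/t/a", b"2")
    assert fs.read("/t/a") == b"2", "updates must be visible at once"
    assert fs.delete("/t/a")
    assert not fs.exists("/t/a"), "deleted files must not be found"
    fs.create("/t/src/part-0", b"x")
    fs.create("/t/src/part-1", b"y")
    assert fs.rename("/t/src", "/t/dst"), "directory rename must succeed"
    assert not fs.exists("/t/src"), "source must vanish after rename"
    assert fs.list_files("/t/dst") == ["/t/dst/part-0", "/t/dst/part-1"]
    assert not fs.rename("/t/missing", "/t/x"), "renaming a missing path fails"


check_contract(InMemoryFS())
```

Running the same checks against a thin wrapper around a real filesystem is the point of such a suite: the checks stay fixed while implementations vary.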

You can configure your filesystem implementation by class; all components should honor fs.defaultFS as the configuration key.
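A simplified Python model of how this configuration is used: a path without a scheme is resolved against fs.defaultFS, and the scheme then picks the class named by an `fs.<scheme>.impl` key. The `myfs` scheme and the `com.example.MyOffHeapFileSystem` class are hypothetical, and the real Hadoop lookup has further steps (such as ServiceLoader discovery and instance caching) that are left out here.

```python
from typing import Dict, Optional
from urllib.parse import urlparse


def resolve_fs_class(conf: Dict[str, str], path: Optional[str] = None) -> str:
    """Return the implementation class configured for `path` (simplified model).

    A path without a scheme, such as "/user/alice", is resolved against
    fs.defaultFS; the resulting scheme selects the fs.<scheme>.impl key.
    """
    default_scheme = urlparse(conf["fs.defaultFS"]).scheme
    scheme = (urlparse(path).scheme if path else "") or default_scheme
    key = f"fs.{scheme}.impl"
    if key not in conf:
        # Real Hadoop falls back to implementations registered through
        # Java's ServiceLoader before giving up; that step is omitted here.
        raise ValueError(f"no filesystem implementation configured for {scheme!r}")
    return conf[key]


# Hypothetical settings for a custom "myfs" filesystem.
conf = {
    "fs.defaultFS": "myfs://cluster1/",
    "fs.myfs.impl": "com.example.MyOffHeapFileSystem",
    "fs.file.impl": "org.apache.hadoop.fs.LocalFileSystem",
}

print(resolve_fs_class(conf, "/user/alice/data"))
```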

Yes, you can, provided you have a file store implementation that supports the HDFS API.

For example, you can use AWS S3 (s3n:// or s3a://) instead of HDFS. There are a few other file systems that support the HDFS API.

YARN is not the only resource manager for distributed clusters. Apache Mesos is a resource manager similar to YARN (though its internals are different), and it does not depend on Hadoop components. In enterprise cloud infrastructure it already has many users, for example DC/OS (which consists of Mesos, Docker, etc.).

YARN can be used without HDFS. You don't have to configure or start the HDFS services, so YARN will run without HDFS.

But you cannot install YARN without Hadoop: you have to download Hadoop and configure only YARN (and any other services you want to use).
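For a YARN-only setup, Hadoop's shipped default for fs.defaultFS is already file:///, so each node uses its own local disk, as in the Samza deployment described in the first answer. The hypothetical helper below just writes that setting explicitly in the standard core-site.xml property layout; you would then start only the YARN daemons (for example with sbin/start-yarn.sh) and none of the HDFS ones.

```python
import xml.etree.ElementTree as ET


def core_site_xml(properties: dict) -> str:
    """Render name/value pairs in Hadoop's <configuration><property> layout."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# YARN-only cluster: the default filesystem is each host's local disk.
xml_text = core_site_xml({"fs.defaultFS": "file:///"})
print(xml_text)
```

In practice you would save this as etc/hadoop/core-site.xml, or simply edit that file by hand.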

Related questions:

Can apache flume hdfs sink accept dynamic path to write?

HDFS Plugin for Apache Ranger

Apache Flume + Hdfs Sink

Apache and Yarn ports

Can apache Shiro be used to build an Identity Provider?

Can Apache Ace be used as Maven repository?

Formatting Apache Flume HDFS Serializer

Apache Drill - Query HDFS and SQL

HADOOP / YARN - Are the ResourceManager and the hdfs NameNode always installed on the same host?

How can I merge files in directory in hdfs without using get merge command?


Technical posts on this site are licensed under CC BY-SA 4.0. If you reprint, please credit the site URL or the original address. For any questions, contact yoyou2525@163.com.



Can Apache YARN be used without HDFS?

Question

5 answers

solution1
14 ACCEPTED 2017-03-02 11:35:14

Can you use YARN to deploy apps using something like S3 to propagate the binaries?

Can you bring up a YARN cluster with an alternative filesystem?

Can you bring up a YARN cluster using an object store as the underlying FS?

Can the new distributed Filesystem you are writing be used as a replacement for HDFS?

solution2
2 2017-03-02 11:04:43

solution3
0 2017-03-02 11:10:10

solution4
-1 2017-03-02 09:54:37

solution5
-1 2017-03-02 11:18:23
