
Role of Zookeeper in Hadoop

Based on the slides, I understand that in the context of Hadoop, Zookeeper is used to store information about the master, the status of the different tasks, which worker is working on which partition, and the list of available workers.

Why is Zookeeper used for this metadata storage? Any data store could be used, right?

For instance, Celery can be configured with any result backend (Redis, Mongo, etc.). So in practice Hadoop could use any storage backend, right? Why Zookeeper?

This doc suggests that Redis, SQLite, MySQL, and PostgreSQL can be used for Celery task result storage:

https://docs.celeryq.dev/en/stable/getting-started/backends-and-brokers/index.html

Zookeeper, which is built on the ZAB (Zookeeper Atomic Broadcast) protocol, is used for leader election as well as distributed locks.

It is not simply a datastore, and no, not just any datastore can be used in its place.
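
To make the leader election point concrete, here is a minimal sketch of the idea in plain Python. It is a toy, in-memory model and not the Zookeeper client API: `ToyCoordinator`, `join`, `expire`, `leader`, and the `master-*` names are all hypothetical. It assumes the usual election recipe, where each candidate creates an ephemeral sequential node and the lowest sequence number wins.

```python
# Toy, in-memory model of a Zookeeper-style leader election.
# NOT the Zookeeper API: every name below is hypothetical.


class ToyCoordinator:
    """Mimics ephemeral sequential nodes under a single election path."""

    def __init__(self):
        self._next_seq = 0
        self._nodes = {}  # sequence number -> session id

    def join(self, session_id):
        """Create an 'ephemeral sequential' node for this session."""
        seq = self._next_seq
        self._next_seq += 1
        self._nodes[seq] = session_id
        return seq

    def expire(self, session_id):
        """Simulate a dead session: its ephemeral nodes disappear."""
        self._nodes = {
            seq: sid for seq, sid in self._nodes.items() if sid != session_id
        }

    def leader(self):
        """The session holding the lowest sequence number is the leader."""
        if not self._nodes:
            return None
        return self._nodes[min(self._nodes)]


def demo():
    zk = ToyCoordinator()
    for name in ("master-a", "master-b", "master-c"):
        zk.join(name)
    first = zk.leader()    # lowest sequence number wins
    zk.expire("master-a")  # the current leader's session dies
    second = zk.leader()   # the next candidate in line takes over
    return first, second


if __name__ == "__main__":
    print(demo())
```

What the toy leaves out is the important part: in a real Zookeeper ensemble, the node ordering and the session expiry are agreed on by a quorum of servers, so every client sees the same leader even when machines fail. A task result backend is not designed to give that guarantee.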

Celery isn't used within the Hadoop ecosystem, so I'm not sure how that's relevant to the question.
