
How many Shark servers are necessary in relation to Spark?

I'm new to Spark/Shark and have spun up a cluster with three Spark workers. I started installing Shark on the same three servers, but I'm coming to the conclusion that this may not be needed and that a single Shark server is enough; I can't find anything in the documentation that speaks to this. Do I only need one Shark server, since Spark/Hive will be doing the heavy lifting, or do I need to install it on every server where Spark resides?

Shark is a Spark application, just like a WordCount job or the Spark shell. You need to have it on the client machine from which you are going to send queries.

If the Shark JARs are not present on the worker machines, they have to be attached to the SparkContext.
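For illustration, here is a minimal Scala sketch of attaching extra JARs to a SparkContext. The app name, master URL, and JAR path are all placeholders, not values from the question; setJars ships the listed JARs to the workers when the context starts, and addJar does the same for an already-running context.

    import org.apache.spark.{SparkConf, SparkContext}

    // Ship the Shark JARs to the workers along with the application.
    // The master URL and JAR path below are hypothetical placeholders.
    val conf = new SparkConf()
      .setAppName("shark-example")
      .setMaster("spark://master:7077")
      .setJars(Seq("/path/to/shark-assembly.jar"))

    val sc = new SparkContext(conf)

    // Alternatively, attach a JAR to a context that is already running:
    sc.addJar("/path/to/shark-assembly.jar")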

The Shark server works a little like 'screen' on Unix systems: the Shark server is itself an application running on Spark. You connect to it with the Shark console and send queries, and the Shark server executes them on Spark on your behalf.

Assuming that by Shark you mean the ThriftServer, you only need one Shark server per (Spark) cluster.

This carries over even to Spark 1.0.1, where Shark is retired, because the ThriftServer has been brought into Spark core itself.
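As a sketch of what "one server per cluster" means for clients: any JDBC client can talk to that single Thrift server over the standard HiveServer2 protocol. This example assumes the Hive JDBC driver is on the classpath and that the server is listening on the usual default of localhost:10000; adjust the URL for your deployment.

    import java.sql.DriverManager

    object ThriftServerClient {
      def main(args: Array[String]): Unit = {
        // Assumes the HiveServer2 JDBC driver is available on the classpath
        // and the Thrift server runs at the default host/port.
        Class.forName("org.apache.hive.jdbc.HiveDriver")
        val conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default", "", "")
        val stmt = conn.createStatement()
        val rs = stmt.executeQuery("SHOW TABLES")
        while (rs.next()) println(rs.getString(1))
        rs.close(); stmt.close(); conn.close()
      }
    }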
