
Initialising HiveContext in Spark CLI

When initialising Spark in the command-line interface (spark-shell), a SparkContext is created as sc and an SQLContext as sqlContext by default.

But I need a HiveContext, because I am using the function collect_list, which is not supported by SQLContext but is supported by HiveContext. Since HiveContext is a subclass of SQLContext, it should have worked, but it doesn't.

How do I initialise a HiveContext in Scala using the Spark CLI?

In spark-shell, sqlContext is an instance of HiveContext by default. You can read about that in my previous answer here.

Nevertheless, collect_list isn't available in Spark 1.5.2. It was introduced in Spark 1.6, so it's normal that you can't find it.

Reference: https://github.com/apache/spark/blob/v1.6.2/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L213
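
For anyone on 1.6+, here is a minimal sketch of collect_list in spark-shell. The example data and column names (dept, name) are made up for illustration; in Spark 1.x the function is backed by a Hive UDAF, so it relies on sqlContext being a HiveContext as described above.

import org.apache.spark.sql.functions.collect_list

// Hypothetical example data: one row per (department, employee) pair
val df = sqlContext.createDataFrame(Seq(
  ("sales", "alice"),
  ("sales", "bob"),
  ("hr", "carol")
)).toDF("dept", "name")

// collect_list gathers the values of a column into an array per group,
// e.g. sales -> [alice, bob], hr -> [carol]
df.groupBy("dept").agg(collect_list("name").as("names")).show()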

Also, you don't need to import org.apache.spark.sql.functions._ in the shell; it's imported by default.

The sqlContext is a HiveContext:

scala> sqlContext
res11: org.apache.spark.sql.SQLContext = org.apache.spark.sql.hive.HiveContext@4756c8f3

[Edit]

Import the functions before using them:

import org.apache.spark.sql.functions._ 

You can do so by following the steps below:

import org.apache.spark.sql.hive.HiveContext

// Create a HiveContext from the existing SparkContext (sc)
val sqlContext = new HiveContext(sc)

// Query a Hive table through the new context
val depts = sqlContext.sql("select * from departments")
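
Once the HiveContext exists, the collect_list call from the question should work against it. A short sketch, assuming the departments table has columns named dept_id and dept_name (hypothetical, for illustration only):

import org.apache.spark.sql.functions.collect_list

// Group the rows and collect one column's values into an array per group;
// the column names dept_id and dept_name are assumptions for illustration
val grouped = depts.groupBy("dept_id").agg(collect_list("dept_name").as("names"))
grouped.show()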
