I am connecting to Hbase using Spark. I have added all the dependencies but still i am getting this exception. Kindly help me like which JAR i need to add to resolve this issue.
SPARK_MAJOR_VERSION is set to 2, using Spark2
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/2.6.5.0-292/spark2/jars/slf4j-log4j12 -1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.6.5.0-292/spark2/jars/slf4j-log4j12 -1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.6.5.0-292/spark2/jars/phoenix-4.7.0 .2.6.5.0-292-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.6.5.0-292/spark2/jars/phoenix-4.7.0 .2.6.5.0-292-thin-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLeve l(newLevel).
18/09/17 05:34:36 WARN Utils: Service 'SparkUI' could not bind on port 4040. Att empting port 4041.
Spark context Web UI available at http://sandbox-hdp.hortonworks.com:4041
Spark context available as 'sc' (master = local[*], app id = local-1537162476668).
Spark session available as 'spark'.
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 2.3.0.2.6.5.0-292
/_/
Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_171)
Type in expressions to have them evaluated.
Type :help for more information.
scala> :paste
// Entering paste mode (ctrl-D to finish)
import org.apache.spark.sql.{SQLContext, _}
import org.apache.spark.sql.execution.datasources.hbase._
import org.apache.spark.{SparkConf, SparkContext}
import spark.sqlContext.implicits._
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.{ConnectionFactory,HBaseAdmin,HTable,Put,Get}
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.hadoop.hbase.client.HBaseAdmin
import org.apache.hadoop.hbase.{HTableDescriptor,HColumnDescriptor}
def catalog = s"""{
|"table":{"namespace":"default", "name":"Contacts"},
|"rowkey":"key",
|"columns":{
|"rowkey":{"cf":"rowkey", "col":"key", "type":"string"},
|"officeAddress":{"cf":"Office", "col":"Address", "type":"string"},
|"officePhone":{"cf":"Office", "col":"Phone", "type":"string"},
|"personalName":{"cf":"Personal", "col":"Name", "type":"string"},
|"personalPhone":{"cf":"Personal", "col":"Phone", "type":"string"}
|}
|}""".stripMargin
def withCatalog(cat: String): DataFrame = {
spark.sqlContext
.read
.options(Map(HBaseTableCatalog.tableCatalog->cat))
.format("org.apache.spark.sql.execution.datasources.hbase")
.load()
}
val df = withCatalog(catalog)
df.registerTempTable("contacts")
val query = spark.sqlContext.sql("select personalName, officeAddress from contacts")
query.show() <p>
// Exiting paste mode, now interpreting.
warning: there was one deprecation warning; re-run with -deprecation for details
java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/shaded/protobuf/generated/MasterProtos$MasterService$BlockingInterface
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
Below are the Jar's available in spark-Jar's Folder
hbase-0.94.2.jar
hbase-annotations-1.2.0.jar
hbase-client-2.1.0.jar
hbase-common-2.1.0.jar
hbase-hadoop-compat-2.1.0.jar
hbase-hadoop2-compat-2.1.0.jar
hbase-it-1.1.2.2.6.5.0-292.jar
hbase-prefix-tree-1.1.2.2.6.5.0-292.jar
hbase-procedure-1.1.2.2.6.5.0-292.jar
hbase-protocol-2.1.0.jar
hbase-server-2.1.0.jar
hbase-spark-1.2.0-cdh5.8.3.jar
hbase-spark-1.1.2.2.6.5.0-292.jar
hbase-thrift-1.1.2.2.6.5.0-292.jar
hive-hbase-handler-0.12.0-cdh5.1.3.jar
hive-hbase-handler-3.1.0.jar
protobuf-java-3.5.1.jar
Kindly provide me suggestion like which jar i missed to add in the jars folder in order to connect to hbase.
Seems like you are missing a shc-core jar which is used to write dataframes to hbase which has been implented by hortonworks.
As you are importing the package from hortonworks-shc-connector
import org.apache.spark.sql.execution.datasources.hbase._
You need add the jar to your spark application.
Steps to get the jar of shc-core connector:
First get pull of hortonworks-spark/hbase connector github repository then checkout to a appropriate branch with version of hbase and hadoop that you are using in your environment and build it using
mvn clean install -DskipTests
after executing above you will have a jar in your ~/.m2/repository/com/hortonworks/shc/
Use this jar for your spark application.
You can either add to your spark-jar folder or you can pass it in spark-submit/spark-shell with --jars flag
Then use try to execute the code you are trying run.
I have followed the same steps and was able to read from hbase with HCatalog.
Example
spark-shell --jars shc-core-1.1.3-2.4-s_2.11.jar
SPARK_MAJOR_VERSION is set to 2, using Spark2
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use
setLogLevel(newLevel).
Spark context Web UI available at http://sandbox-hdp.hortonworks.com:4040
Spark context available as 'sc' (master = yarn, app id =
application_1592322799672_0007).
Spark session available as 'spark'.
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 2.4.0.7.0.3.0-79
/_/
Using Scala version 2.11.12 (OpenJDK 64-Bit Server VM, Java 1.8.0_232)
Type in expressions to have them evaluated.
Type :help for more information.
scala> :paste
// Entering paste mode (ctrl-D to finish)
import org.apache.spark.sql.{SQLContext, _}
import org.apache.spark.sql.execution.datasources.hbase._
import org.apache.spark.{SparkConf, SparkContext}
import spark.sqlContext.implicits._
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.{ConnectionFactory,HBaseAdmin,HTable,Put,Get}
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.hadoop.hbase.client.HBaseAdmin
import org.apache.hadoop.hbase.{HTableDescriptor,HColumnDescriptor}
def catalog = s"""{
|"table":{"namespace":"default", "name":"Contacts"},
|"rowkey":"key",
|"columns":{
|"rowkey":{"cf":"rowkey", "col":"key", "type":"string"},
|"officeAddress":{"cf":"Office", "col":"Address", "type":"string"},
|"officePhone":{"cf":"Office", "col":"Phone", "type":"string"},
|"personalName":{"cf":"Personal", "col":"Name", "type":"string"},
|"personalPhone":{"cf":"Personal", "col":"Phone", "type":"string"}
|}
|}""".stripMargin
def withCatalog(cat: String): DataFrame = {
spark.sqlContext
.read
.options(Map(HBaseTableCatalog.tableCatalog->cat))
.format("org.apache.spark.sql.execution.datasources.hbase")
.load()
}
val df = withCatalog(catalog)
df.registerTempTable("contacts")
val query = spark.sqlContext.sql("select personalName, officeAddress from contacts")
query.show()
// Exiting paste mode, now interpreting.
warning: there was one deprecation warning; re-run with -deprecation for details
Hive Session ID = 5cc02976-98c4-447f-9ba0-e70c4a3c4ab1
+------------+-------------+
|personalName|officeAddress|
+------------+-------------+
|John Jackson| 40 Ellis St.|
|John Jackson| 40 Ellis St.|
+------------+-------------+
import org.apache.spark.sql.{SQLContext, _}
import org.apache.spark.sql.execution.datasources.hbase._
import org.apache.spark.{SparkConf, SparkContext}
import spark.sqlContext.implicits._
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.{ConnectionFactory, HBaseAdmin, HTable, Put, Get}
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.hadoop.hbase.client.HBaseAdmin
import org.apache.hadoop.hbase.{HTableDescriptor, HColumnDescriptor}
catalog: String
withCatalog: (cat: String)org.apache.spark.sql.DataFrame
df: org.apache.spark.sql.DataFrame = [rowkey: string, officeAddress: string ... 3 more fields]
query: org.apache.spark.sql.DataFrame = [personalName: string, officeAddress: string]
scala> query.show()
+------------+-------------+
|personalName|officeAddress|
+------------+-------------+
|John Jackson| 40 Ellis St.|
|John Jackson| 40 Ellis St.|
+------------+-------------+
scala>
Stack Versions :
HBase 2.2.0
Hadoop 3.1.1
Spark 2.4.0
Scala 2.11.12
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.