I have an Apache Spark application written in Scala that tries to read data from HBase and do something with it.
I've come across examples showing how to do this, and also how to do it with Spark Streaming.
So I wrote the following code:
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.{SparkConf, SparkContext}

def main(args: Array[String]): Unit = {
  val sc = new SparkContext(new SparkConf().setAppName("HBaseRead"))

  val configuration = HBaseConfiguration.create()
  configuration.set(TableInputFormat.INPUT_TABLE, "urls")
  configuration.set(TableInputFormat.SCAN_COLUMNS, "values:words")

  val hbaseRdd = sc.newAPIHadoopRDD(configuration,
    classOf[TableInputFormat],
    classOf[ImmutableBytesWritable],
    classOf[Result])

  // Each entry is a (row key, Result) pair; extract the row key as a String
  val data = hbaseRdd.map(entry => Bytes.toString(entry._2.getRow))

  data.foreach(println)
}
My HBase table is created like this: create 'urls', {NAME => 'values', VERSIONS => 5}
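For context, `SCAN_COLUMNS` takes space-separated `family:qualifier` pairs, so `values:words` selects the `words` qualifier inside the `values` column family created above. A hypothetical row (the URL and text are made up for illustration) could be inserted and inspected from the HBase shell like this:

```
put 'urls', 'http://example.com', 'values:words', 'hello world'
scan 'urls', {COLUMNS => ['values:words']}
```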
What I'm getting is:
16/03/10 17:10:17 ERROR TableInputFormat: java.io.IOException: java.lang.reflect.InvocationTargetException
at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:240)
at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:218)
at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:119)
at org.apache.hadoop.hbase.mapreduce.TableInputFormat.initialize(TableInputFormat.java:183)
at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.getSplits(TableInputFormatBase.java:241)
at org.apache.hadoop.hbase.mapreduce.TableInputFormat.getSplits(TableInputFormat.java:237)
After reading about this exception here, I should probably also include this part of the stack trace:
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:238)
... 34 more
Caused by: java.lang.ClassCastException: org.apache.hadoop.hbase.ipc.RpcClientImpl cannot be cast to org.apache.hadoop.hbase.ipc.RpcClient
at org.apache.hadoop.hbase.ipc.RpcClientFactory.createClient(RpcClientFactory.java:64)
at org.apache.hadoop.hbase.ipc.RpcClientFactory.createClient(RpcClientFactory.java:48)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.<init>(ConnectionManager.java:637)
... 39 more
My questions are: what is causing this exception, and how can I fix it? It would be even better if I could somehow read the data as a DataFrame.
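For the DataFrame part, a minimal sketch under Spark 1.6 would map the HBase RDD to a case class and use `SQLContext`'s implicit `toDF()`. The column names (`url`, `words`) are my own assumptions for illustration, as is reading the `values:words` cell:

```scala
// Hypothetical sketch, Spark 1.6 style; assumes sc and hbaseRdd from the code above.
// Define the case class at top level (not inside a method) so toDF() can derive the schema.
case class UrlRow(url: String, words: String)

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._

val df = hbaseRdd.map { case (_, result) =>
  UrlRow(
    Bytes.toString(result.getRow),
    Bytes.toString(result.getValue(Bytes.toBytes("values"), Bytes.toBytes("words")))
  )
}.toDF()

df.show()
```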
I'm using Spark 1.6.0 and HBase 1.2.0
Thanks in advance
OK, so apparently it was an unexpected dependency issue (as it always is when nothing makes sense).
These are the steps I took in order to solve this issue (hopefully they will help future developers):
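The `RpcClientImpl cannot be cast to RpcClient` error generally indicates two different HBase versions on the classpath (for example, an old transitive `hbase-client` pulled in by another dependency). The original list of steps isn't preserved here, but a hypothetical sbt sketch of the kind of fix involved would be pinning every HBase artifact to the same version and checking the dependency tree:

```scala
// build.sbt -- hypothetical sketch: align all HBase artifacts on 1.2.0.
// Note that TableInputFormat lives in hbase-server, not hbase-client.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"  % "1.6.0" % "provided",
  "org.apache.hbase" %  "hbase-client" % "1.2.0",
  "org.apache.hbase" %  "hbase-common" % "1.2.0",
  "org.apache.hbase" %  "hbase-server" % "1.2.0"
)
```

Inspecting the resolved dependency graph (e.g. with sbt's dependency-tree output) should then confirm that no second HBase version is still being pulled in transitively.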
That's it :)