Spark HBase connector (SHC) is not returning any data from HBase table

I am following the Spark HBase connector (SHC) basic example to read an HBase table in the spark2 shell, version 2.2.0. The code appears to work, but when I run the df.show() command I do not see any results, and it seems to run forever.

import org.apache.spark.sql.{ DataFrame, Row, SQLContext }
import org.apache.spark.sql.execution.datasources.hbase._

val sqlContext = new org.apache.spark.sql.SQLContext(sc)

def catalog = s"""{
       |"table":{"namespace":"default", "name":"testmeta"},
       |"rowkey":"vgil",
       |"columns":{
         |"id":{"cf":"rowkey", "col":"vgil", "type":"string"},
         |"col1":{"cf":"pp", "col":"dtyp", "type":"string"}
       |}
     |}""".stripMargin


def withCatalog(cat: String): DataFrame = {
  sqlContext.read
    .options(Map(HBaseTableCatalog.tableCatalog -> cat))
    .format("org.apache.spark.sql.execution.datasources.hbase")
    .load()
}

val df = withCatalog(catalog)

df.show()

df.show() gives neither any output nor any error; it just keeps running forever.

Also, how can I run a query for a range of row keys?

Here is a scan of the HBase test table.

hbase(main):001:0> scan 'testmeta'
ROW                                 COLUMN+CELL                                                                                            
 fmix                            column=pp:dtyp, timestamp=1541714925380, value=ss1                                                     
 fmix                            column=pp:lati, timestamp=1541714925371, value=41.50                                                   
 fmix                            column=pp:long, timestamp=1541714925374, value=-81.61                                                  
 fmix                            column=pp:modm, timestamp=1541714925377, value=ABC                                                                                                   
 vgil                            column=pp:dtyp, timestamp=1541714925405, value=ss2                                                     
 vgil                            column=pp:lati, timestamp=1541714925397, value=41.50                                                   

I have followed some of the solutions on the web, but unfortunately I have not been able to get the data from HBase.

Thanks in advance for your help!

Posting my answer after lots of trial and error: I found that adding the --conf and --files options when starting the Spark shell is what let me connect to HBase.

spark2-shell --master yarn --deploy-mode client --packages com.hortonworks:shc-core:1.1.1-2.1-s_2.11,it.nerdammer.bigdata:spark-hbase-connector_2.10:1.0.3 --repositories http://repo.hortonworks.com/content/groups/public/ --conf spark.hbase.host=192.168.xxx.xxx --files /mnt/fs1/opt/cloudera/parcels/CDH-5.13.0-1.cdh5.13.0.p0.29/share/doc/hbase-solr-doc-1.5+cdh5.13.0+71/demo/hbase-site.xml
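
Note that this command pulls in two different connectors: SHC (shc-core) and it.nerdammer's spark-hbase-connector; the snippet below uses the latter, which reads the spark.hbase.host setting passed via --conf. As a quick smoke test of the connection (a minimal sketch, assuming the same shell session and the testmeta table scanned above), counting the cells of one column is enough to tell whether the connection works at all:

import it.nerdammer.spark.hbase._

// If the HBase connection is misconfigured, this count will hang or
// fail instead of returning quickly. Per the scan above, 2 rows
// (fmix and vgil) have a pp:dtyp cell.
val n = sc.hbaseTable[String]("testmeta")
  .select("dtyp")
  .inColumnFamily("pp")
  .count()
println(s"rows in testmeta with pp:dtyp = $n")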

Then the following code snippet can fetch the values of one column qualifier for a range of row keys.

import it.nerdammer.spark.hbase._
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Read pp:lati for row keys in the range ["vg", "vgz").
val hBaseRDD_iacp = sc.hbaseTable[String]("testmeta")
  .select("lati")
  .inColumnFamily("pp")
  .withStartRow("vg")
  .withStopRow("vgz")

// Wrap each value in a Row and apply a one-column schema.
object myschema {
  val column1 = StructField("column1", StringType)
  val struct = StructType(Array(column1))
}

val rowRDD = hBaseRDD_iacp.map(x => Row(x))
val myDf = sqlContext.createDataFrame(rowRDD, myschema.struct)
myDf.show()
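
As for the range-of-row-keys part of the question with the SHC DataFrame API itself: comparison filters on the rowkey-backed column should be pushed down by SHC to the HBase scan, so a range query can be written as an ordinary DataFrame filter. A minimal sketch, assuming the catalog and withCatalog definitions from the question and a working connection:

// "id" is the column backed by the rowkey in the catalog above, so
// comparisons on it become the scan's start/stop rows.
val shcDf = withCatalog(catalog)
shcDf.filter(shcDf("id") >= "vg" && shcDf("id") < "vgz").show()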
