HBase using Spark SQL

I have a table named "sample" in HBase and need to query it with Apache Spark SQL. Is there any way to read HBase data using a Spark SQL query?

Spark SQL is an in-memory query engine. To run a Spark SQL query on top of an HBase table, you need to:

  1. Fetch the data from HBase using Spark and create a Spark RDD:

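     import org.apache.hadoop.conf.Configuration;
     import org.apache.hadoop.fs.Path;
     import org.apache.hadoop.hbase.HBaseConfiguration;
     import org.apache.hadoop.hbase.client.Result;
     import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
     import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
     import org.apache.hadoop.hbase.util.Bytes;
     import org.apache.spark.SparkConf;
     import org.apache.spark.api.java.JavaPairRDD;
     import org.apache.spark.api.java.JavaRDD;
     import org.apache.spark.api.java.JavaSparkContext;
     import org.apache.spark.api.java.function.Function;
     import scala.Tuple2;

     // Run Spark locally on all cores; change the master for a real cluster.
     SparkConf sparkConf = new SparkConf();
     sparkConf.setAppName("SparkApp");
     sparkConf.setMaster("local[*]");
     JavaSparkContext javaSparkContext = new JavaSparkContext(sparkConf);

     // Load the HBase and Hadoop settings and select the table to scan.
     Configuration config = HBaseConfiguration.create();
     config.addResource(new Path("/etc/hbase/hbase-site.xml"));
     config.addResource(new Path("/etc/hadoop/core-site.xml"));
     config.set(TableInputFormat.INPUT_TABLE, "sample");

     // Read the table as (row key, Result) pairs.
     JavaPairRDD<ImmutableBytesWritable, Result> hbaseRDD =
         javaSparkContext.newAPIHadoopRDD(config, TableInputFormat.class, ImmutableBytesWritable.class, Result.class);

     // Convert each HBase row into a StudentBean.
     JavaRDD<StudentBean> sampleRDD = hbaseRDD.map(
         new Function<Tuple2<ImmutableBytesWritable, Result>, StudentBean>() {
           private static final long serialVersionUID = -2021713021648730786L;

           public StudentBean call(Tuple2<ImmutableBytesWritable, Result> tuple) {
             StudentBean bean = new StudentBean();
             Result result = tuple._2;
             bean.setRowKey(Bytes.toString(result.getRow()));
             bean.setFirstName(Bytes.toString(result.getValue(Bytes.toBytes("details"), Bytes.toBytes("firstName"))));
             bean.setLastName(Bytes.toString(result.getValue(Bytes.toBytes("details"), Bytes.toBytes("lastName"))));
             bean.setBranch(Bytes.toString(result.getValue(Bytes.toBytes("details"), Bytes.toBytes("branch"))));
             bean.setEmailId(Bytes.toString(result.getValue(Bytes.toBytes("details"), Bytes.toBytes("emailId"))));
             return bean;
           }
         });
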
  2. Create a DataFrame from this RDD, register it under a temporary table name, and then run your query against that table:

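     import org.apache.spark.sql.DataFrame;
     import org.apache.spark.sql.Row;
     import org.apache.spark.sql.SQLContext;

     // Spark 1.x API (DataFrame, registerTempTable).
     SQLContext sqlContext = new SQLContext(javaSparkContext);

     // Infer the schema from the bean and register a temporary table.
     DataFrame schema = sqlContext.createDataFrame(sampleRDD, StudentBean.class);
     schema.registerTempTable("spark_sql_temp_table");

     DataFrame schemaRDD = sqlContext.sql("YOUR_QUERY_GOES_HERE");

     // Map each result Row back to a StudentBean.
     JavaRDD<StudentBean> result = schemaRDD.toJavaRDD().map(new Function<Row, StudentBean>() {
       private static final long serialVersionUID = -2558736294883522519L;

       public StudentBean call(Row row) throws Exception {
         StudentBean bean = new StudentBean();
         // Do the mapping stuff here
         return bean;
       }
     });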
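
Both steps rely on a StudentBean class that is not shown above. A minimal sketch of what it might look like follows, assuming every field is a plain String; the field names are taken from the setters used in step 1, and everything else about the class is an assumption. It is public with getters and setters so that createDataFrame can inspect it as a JavaBean, and it implements Serializable so that Spark can ship instances between tasks.

```java
import java.io.Serializable;

// Hypothetical bean for one row of the "sample" table; all fields assumed to be strings.
public class StudentBean implements Serializable {
    private static final long serialVersionUID = 1L;

    private String rowKey;
    private String firstName;
    private String lastName;
    private String branch;
    private String emailId;

    // A public no-argument constructor is part of the JavaBean convention.
    public StudentBean() {
    }

    public String getRowKey() { return rowKey; }
    public void setRowKey(String rowKey) { this.rowKey = rowKey; }

    public String getFirstName() { return firstName; }
    public void setFirstName(String firstName) { this.firstName = firstName; }

    public String getLastName() { return lastName; }
    public void setLastName(String lastName) { this.lastName = lastName; }

    public String getBranch() { return branch; }
    public void setBranch(String branch) { this.branch = branch; }

    public String getEmailId() { return emailId; }
    public void setEmailId(String emailId) { this.emailId = emailId; }
}
```

With a bean shaped like this, the columns of the temporary table should take their names from the bean properties (rowKey, firstName, lastName, branch, emailId), so those are the names to use in the query string.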
