
Convert JavaPairRDD&lt;ImmutableBytesWritable, Result&gt; to JavaRDD&lt;String&gt;

I am trying to read data from HBase using Apache Spark. I want to scan only one specific column. I am creating an RDD of my HBase data like below:

SparkConf sparkConf = new SparkConf().setAppName("HBaseRead").setMaster("local[2]");
JavaSparkContext sc = new JavaSparkContext(sparkConf);
Configuration conf = HBaseConfiguration.create();
conf.set("hbase.zookeeper.quorum", "localhost:2181");

String tableName = "myTable";

conf.set(TableInputFormat.INPUT_TABLE, tableName);
conf.set(TableInputFormat.SCAN_COLUMN_FAMILY, "myCol");

JavaPairRDD<ImmutableBytesWritable, Result> hBaseRDD = sc.newAPIHadoopRDD(conf, TableInputFormat.class,
        ImmutableBytesWritable.class, Result.class);
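If you need to narrow the scan to a single column (family plus qualifier) rather than a whole column family, one option is to serialize a `Scan` into the job configuration under `TableInputFormat.SCAN`. A minimal sketch extending the configuration above, assuming the column family `myCol` and a qualifier `firstName` from this question (this is a configuration fragment for a live HBase cluster, not standalone code):

```java
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.util.Bytes;

// Restrict the scan to one column: family "myCol", qualifier "firstName"
Scan scan = new Scan();
scan.addColumn(Bytes.toBytes("myCol"), Bytes.toBytes("firstName"));

// TableInputFormat reads the scan back from the configuration
conf.set(TableInputFormat.SCAN, TableMapReduceUtil.convertScanToString(scan));
```

With this in place, each `Result` delivered to the RDD contains only the requested cell instead of every column in the family.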

Here is where I want to convert the JavaPairRDD to a JavaRDD of strings:

JavaRDD<String> rdd = ...

How can I achieve this?

You can get a JavaRDD&lt;String&gt; using a map function, like below.

import org.apache.spark.api.java.function.Function;
import org.apache.hadoop.hbase.util.Bytes;
import scala.Tuple2;

JavaRDD<String> javaRDD = hBaseRDD.map(new Function<Tuple2<ImmutableBytesWritable, Result>, String>() {
    @Override
    public String call(Tuple2<ImmutableBytesWritable, Result> tuple) throws Exception {
        Result result = tuple._2;
        String rowKey = Bytes.toString(result.getRow()); // row key
        // "firstName" cell from the "myCol" family; getValue returns null if the cell is absent
        String fName = Bytes.toString(result.getValue(Bytes.toBytes("myCol"), Bytes.toBytes("firstName")));
        return fName;
    }
});
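Note that `Result.getValue` returns `null` when a row lacks the cell, and `Bytes.toString(null)` also returns `null`, so the resulting RDD can contain nulls. The decode step itself is plain UTF-8 decoding, which can be made null-safe as in this pure-Java sketch (`decodeOrDefault` is a hypothetical helper, not part of the HBase API):

```java
import java.nio.charset.StandardCharsets;

public class DecodeOrDefault {
    // Mirrors Bytes.toString's UTF-8 decode, but substitutes a fallback for missing cells
    static String decodeOrDefault(byte[] value, String fallback) {
        return value == null ? fallback : new String(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(decodeOrDefault("Alice".getBytes(StandardCharsets.UTF_8), "<missing>"));
        System.out.println(decodeOrDefault(null, "<missing>"));
    }
}
```

On Java 8+, the anonymous `Function` above can also be written as a lambda, e.g. `hBaseRDD.map(tuple -> Bytes.toString(tuple._2.getValue(family, qualifier)))`.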

