
Spark HBase: How to convert a dataframe to HBase org.apache.hadoop.hbase.client.Result

I have a method, Test, which takes a single argument of type org.apache.hadoop.hbase.client.Result. I also have some HBase Result data that I saved to a file, created a dataframe for, and loaded.

I want to pass this dataframe data to my method to test some functionality. The problem is that I need to pass it as a Result.
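For reference, the method under test looks roughly like this (a minimal sketch; the name Test and its body are assumptions, only the Result parameter type comes from the question):

import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.util.Bytes

// hypothetical shape of the method under test: all the question fixes is that
// it takes an org.apache.hadoop.hbase.client.Result as its single argument
def Test(result: Result): Unit = {
  // e.g. inspect the row key carried by the Result
  println(Bytes.toString(result.getRow))
}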

I need help converting a Spark DataFrame to an HBase org.apache.hadoop.hbase.client.Result.

I took a dataframe and extracted org.apache.hadoop.hbase.client.Result from it. This can be done with RDDs:

import org.apache.hadoop.hbase.{Cell, CellUtil}

import scala.collection.JavaConversions._
import scala.collection.mutable.ListBuffer

import scala.math.BigInt

import org.apache.spark._
import org.apache.spark.rdd._
import org.apache.spark.sql._

import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable

object HbaseDFToResult extends App {

  val config = new SparkConf().setAppName("test").setMaster("local[*]")

  // org.apache.hadoop.hbase.client.Result is not Java-serializable, so use Kryo and register the class
  config.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  config.registerKryoClasses(Array(classOf[org.apache.hadoop.hbase.client.Result]))
  val spark = SparkSession.builder().config(config).getOrCreate()
  val mytests = Seq((1, "test1"), (2, "test2"), (3, "test3"), (4, "test4"))

  import spark.implicits._

  val df = mytests.toDF("col1", "col2")
  val counts: RDD[(ImmutableBytesWritable, Result)] = df.rdd.map { row =>
    // build the HBase row key from col1
    val key = row.getAs[Int]("col1")
    val keyByteArray = BigInt(key).toByteArray
    val ibw = new ImmutableBytesWritable()
    ibw.set(keyByteArray)

    // wrap col2 in a Cell and build a Result from it
    // (the single-argument createCell uses these bytes as the cell's row key,
    // which is why the printed Results below show vlen=0)
    val value = row.getAs[String]("col2")
    val valueByteArray = value.getBytes()
    val cellList = List(CellUtil.createCell(valueByteArray))
    val cells: java.util.List[Cell] = ListBuffer(cellList: _*)
    val result = Result.create(cells)

    (ibw, result)
  }
  val results: Array[Result] = counts.map(x => x._2).collect()
  results.foreach(println)
}
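Once collected on the driver, these Results can be fed straight into the method under test (Test here is the hypothetical method sketched above, not something defined in this snippet):

  // pass each converted Result to the (assumed) Test method
  counts.map(_._2).collect().foreach(r => Test(r))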

Log:

/Library/Java/JavaVirtualMachines/jdk1.8.0_191.jdk/Contents/Home/bin/java "-javaagent:/Applications/IntelliJ IDEA CE.app/Contents/lib/idea_rt.jar=60498:/Applications/IntelliJ IDEA 
....

2019-05-01 15:41:21 INFO  DAGScheduler:54 - Job 0 finished: collect at HbaseDFToResult.scala:41, took 0.568670 s
keyvalues={test1//LATEST_TIMESTAMP/Maximum/vlen=0/seqid=0}
keyvalues={test2//LATEST_TIMESTAMP/Maximum/vlen=0/seqid=0}
keyvalues={test3//LATEST_TIMESTAMP/Maximum/vlen=0/seqid=0}
keyvalues={test4//LATEST_TIMESTAMP/Maximum/vlen=0/seqid=0}
2019-05-01 15:41:21 INFO  SparkContext:54 - Invoking stop() from shutdown hook
2019-05-01 15:41:21 INFO  AbstractConnector:310 - Stopped Spark@4215838f{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
2019-05-01 15:41:21 INFO  SparkUI:54 - Stopped Spark web UI at http://10.219.20.238:4040
2019-05-01 15:41:21 INFO  MapOutputTrackerMasterEndpoint:54 - MapOutputTrackerMasterEndpoint stopped!
2019-05-01 15:41:21 INFO  MemoryStore:54 - MemoryStore cleared
2019-05-01 15:41:21 INFO  BlockManager:54 - BlockManager stopped
2019-05-01 15:41:21 INFO  BlockManagerMaster:54 - BlockManagerMaster stopped
2019-05-01 15:41:21 INFO  OutputCommitCoordinator$OutputCommitCoordinatorEndpoint:54 - OutputCommitCoordinator stopped!
2019-05-01 15:41:21 INFO  SparkContext:54 - Successfully stopped SparkContext
2019-05-01 15:41:21 INFO  ShutdownHookManager:54 - Shutdown hook called
2019-05-01 15:41:21 INFO  ShutdownHookManager:54 - Deleting directory /private/var/folders/mp/xydn5gdj4b51qgc7lsqzrft40000gp/T/spark-a9d46422-f21a-4f2b-98b0-a73238d20dee
Process finished with exit code 0
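The printed Results show vlen=0 because the single-argument CellUtil.createCell only fills in the row-key field of the cell. If the test also needs the column value inside the Result, each cell can be built with an explicit row key, family, qualifier and value. A minimal sketch, where the family cf and qualifier col2 are made-up names, not something from the original code:

import org.apache.hadoop.hbase.{Cell, KeyValue}
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.util.Bytes

// sketch: build a Result whose cell carries the value, not just a row key;
// "cf" and "col2" are hypothetical family/qualifier names for illustration
def rowToResult(key: Int, value: String): Result = {
  val kv = new KeyValue(
    Bytes.toBytes(key),      // row key
    Bytes.toBytes("cf"),     // column family (assumed)
    Bytes.toBytes("col2"),   // qualifier (assumed)
    Bytes.toBytes(value))    // value, so vlen is no longer 0
  Result.create(java.util.Arrays.asList[Cell](kv))
}

Mapping rowToResult over df.rdd in place of the createCell call produces Results from which result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("col2")) returns the original string.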

