PySpark: Can saveAsNewAPIHadoopDataset() be used as bulk loading to HBase?

We currently import data to HBase tables via Spark RDDs (PySpark) by using saveAsNewAPIHadoopDataset().

Is this function using the HBase bulk loading feature via MapReduce? In other words, would saveAsNewAPIHadoopDataset(), which imports directly to HBase, be equivalent to using saveAsNewAPIHadoopFile() to write HFiles to HDFS and then invoking org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles to load them into HBase?

Here is an example snippet of our HBase loading routine:

conf = {"hbase.zookeeper.quorum": config.get(gethostname(),'HBaseQuorum'),
        "zookeeper.znode.parent":config.get(gethostname(),'ZKznode'),
        "hbase.mapred.outputtable": table_name,
        "mapreduce.outputformat.class": "org.apache.hadoop.hbase.mapreduce.TableOutputFormat",
        "mapreduce.job.output.key.class": "org.apache.hadoop.hbase.io.ImmutableBytesWritable",
        "mapreduce.job.output.value.class": "org.apache.hadoop.io.Writable"}

keyConv = "org.apache.spark.examples.pythonconverters.StringToImmutableBytesWritableConverter"
valueConv = "org.apache.spark.examples.pythonconverters.StringListToPutConverter"

spark_rdd.saveAsNewAPIHadoopDataset(conf=conf,keyConverter=keyConv,valueConverter=valueConv)

Not exactly. RDD.saveAsNewAPIHadoopDataset and RDD.saveAsNewAPIHadoopFile do almost the same thing. Their APIs are just a little different. Each provides a different 'mechanism vs. policy' choice.
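For contrast, a minimal sketch of the HFile-based path the question describes might look like the following. This is an illustration, not from the original post: the staging directory, the hbase.mapreduce.hfileoutputformat.table.name key, and the StringListToKeyValueConverter class are assumptions. Spark's bundled example converters do not include a KeyValue converter, so a custom one would have to be provided, and HFileOutputFormat2 expects the output sorted by row key.

# Sketch only -- assumes the same config, gethostname, table_name and
# spark_rdd as in the snippet above.
hfile_conf = {"hbase.zookeeper.quorum": config.get(gethostname(), 'HBaseQuorum'),
              "zookeeper.znode.parent": config.get(gethostname(), 'ZKznode'),
              # Table name key read by HFileOutputFormat2 (assumed; verify for your HBase version)
              "hbase.mapreduce.hfileoutputformat.table.name": table_name}

spark_rdd.sortByKey().saveAsNewAPIHadoopFile(
    "/tmp/hfile_staging",  # HDFS staging directory for the generated HFiles (illustrative)
    "org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2",
    keyClass="org.apache.hadoop.hbase.io.ImmutableBytesWritable",
    valueClass="org.apache.hadoop.hbase.KeyValue",
    keyConverter="org.apache.spark.examples.pythonconverters.StringToImmutableBytesWritableConverter",
    valueConverter="my.converters.StringListToKeyValueConverter",  # hypothetical custom converter
    conf=hfile_conf)

# The staged HFiles would then be moved into the table, e.g. from the shell:
#   hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /tmp/hfile_staging table_name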
