
Delete HBase cell using Spark

Is there any API available to delete a specific HBase cell using Spark with Scala? We are able to read and write using the Spark-HBase Connector. Any suggestion for cell deletion would be highly appreciated.

Here is an implementation for deleting HBase Cell objects using Spark (demonstrated with parallelize; you can adjust it to your own RDD of Cells, as sketched after the disclaimer below).

General idea: removal in chunks - iterate through each RDD partition, split the partition into chunks of 100,000 Cells, convert each Cell to an HBase Delete object, then call table.delete() to perform the deletion in HBase.
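For context, the standard HBase client Delete API supports several granularities, and a single cell version is targeted with addColumn() plus an explicit timestamp. Here is a minimal sketch (the method name deleteExamples and the "row-1"/"cf"/"q" literals are placeholders, not part of the original answer):

import java.io.IOException;

import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

// 'table' is an already-open org.apache.hadoop.hbase.client.Table
void deleteExamples(Table table) throws IOException {
    Delete delete = new Delete(Bytes.toBytes("row-1"));
    delete.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q"), 1234567890L); // one exact cell version: timestamp must match
    delete.addColumns(Bytes.toBytes("cf"), Bytes.toBytes("q"));             // all versions of this column
    delete.addFamily(Bytes.toBytes("cf"));                                  // every column in the family
    table.delete(delete);
}

Because a single-version delete matches a cell's timestamp exactly, the implementation below passes each cell's own timestamp to addColumn().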

import java.io.IOException;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Table;
import org.apache.spark.api.java.JavaSparkContext;
import com.google.common.collect.Iterators;
import com.google.common.collect.Lists;

public void deleteCells(List<Cell> cellsToDelete) {

    JavaSparkContext sc = new JavaSparkContext();

    sc.parallelize(cellsToDelete)
        .foreachPartition(cellsIterator -> {
            int chunkSize = 100000; // Will contact HBase only once per 100,000 records

            // HBaseConfiguration.create() also picks up hbase-site.xml if present on the classpath
            Configuration config = HBaseConfiguration.create();
            config.set("hbase.zookeeper.quorum", "YOUR-ZOOKEEPER-HOSTNAME");

            // try-with-resources closes the table and connection once the partition is processed
            try (Connection connection = ConnectionFactory.createConnection(config);
                 Table table = connection.getTable(TableName.valueOf("YOUR-HBASE-TABLE"))) {

                // Split the given cells iterator to chunks, issuing one batched delete per chunk
                Iterators.partition(cellsIterator, chunkSize)
                    .forEachRemaining(cellsChunk -> {
                        List<Delete> deletions = Lists.newArrayList(cellsChunk
                                .stream()
                                // Target the exact cell version: row + family + qualifier + the cell's own timestamp
                                .map(cell -> new Delete(cell.getRowArray(), cell.getRowOffset(), cell.getRowLength())
                                        .addColumn(CellUtil.cloneFamily(cell), CellUtil.cloneQualifier(cell),
                                                cell.getTimestamp()))
                                .iterator());

                        try {
                            table.delete(deletions);
                        } catch (IOException e) {
                            logger.error("Failed to delete a chunk due to inner exception: " + e);
                        }
                    });
            } catch (IOException e) {
                // 'logger' is assumed to be a class-level logger (e.g. SLF4J)
                logger.error("Failed to connect to HBase due to inner exception: " + e);
            }
        });
}

Disclaimer: this exact snippet was not tested, but I have used the same method to remove billions of HBase Cells using Spark.
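If your cells already come from an RDD (for example, the output of a Spark-HBase connector scan) rather than a local list, the same per-partition logic can be reused directly. Below is an untested sketch of that variant; deleteCellsFromRdd and cellsRdd are hypothetical names:

import java.io.IOException;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Table;
import org.apache.spark.api.java.JavaRDD;
import com.google.common.collect.Iterators;
import com.google.common.collect.Lists;

// Hypothetical variant of deleteCells(): same chunked deletion, but driven
// from an existing JavaRDD<Cell> instead of parallelizing a local list
public void deleteCellsFromRdd(JavaRDD<Cell> cellsRdd) {
    cellsRdd.foreachPartition(cellsIterator -> {
        Configuration config = HBaseConfiguration.create();
        config.set("hbase.zookeeper.quorum", "YOUR-ZOOKEEPER-HOSTNAME");

        try (Connection connection = ConnectionFactory.createConnection(config);
             Table table = connection.getTable(TableName.valueOf("YOUR-HBASE-TABLE"))) {
            Iterators.partition(cellsIterator, 100000)
                .forEachRemaining(cellsChunk -> {
                    List<Delete> deletions = Lists.newArrayList(cellsChunk.stream()
                            .map(cell -> new Delete(cell.getRowArray(), cell.getRowOffset(), cell.getRowLength())
                                    .addColumn(CellUtil.cloneFamily(cell), CellUtil.cloneQualifier(cell),
                                            cell.getTimestamp()))
                            .iterator());
                    try {
                        table.delete(deletions);
                    } catch (IOException e) {
                        // Log and continue with the next chunk
                    }
                });
        } catch (IOException e) {
            // Log the connection failure; this partition is skipped
        }
    });
}

Opening the connection inside foreachPartition keeps all HBase I/O on the executors, so no unserializable connection object has to be shipped from the driver.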
