Who executes HBase filters?

Which machine executes org.apache.hadoop.hbase.filter.Filter?

According to the documentation, when reading data from HBase using Get or Scan operations, you can use custom filters to return a subset of results to the client. While this does not reduce server-side IO, it does reduce network bandwidth and the amount of data the client needs to process.

From what I see, the Spark executor machines make remote calls from the HBase client's background threads to query HBase data, and those calls rarely go to the local machine's HBase region server.

So I'm wondering whether my custom filter executes on the Spark executor machine, incurring huge network overhead and contradicting what the documentation promises, or whether it is somehow transferred over the network and executed on the HBase machine.
I doubt the latter, as Filter is not Serializable. So the next question would be whether it is possible to optimise anything here.

The filter is executed in the region server process. HBase can load it dynamically if you put the jar file with its code into the directory specified by the hbase.dynamic.jars.dir parameter in the HBase configuration. Filter does not implement the Serializable interface, but it has the method

static Filter    parseFrom(byte[] pbBytes)

which creates the filter from its serialised form. A custom filter should extend the FilterBase class and override the method

abstract byte[] toByteArray()

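which serialises the filter to a byte array. The client sends those bytes along with the Get or Scan request, and the region server rebuilds the filter from them before applying it.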
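To make the round trip concrete, here is a minimal sketch of the toByteArray / parseFrom pattern using only the JDK. SketchFilter, RowPrefixSketchFilter and includeRow are hypothetical stand-ins invented for illustration, not the real HBase classes or callbacks. The sketch also ships the filter state as raw bytes, which is an assumption made for brevity. The parameter name pbBytes suggests that real filters encode their state with protocol buffers.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical stand-in for the filter contract; NOT org.apache.hadoop.hbase.filter.Filter.
abstract class SketchFilter {
    // Client side: turn the filter's state into bytes that travel with the request.
    abstract byte[] toByteArray();

    // Server side: decide whether a row is returned to the client.
    // (Invented name; the real filter callbacks differ.)
    abstract boolean includeRow(byte[] rowKey);
}

// Hypothetical custom filter that keeps only rows whose key starts with a prefix.
final class RowPrefixSketchFilter extends SketchFilter {
    private final byte[] prefix;

    RowPrefixSketchFilter(byte[] prefix) {
        this.prefix = prefix.clone();
    }

    @Override
    byte[] toByteArray() {
        // Assumption: raw bytes instead of a protobuf message.
        return prefix.clone();
    }

    // Mirrors the static parseFrom(byte[]) factory: rebuilds the filter from its bytes.
    static RowPrefixSketchFilter parseFrom(byte[] bytes) {
        return new RowPrefixSketchFilter(bytes);
    }

    @Override
    boolean includeRow(byte[] rowKey) {
        if (rowKey.length < prefix.length) {
            return false;
        }
        return Arrays.equals(Arrays.copyOf(rowKey, prefix.length), prefix);
    }
}

class FilterRoundTripSketch {
    public static void main(String[] args) {
        // "Client" (for example a Spark executor): build and serialise the filter.
        SketchFilter clientSide =
                new RowPrefixSketchFilter("user-".getBytes(StandardCharsets.UTF_8));
        byte[] wire = clientSide.toByteArray();

        // "Region server": rebuild the filter from the bytes and apply it locally,
        // so rows that do not match never cross the network.
        SketchFilter serverSide = RowPrefixSketchFilter.parseFrom(wire);
        for (String key : new String[] {"user-42", "order-7"}) {
            boolean kept = serverSide.includeRow(key.getBytes(StandardCharsets.UTF_8));
            System.out.println(key + " kept=" + kept);
        }
    }
}
```

The object itself never crosses the network. Only the byte array does, and the receiving side rebuilds an equivalent filter from it. This is why the filter's class must be on the region server's classpath, for example through the jar directory mentioned above.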
