Sort order on secondary floating point index in HBase

I'm trying to implement something like a search engine on top of HBase. Leaving aside whether this is a good idea (finding out is the reason for doing it), I need to support range queries on floating-point values. The default way to do this would be an inverted index: a separate data structure that maps each floating-point value to the row keys that contain it. For this to work as an index, however, I need to be able to issue a Scan from the low end of the range to the high end (at least, that's my current theory).

HBase orders row keys as byte arrays, so starting a row key with the raw bytes of a floating-point number won't give me a usable index, if only because the very first bit of the byte representation is 1 for negative values and 0 for positive values. Negative numbers therefore sort after positive ones, which is out of numeric order. As such, I'm at a loss as to how to create this index.
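To make the problem concrete, here is a small sketch in plain Java with no HBase dependency. It assumes that row keys are compared as unsigned bytes, left to right, and that the key starts with the big-endian IEEE 754 bytes of the double. The class and method names are made up for illustration.

```java
import java.nio.ByteBuffer;

class RawDoubleOrder {
    // Big-endian IEEE 754 bytes of a double (ByteBuffer is big-endian by default).
    static byte[] rawBytes(double d) {
        return ByteBuffer.allocate(8).putDouble(d).array();
    }

    // Unsigned lexicographic comparison, the ordering assumed for row keys here.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        // -1.0 < 1.0 numerically, but the sign bit makes its bytes compare greater.
        System.out.println(compareUnsigned(rawBytes(-1.0), rawBytes(1.0)) > 0);
        // -2.0 < -1.0 numerically, but negative values compare in reverse order.
        System.out.println(compareUnsigned(rawBytes(-2.0), rawBytes(-1.0)) > 0);
    }
}
```

Both comparisons come out positive, so there are two separate problems: negatives land after positives, and the negatives are in reverse order among themselves.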

Am I taking an idiotic approach to this, or will one of the following work better?

Convert each floating-point value to a pair of integer values, one for the part before the decimal point and one for the part after it:

BigDecimal[] doubleValue = 
    new BigDecimal((Double) value).divideAndRemainder(BigDecimal.ONE);
byte[] valueBytes = new byte[16];
System.arraycopy(Bytes.toBytes(doubleValue[0].longValue()), 0, valueBytes, 0, 8);
System.arraycopy(Bytes.toBytes(doubleValue[1].longValue()), 0, valueBytes, 8, 8);

Somehow convince HBase to use a custom comparator for the row keys (no idea how to do this).

You need to use a different approach to serialize your values into byte[] if you want HBase to sort them properly. Check out https://github.com/ndimiduk/orderly . Alternatively, I believe the Lily library can also do this.
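For reference, one common way to get an order-preserving encoding is to flip the sign bit of non-negative values and flip every bit of negative values, so that unsigned byte order matches numeric order. The sketch below is plain Java and is not taken from orderly or Lily; whether those libraries use exactly this scheme is an assumption I have not checked. The names `SortableDouble`, `encode`, and `decode` are hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

class SortableDouble {
    // Encode a double into 8 bytes whose unsigned byte order matches numeric order.
    static byte[] encode(double d) {
        long bits = Double.doubleToLongBits(d);
        // Negative: (bits >> 63) is all ones, so every bit is flipped.
        // Non-negative: only the sign bit is flipped.
        bits ^= (bits >> 63) | Long.MIN_VALUE;
        return ByteBuffer.allocate(8).putLong(bits).array();
    }

    // Inverse of encode.
    static double decode(byte[] key) {
        long bits = ByteBuffer.wrap(key).getLong();
        // Top bit set means the original was non-negative: flip the sign bit back.
        // Top bit clear means the original was negative: flip every bit back.
        bits ^= (~bits >> 63) | Long.MIN_VALUE;
        return Double.longBitsToDouble(bits);
    }

    // Unsigned lexicographic comparison, the ordering assumed for row keys here.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        double[] values = {3.5, -2.0, 0.0, -10.25, 1.0e-3};
        byte[][] keys = new byte[values.length][];
        for (int i = 0; i < values.length; i++) {
            keys[i] = encode(values[i]);
        }
        // Sort by bytes only, then decode: values come back in ascending numeric order.
        Arrays.sort(keys, SortableDouble::compareUnsigned);
        for (byte[] key : keys) {
            System.out.println(decode(key));
        }
    }
}
```

With keys built this way, an index row key could be the 8 encoded bytes followed by the original row key, and a range query becomes a Scan between the encoded low and high bounds. Note that with this encoding -0.0 sorts just before 0.0 and NaN sorts after positive infinity, so NaN would need separate handling if it can occur.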
