HBase reverse scan

My data keys are stored in the format trade<date><index>:

trade1907030001
trade1907030002
trade1907040001
trade1907040002
trade1907050001
trade1907050002

What is the proper way to implement a 'reverse' scan that iterates over all trades for a day, or from a specific row down to the end of the day, or even between two exact trades?

Scan scan = new Scan();
scan.setReversed(true);
scan.setStartRow(Bytes.unsignedCopyAndIncrement(Bytes.toBytes(trade + day)));
scan.setStopRow(Bytes.toBytes(trade + day));

Bearing in mind that, according to the documentation, the start row is inclusive and the stop row is exclusive, we'll miss the oldest trade of the day. And if the start row is an actual trade row, we must not increment the key, otherwise the next trade will be picked up as well. So the logic becomes conditional. How can I make it work reliably in all of these situations?

You can use:

Scan scan = new Scan();
scan.setReversed(true);
scan.setRowPrefixFilter(Bytes.toBytes(trade + day));

which automatically ensures that the first and last trades aren't skipped.

Source: https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html#setRowPrefixFilter-byte:A-

This is how scan actually works (tested in hbase shell v1.2.0-cdh5.13.3):

trade171020S00001                                          column=inp:data_as_of_date, timestamp=1511793438335, value=20171020
trade171018B00001                                          column=inp:data_as_of_date, timestamp=1511793438335, value=20171020
trade171020S00001                                          column=inp:data_as_of_date, timestamp=1511793438335, value=20171020
trade171113B00001                                          column=inp:data_as_of_date, timestamp=1511993729979, value=20171114
trade171114S00001                                          column=inp:data_as_of_date, timestamp=1511993729979, value=20171114

scan 'namespace:table', {COLUMNS=>['inp:data_as_of_date'], STARTROW=>'trade171018B00001', ENDROW=>'trade171113B00001'}
ROW                                                                  COLUMN+CELL
trade171018B00001                                          column=inp:data_as_of_date, timestamp=1511793438335, value=20171020
trade171020S00001                                          column=inp:data_as_of_date, timestamp=1511793438335, value=20171020

scan 'namespace:table', {COLUMNS=>['inp:data_as_of_date'], STARTROW=>'trade171113B00001', ENDROW=>'trade171018B00001', REVERSED=>true}
ROW                                                                  COLUMN+CELL
trade171113B00001                                          column=inp:data_as_of_date, timestamp=1511993729979, value=20171114
trade171020S00001                                          column=inp:data_as_of_date, timestamp=1511793438335, value=20171020

scan 'namespace:table', {COLUMNS=>['inp:data_as_of_date'], STARTROW=>'trade171018', ENDROW=>'trade171113'}
ROW                                                                  COLUMN+CELL
trade171018B00001                                          column=inp:data_as_of_date, timestamp=1511793438335, value=20171020
trade171020S00001                                          column=inp:data_as_of_date, timestamp=1511793438335, value=20171020

scan 'namespace:table', {COLUMNS=>['inp:data_as_of_date'], STARTROW=>'trade171113', ENDROW=>'trade171018', REVERSED=>true}
ROW                                                                  COLUMN+CELL
trade171020S00001                                          column=inp:data_as_of_date, timestamp=1511793438335, value=20171020
trade171018B00001                                          column=inp:data_as_of_date, timestamp=1511793438335, value=20171020

scan 'namespace:table', {COLUMNS=>['inp:data_as_of_date'], ROWPREFIXFILTER=>'trade171113'}
ROW                                                                  COLUMN+CELL
trade171113B00001                                          column=inp:data_as_of_date, timestamp=1511993729979, value=20171114

scan 'namespace:table', {COLUMNS=>['inp:data_as_of_date'], ROWPREFIXFILTER=>'trade171113', REVERSED=>true}
ROW                                                                  COLUMN+CELL
0 row(s) in 0.2300 seconds
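
The six results above fit a simple model: the start row is inclusive and the stop row is exclusive in both directions, and a reversed scan walks downwards, so it needs a start row that sorts after the stop row. The sketch below is a toy simulation over a sorted set of strings, not the HBase client API. It assumes that ROWPREFIXFILTER is shorthand for "start = prefix, stop = prefix with its last byte incremented", which is my reading of the 1.x behaviour and is not verified here. The names ScanBoundsModel, scan and nextPrefix are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Toy model of scan bounds over sorted row keys; this is not the HBase client API.
class ScanBoundsModel {

    // Start row is inclusive and stop row is exclusive in both directions.
    // A reversed scan walks downwards, so it only returns rows when start sorts after stop.
    static List<String> scan(NavigableSet<String> rows, String start, String stop, boolean reversed) {
        List<String> result = new ArrayList<>();
        for (String row : (reversed ? rows.descendingSet() : rows)) {
            boolean inRange = reversed
                    ? row.compareTo(start) <= 0 && row.compareTo(stop) > 0
                    : row.compareTo(start) >= 0 && row.compareTo(stop) < 0;
            if (inRange) {
                result.add(row);
            }
        }
        return result;
    }

    // A key that sorts after every key starting with the prefix
    // (assumes a non-empty ASCII prefix whose last character can be incremented).
    static String nextPrefix(String prefix) {
        int last = prefix.length() - 1;
        return prefix.substring(0, last) + (char) (prefix.charAt(last) + 1);
    }

    // The distinct row keys from the table listing above.
    static NavigableSet<String> sampleRows() {
        return new TreeSet<>(Arrays.asList(
                "trade171018B00001", "trade171020S00001",
                "trade171113B00001", "trade171114S00001"));
    }

    public static void main(String[] args) {
        NavigableSet<String> rows = sampleRows();
        String prefix = "trade171113";

        // Assumed translation of ROWPREFIXFILTER: start = prefix, stop = nextPrefix(prefix).
        System.out.println(scan(rows, prefix, nextPrefix(prefix), false));
        // Same bounds with REVERSED: start sorts before stop, so nothing matches.
        System.out.println(scan(rows, prefix, nextPrefix(prefix), true));
        // Swapping the bounds gives the reversed prefix scan.
        System.out.println(scan(rows, nextPrefix(prefix), prefix, true));
    }
}
```

Under this model the reversed prefix scan comes back empty because its start row sorts before its stop row. Swapping the two bounds, as in the last call, returns the day's rows in descending order.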

If the start row and stop row are shorter than the table's row keys, the following works as expected:

Scan scan = new Scan();
scan.setStartRow(Bytes.toBytes(trade + day));
scan.setStopRow(Bytes.unsignedCopyAndIncrement(Bytes.toBytes(trade + day)));

Scan scan = new Scan();
scan.setReversed(true);
scan.setStartRow(Bytes.unsignedCopyAndIncrement(Bytes.toBytes(trade + day)));
scan.setStopRow(Bytes.toBytes(trade + day));

If the start row and stop row may be the same length as the table's row keys, the following works as expected:

Scan scan = new Scan();
scan.setStartRow(createKey("S", productSymbolId, YYMMDD.print(fromDate)));
scan.setStopRow(createNextKey("S", productSymbolId, YYMMDD.print(toDate)));

Scan scan = new Scan();
scan.setReversed(true);
scan.setStartRow(createKeyBeforeNext("A", stripSpaces(accountId), YYMMDD.print(toDate)));
scan.setStopRow(createKeyBefore("A", stripSpaces(accountId), YYMMDD.print(fromDate)));

where:

key === 54686973697361746573746b6579
next === 54686973697361746573746b657a
before === 54686973697361746573746b6578ffffffffffffffffff
beforeNext === 54686973697361746573746b6579ffffffffffffffffff
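
The four values can be reproduced without any HBase dependency. The hex key decodes to the ASCII string "Thisisatestkey". The sketch below recomputes next, before and beforeNext from it, assuming the 9-byte 0xff padding visible in the values above. The class and method names are hypothetical, and the edge cases (empty key, last byte 0x00 or 0xff) are deliberately left out.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Dependency-free sketch of the four boundary keys; not the HBase implementation.
class BoundaryKeys {

    // Number of 0xff bytes appended, taken from the values listed above.
    private static final int PADDING = 9;

    // Copy of the key with its last byte incremented (assumes the last byte is not 0xff).
    static byte[] next(byte[] key) {
        byte[] copy = Arrays.copyOf(key, key.length);
        copy[copy.length - 1]++;
        return copy;
    }

    // A key that sorts just below the given key: decrement the last byte, then pad with 0xff.
    // Assumes a non-empty key whose last byte is not 0x00.
    static byte[] before(byte[] key) {
        byte[] result = Arrays.copyOf(key, key.length + PADDING);
        result[key.length - 1]--;
        Arrays.fill(result, key.length, result.length, (byte) 0xff);
        return result;
    }

    // A key that sorts above the given key and above every longer key that starts with it
    // (unless such a key continues with at least nine 0xff bytes).
    static byte[] beforeNext(byte[] key) {
        return before(next(key));
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] key = "Thisisatestkey".getBytes(StandardCharsets.US_ASCII);
        System.out.println("key        === " + hex(key));
        System.out.println("next       === " + hex(next(key)));
        System.out.println("before     === " + hex(before(key)));
        System.out.println("beforeNext === " + hex(beforeNext(key)));
    }
}
```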

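Implementation (the methods below belong inside a utility class; these imports are assumed):

import static java.lang.System.arraycopy;
import static org.apache.hadoop.hbase.util.Bytes.toBytes;
import static org.apache.hadoop.hbase.util.Bytes.unsignedCopyAndIncrement;

import java.util.Arrays;

import org.apache.hadoop.hbase.HConstants;
import org.apache.hadoop.hbase.util.Bytes;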

/**
 * <h4>usage</h4>
 * 
 * <pre>
 * Scan scan = new Scan();
 * scan.setStartRow(createKey("S", productSymbolId, YYMMDD.print(fromDate)));
 * scan.setStopRow(createNextKey("S", productSymbolId, YYMMDD.print(toDate)));
 *
 * Scan scan = new Scan();
 * scan.setReversed(true);
 * scan.setStartRow(createKeyBeforeNext("A", stripSpaces(accountId), YYMMDD.print(toDate)));
 * scan.setStopRow(createKeyBefore("A", stripSpaces(accountId), YYMMDD.print(fromDate)));
 * </pre>
 * 
 * <h4>spec</h4>
 * 
 * <pre>
 * key === 54686973697361746573746b6579
 * next === 54686973697361746573746b657a
 * before === 54686973697361746573746b6578ffffffffffffffffff
 * beforeNext === 54686973697361746573746b6579ffffffffffffffffff
 * </pre>
 * 
 * @see #createKeyBefore(String...)
 * @see #createKeyBeforeNext(String...)
 * @see #createNextKey(String...)
 */
// similar to Bytes.add(byte[] a, byte[] b, byte[] c), but accepts any number of parts
public static byte[] createKey(String... parts) {
    byte[][] bytes = new byte[parts.length][];
    int size = 0;
    for (int i = 0; i < parts.length; i++) {
        bytes[i] = toBytes(parts[i]);
        size += bytes[i].length;
    }
    byte[] result = new byte[size];
    for (int i = 0, j = 0; i < bytes.length; i++) {
        arraycopy(bytes[i], 0, result, j, bytes[i].length);
        j += bytes[i].length;
    }
    return result;
}

/**
 * Create the next row
 * 
 * <pre>
 * key === 54686973697361746573746b6579
 * next === 54686973697361746573746b657a
 * </pre>
 * 
 * @see #createKey(String...)
 */
public static byte[] createNextKey(String... parts) {
    return unsignedCopyAndIncrement(createKey(parts));
}

/**
 * Create the closest row before
 * 
 * <pre>
 * key === 54686973697361746573746b6579
 * before === 54686973697361746573746b6578ffffffffffffffffff
 * </pre>
 * 
 * @see #createKey(String...)
 */
public static byte[] createKeyBefore(String... parts) {
    return createClosestRowBefore(createKey(parts));
}

/**
 * Create the closest row before the next row
 * 
 * <pre>
 * key === 54686973697361746573746b6579
 * beforeNext === 54686973697361746573746b6579ffffffffffffffffff
 * </pre>
 * 
 * @see #createKey(String...)
 */
public static byte[] createKeyBeforeNext(String... parts) {
    return createClosestRowBefore(createNextKey(parts));
}

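// 9 bytes of 0xff, matching the padding shown in the spec above
private static final byte[] MAX_BYTE_ARRAY = Bytes.createMaxByteArray(9);

// adapted from the HBase sources: ClientScanner.createClosestRowBefore(byte[] row)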
private static byte[] createClosestRowBefore(byte[] row) {
    if (row == null)
        throw new IllegalArgumentException("The passed row is empty");
    if (Bytes.equals(row, HConstants.EMPTY_BYTE_ARRAY))
        return MAX_BYTE_ARRAY;
    if (row[row.length - 1] == 0)
        return Arrays.copyOf(row, row.length - 1);
    byte[] closestFrontRow = Arrays.copyOf(row, row.length);
    closestFrontRow[row.length - 1] = (byte) ((closestFrontRow[row.length - 1] & 0xff) - 1);
    closestFrontRow = Bytes.add(closestFrontRow, MAX_BYTE_ARRAY);
    return closestFrontRow;
}
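
To see why the before/beforeNext bounds make the reversed scan inclusive at both ends, here is a small simulation on the keys from the question. It is a toy model, not HBase: row keys are held as Latin-1 strings in a TreeSet so that String.compareTo matches unsigned byte order, and all names (InclusiveReverseRange, reverseScan, reverseInclusive) are invented for the example. Keys are assumed to be non-empty ASCII.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Toy model: row keys are Latin-1 strings, so String.compareTo matches unsigned byte order.
class InclusiveReverseRange {

    // Nine copies of the largest byte value (0xff), used as padding.
    private static final String MAX_SUFFIX =
            "\u00ff\u00ff\u00ff\u00ff\u00ff\u00ff\u00ff\u00ff\u00ff";

    // Key with its last character incremented.
    static String next(String key) {
        int last = key.length() - 1;
        return key.substring(0, last) + (char) (key.charAt(last) + 1);
    }

    // A key that sorts just below the given key.
    static String before(String key) {
        int last = key.length() - 1;
        return key.substring(0, last) + (char) (key.charAt(last) - 1) + MAX_SUFFIX;
    }

    // Reversed scan: start inclusive, stop exclusive; start must sort after stop.
    static List<String> reverseScan(TreeSet<String> rows, String start, String stop) {
        return new ArrayList<>(rows.subSet(stop, false, start, true).descendingSet());
    }

    // Every row from `to` down to `from`, both included. Each bound may be an exact row
    // key or a shorter prefix such as "trade190704".
    static List<String> reverseInclusive(TreeSet<String> rows, String from, String to) {
        return reverseScan(rows, before(next(to)), before(from));
    }

    static TreeSet<String> sampleRows() {
        return new TreeSet<>(Arrays.asList(
                "trade1907030001", "trade1907030002", "trade1907040001",
                "trade1907040002", "trade1907050001", "trade1907050002"));
    }

    public static void main(String[] args) {
        TreeSet<String> rows = sampleRows();
        // Naive bounds with exact keys: picks up trade1907040002 and misses trade1907030002.
        System.out.println(reverseScan(rows, next("trade1907040001"), "trade1907030002"));
        // before/beforeNext bounds: exactly the two requested trades.
        System.out.println(reverseInclusive(rows, "trade1907030002", "trade1907040001"));
        // The same call shape also covers a whole day given only its prefix.
        System.out.println(reverseInclusive(rows, "trade190704", "trade190704"));
    }
}
```

The point of the design is that the same pair of bounds works whether the caller passes a day prefix or a full row key, so no conditional logic is needed.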
