
Lead and lag in HBase

I'm trying to figure out how to do the equivalent of Oracle's LEAD and LAG in HBase, or some other pattern that will solve my problem. I could write a MapReduce program that does this quite easily, but I'd love to be able to exploit the fact that the data is already sorted the way I need it to be.

My problem is as follows: I have a row key and a value that look like:

(employee name + timestamp) => data:salary

So, some example data might be:

miller, bob;2010-01-14 => data:salary=90000
miller, bob;2010-11-04 => data:salary=102000
miller, bob;2011-12-03 => data:salary=107000
monty, fred;2010-04-10 => data:salary=19000
monty, fred;2011-09-09 => data:salary=24000

What I want to do is calculate the changes in salary, record by record. I want to transform the above data into differences between records:

miller, bob;2010-01-14 => data:salarydiff=90000
miller, bob;2010-11-04 => data:salarydiff=12000
miller, bob;2011-12-03 => data:salarydiff=5000
monty, fred;2010-04-10 => data:salarydiff=19000
monty, fred;2011-09-09 => data:salarydiff=5000
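
To pin down the intended semantics, here is a minimal plain-Java sketch of the transformation, with no HBase involved. The class name `SalaryDiff` and the use of an in-memory sorted map in place of the table are assumptions made for illustration only. It treats the first record of each employee as having a previous salary of 0, which matches the expected output above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of HBase: computes salary - LAG(salary) per employee.
class SalaryDiff {
    /** rows must iterate in row-key order: employee first, then ascending date. */
    public static Map<String, Long> diffs(Map<String, Long> rows) {
        Map<String, Long> out = new LinkedHashMap<>();
        String prevEmployee = null;
        long prevSalary = 0L;
        for (Map.Entry<String, Long> e : rows.entrySet()) {
            String key = e.getKey();
            // Everything before the last ';' is the employee name.
            String employee = key.substring(0, key.lastIndexOf(';'));
            long salary = e.getValue();
            // LAG: previous salary of the same employee, or 0 for the first record.
            long lag = employee.equals(prevEmployee) ? prevSalary : 0L;
            out.put(key, salary - lag);
            prevEmployee = employee;
            prevSalary = salary;
        }
        return out;
    }
}
```

Calling `SalaryDiff.diffs` on a `TreeMap` holding the five sample rows yields the five `salarydiff` values listed above.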

I'm up for changing the rowkey strategy if necessary.

What I'd do is change the key so that the timestamp is descending (newest salary first):

miller, bob;2011-12-03 => data:salary=107000
miller, bob;2010-11-04 => data:salary=102000
miller, bob;2010-01-14 => data:salary=90000
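
One caveat: HBase sorts row keys in ascending byte order, so the date strings shown above would not come back newest-first if stored literally. A common way to get that order is to store an inverted timestamp in the key. The sketch below is one possible encoding, not something prescribed by the answer: the class name `DescendingKey`, the `MAX_DAY` bound, and the 7-digit zero-padded format are all assumptions.

```java
import java.time.LocalDate;
import java.util.Locale;

// Hypothetical key encoder: later dates produce lexicographically smaller keys.
class DescendingKey {
    // Assumed upper bound; any fixed day later than every date in the table would do.
    private static final long MAX_DAY = LocalDate.of(9999, 12, 31).toEpochDay();

    /** Builds "employee;NNNNNNN" where NNNNNNN shrinks as the date grows. */
    public static String rowKey(String employee, LocalDate date) {
        long inverted = MAX_DAY - date.toEpochDay();
        // Zero-padding keeps string order identical to numeric order.
        return String.format(Locale.ROOT, "%s;%07d", employee, inverted);
    }

    /** Recovers the original date from a key produced by rowKey. */
    public static LocalDate dateOf(String rowKey) {
        String suffix = rowKey.substring(rowKey.lastIndexOf(';') + 1);
        return LocalDate.ofEpochDay(MAX_DAY - Long.parseLong(suffix));
    }
}
```

For ASCII keys like these, `String.compareTo` orders the same way as a byte-wise comparison, so the key for 2011-12-03 sorts before the key for 2010-11-04, which sorts before the key for 2010-01-14.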

Now you can run a simple map-only job that scans the table. In the map, create a new Scan that starts at the current key and call next() on its scanner to get the previous salary, calculate the diff, and store it in a new column on the current row. Because a scan's start row is inclusive, the first result is the current row itself and the one after it holds the previous salary. If that following row belongs to a different employee, the current row is the employee's oldest record.
Basically, in your mapper class (the one that extends TableMapper), override the setup method, get the configuration, and open the table:

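import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.client.HTable;

private HTable table; // field of the mapper class, reused by map() for the extra Scan
@Override
protected void setup(Context context) throws IOException, InterruptedException {
    Configuration config = context.getConfiguration();
    table = new HTable(config, "<Table Name>"); // substitute the real table name
}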

Then, inside the map method, extract the row key from the row parameter, create the new Scan, and continue as explained above.
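
The lookup inside the map can be sketched without a cluster by letting a `NavigableMap` stand in for the sorted table. This is only an illustration of the logic, assuming keys of the form `employee;suffix` that sort newest-first within each employee. The class name `NextRowDiff` is hypothetical, and `higherEntry` plays the role of opening a Scan at the current key and stepping past the current row.

```java
import java.util.Map;
import java.util.NavigableMap;

// Hypothetical stand-in for the per-row lookup done inside the mapper.
class NextRowDiff {
    /**
     * table:  sorted stand-in for the HBase table (row key -> salary),
     *         newest record first within each employee.
     * rowKey: the row currently being mapped; it must exist in the table.
     */
    public static long diffFor(NavigableMap<String, Long> table, String rowKey) {
        long current = table.get(rowKey);
        // Keep the ';' so that "miller, bob;" cannot match "miller, bobby;...".
        String employeePrefix = rowKey.substring(0, rowKey.lastIndexOf(';') + 1);
        // The next row in key order is the previous salary, if it is the same employee.
        Map.Entry<String, Long> next = table.higherEntry(rowKey);
        if (next == null || !next.getKey().startsWith(employeePrefix)) {
            return current; // oldest record for this employee: nothing to subtract
        }
        return current - next.getValue();
    }
}
```

In a real mapper the same two checks apply to the second result returned by the scanner: it may be missing (end of table) or belong to the next employee, and in both cases the diff is the salary itself.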

In most cases the next record will be in the same region; occasionally the scan will have to go to another region server.
