
Partial row key scan in HBase

I have the following row key in my HBase deployment:

EquipmentNumber|LogTime

For example: 454312|20180304124511

Now I want to do a partial row key scan, i.e. I want to scan only on a LogTime range.

For example, I want to get all the equipment numbers between LogTime t1 and t2.

Can anybody please help?

HBase rows are sorted lexicographically by row key (byte order), so you can scan by prefix, but you can't scan by suffix.
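To make the ordering point concrete, here is a minimal sketch that does not use the HBase client at all. It assumes a plain `TreeMap` as a stand-in for a table sorted by row key (for ASCII keys like these, Java string order and byte order agree); the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

class RowKeyOrderDemo {
    // Stand-in for an HBase table: a map kept sorted by row key.
    static TreeMap<String, String> sampleTable() {
        TreeMap<String, String> table = new TreeMap<>();
        table.put("454312|20180304124511", "a");
        table.put("454312|20180304124524", "ii");
        table.put("454313|20180304124521", "f");
        table.put("454313|20180304124524", "i");
        return table;
    }

    // Prefix scan: matching keys form one contiguous slice of the sorted map.
    static List<String> prefixScan(TreeMap<String, String> table, String prefix) {
        SortedMap<String, String> slice =
                table.subMap(prefix, prefix + Character.MAX_VALUE);
        return new ArrayList<>(slice.keySet());
    }

    // Suffix match: matching keys are scattered, so every key is inspected.
    static List<String> suffixScan(TreeMap<String, String> table, String suffix) {
        List<String> hits = new ArrayList<>();
        for (String key : table.keySet()) {
            if (key.endsWith(suffix)) {
                hits.add(key);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        TreeMap<String, String> table = sampleTable();
        System.out.println(prefixScan(table, "454312|"));
        System.out.println(suffixScan(table, "|20180304124524"));
    }
}
```

The prefix lookup touches only the rows for one equipment number, while the suffix lookup has to walk the whole key space, which is the same trade-off a real table faces.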

One thing you can do is scan the whole table using a RowFilter, with filter logic based on your LogTime field. The rows are filtered on the server, so your client code receives only the matching rows.
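The filter logic itself could look roughly like the sketch below. It is plain Java and deliberately does not call the HBase API: in a real deployment the same check would have to be expressed through a RowFilter comparator and would run on the region servers. The class name and the assumption that LogTime is always a numeric suffix after the last `|` are mine.

```java
import java.util.ArrayList;
import java.util.List;

class LogTimeRangeFilter {
    // True when the LogTime part of "EquipmentNumber|LogTime" lies in [t1, t2].
    static boolean matches(String rowKey, long t1, long t2) {
        int sep = rowKey.lastIndexOf('|');
        if (sep < 0) {
            return false; // not a composite key
        }
        long logTime;
        try {
            logTime = Long.parseLong(rowKey.substring(sep + 1));
        } catch (NumberFormatException e) {
            return false; // suffix is not a numeric timestamp
        }
        return logTime >= t1 && logTime <= t2;
    }

    // Applies the predicate to every key, as a full scan with a filter would.
    static List<String> filter(List<String> rowKeys, long t1, long t2) {
        List<String> kept = new ArrayList<>();
        for (String key : rowKeys) {
            if (matches(key, t1, t2)) {
                kept.add(key);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> keys = List.of(
                "454312|20180304124511",
                "454313|20180304124521");
        System.out.println(filter(keys, 20180304124511L, 20180304124516L));
    }
}
```

Note that the predicate is still evaluated once per row, so this reduces network traffic to the client but not the amount of data the servers have to read.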

With filters, a full scan seems unavoidable unless you know the approximate range of equipment IDs that can fall within the given duration, which may not always be possible.

An alternative is to use an intermediate lookup/index table that maps the second part of your row key either to the first part or to the composite row key in the actual data table. This keeps your primary access pattern (finding records by equipment ID) as is. When you want to look up by the second key, use the lookup table to find the row keys in your data table, then use those keys to fetch the required data. This approach, however, makes your application responsible for keeping the lookup table in sync with updates and deletes in your data table.
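As a rough illustration of the lookup-table idea, the sketch below models both tables with in-memory `TreeMap`s rather than real HBase tables. The class name, the `LogTime|EquipmentNumber` index key layout, and the assumption that LogTime is a fixed-width 14-digit string (so that string order equals time order) are all mine, not something stated above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

class LogTimeIndexDemo {
    // Data table keyed by "EquipmentNumber|LogTime".
    private final TreeMap<String, String> data = new TreeMap<>();
    // Lookup table keyed by "LogTime|EquipmentNumber"; value is the data row key.
    private final TreeMap<String, String> index = new TreeMap<>();

    // The application writes both tables so they stay in sync.
    void put(String equipment, String logTime, String value) {
        String rowKey = equipment + "|" + logTime;
        data.put(rowKey, value);
        index.put(logTime + "|" + equipment, rowKey);
    }

    // Deletes must also be applied to both tables.
    void delete(String equipment, String logTime) {
        data.remove(equipment + "|" + logTime);
        index.remove(logTime + "|" + equipment);
    }

    // Range scan on the index, then point lookups in the data table.
    List<String> valuesBetween(String t1, String t2) {
        // The upper bound t2 + "|" + '\uffff' keeps every equipment logged at t2.
        String upper = t2 + "|" + Character.MAX_VALUE;
        List<String> values = new ArrayList<>();
        for (String rowKey : index.subMap(t1, true, upper, true).values()) {
            values.add(data.get(rowKey));
        }
        return values;
    }

    public static void main(String[] args) {
        LogTimeIndexDemo tables = new LogTimeIndexDemo();
        tables.put("454312", "20180304124511", "a");
        tables.put("454312", "20180304124512", "b");
        tables.put("454313", "20180304124512", "g");
        tables.put("454313", "20180304124521", "f");
        System.out.println(tables.valuesBetween("20180304124511", "20180304124512"));
    }
}
```

Because LogTime leads the index key, a time range becomes a contiguous slice of the index, which is exactly what the original row key layout cannot offer. In a real system the two writes are not atomic, so the application also has to handle a failure between them.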

For automatic management of indexes you can try Phoenix: create a Phoenix table with a global index on logtime. Here is a quick sample:

CREATE TABLE "SO50228751"(
"equipNum" integer not null,
"logtime" bigint not null,
"f"."data" varchar
CONSTRAINT pk PRIMARY KEY ("equipNum", "logtime")); 

Add data

upsert into "SO50228751"("equipNum", "logtime","f"."data")  values(454312,20180304124511,'a');
upsert into "SO50228751"("equipNum", "logtime","f"."data")  values(454312,20180304124512,'b');
upsert into "SO50228751"("equipNum", "logtime","f"."data")  values(454312,20180304124513,'c');
upsert into "SO50228751"("equipNum", "logtime","f"."data")  values(454312,20180304124514,'d');
upsert into "SO50228751"("equipNum", "logtime","f"."data")  values(454312,20180304124515,'e');
upsert into "SO50228751"("equipNum", "logtime","f"."data")  values(454313,20180304124521,'f');
upsert into "SO50228751"("equipNum", "logtime","f"."data")  values(454313,20180304124522,'g');
upsert into "SO50228751"("equipNum", "logtime","f"."data")  values(454313,20180304124523,'h');
upsert into "SO50228751"("equipNum", "logtime","f"."data")  values(454313,20180304124524,'i');
upsert into "SO50228751"("equipNum", "logtime","f"."data")  values(454312,20180304124524,'ii');

Create index

CREATE INDEX so_idx ON "SO50228751"("logtime");

Query by logtime using the index

select /*+ INDEX("SO50228751" so_idx) */ * from "SO50228751" where "logtime" between 20180304124511 and 20180304124516;

Before you decide on Phoenix indexes, please check the documentation and this article: https://community.hortonworks.com/articles/61705/art-of-phoenix-secondary-indexes.html to understand how well they fit your use case.

Hope this helps.
