
How to design an HBase schema?

Suppose that I have this RDBMS table (an Entity-attribute-value model):

col1: entityID
col2: attributeName
col3: value

and I want to move it to HBase because of scaling issues.

I know that the only way to access an HBase table is by primary key, or with a cursor: you can get a cursor positioned at a specific key and iterate over the rows one by one.
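HBase keeps rows sorted by row key, so the two read primitives described here are a point get and a scan that starts at a key and walks forward in key order. The sketch below is an in-memory stand-in, assuming a sorted Python list plus a dict; the class `SortedTable` and its methods are hypothetical and are not the HBase client API.

```python
import bisect


class SortedTable:
    """Hypothetical in-memory stand-in for an HBase table.

    Rows are kept sorted by row key. Only two reads are offered, mirroring
    HBase: a point get by row key, and a scan over a [start, stop) key range.
    """

    def __init__(self):
        self._keys = []  # row keys in sorted (lexicographic) order
        self._rows = {}  # row key -> {column: value}

    def put(self, row_key, column, value):
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key][column] = value

    def get(self, row_key):
        # Point lookup by row key; an absent row yields an empty dict.
        return dict(self._rows.get(row_key, {}))

    def scan(self, start, stop=None):
        # Like a cursor positioned at `start`: yield (row_key, columns) in key
        # order until `stop` (exclusive) or the end of the table.
        i = bisect.bisect_left(self._keys, start)
        while i < len(self._keys):
            key = self._keys[i]
            if stop is not None and key >= stop:
                break
            yield key, dict(self._rows[key])
            i += 1
```

Anything that is not "this key" or "keys from here onward" has to be turned into one of those two reads by choosing the row key, which is why the lookups below need extra tables.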

The issue is that, in my case, I want to be able to iterate on all 3 columns. For example:

  • for a given entityID, I want to get all its attributes and values
  • for a given attributeName and value, I want to get all the entityIDs ...
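For reference, these two access patterns are two SQL queries on the original table; in an RDBMS the second one is served by a secondary index on (attributeName, value), which is what HBase does not provide out of the box. A minimal sketch using Python's built-in `sqlite3` module; the sample rows, the index name `eav_attr_value` and the helper names are made up for illustration.

```python
import sqlite3

# The EAV table from the question, in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE eav (entityID TEXT, attributeName TEXT, value TEXT)")
# Secondary index that makes the reverse lookup (pattern 2) cheap.
conn.execute("CREATE INDEX eav_attr_value ON eav (attributeName, value)")
conn.executemany(
    "INSERT INTO eav VALUES (?, ?, ?)",
    [("e1", "color", "red"), ("e1", "size", "L"), ("e2", "color", "red")],
)


def attributes_of(entity_id):
    # Pattern 1: all attributes and values of one entity.
    cur = conn.execute(
        "SELECT attributeName, value FROM eav WHERE entityID = ? "
        "ORDER BY attributeName",
        (entity_id,),
    )
    return dict(cur.fetchall())


def entities_with(attribute_name, value):
    # Pattern 2: all entities having a given attribute with a given value.
    cur = conn.execute(
        "SELECT entityID FROM eav WHERE attributeName = ? AND value = ? "
        "ORDER BY entityID",
        (attribute_name, value),
    )
    return [row[0] for row in cur]
```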

So one idea I had is to build one HBase table that holds the data (table DATA, with entityID as the primary key) and two "index" tables: one with attributeName as the primary key, and the other with value as the primary key.

Each index table will hold a list of pointers (entityIDs) into the DATA table.
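A sketch of this proposed layout, assuming Python dicts as stand-ins for the three HBase tables (the names `data`, `by_attribute`, `by_value`, `put` and `entities_with` are hypothetical). It shows the cost of separate indexes: answering "attributeName = X and value = Y" takes two index reads, an intersection, and a check against DATA, because the value index alone cannot tell which attribute a value belongs to.

```python
from collections import defaultdict

# Plain dicts stand in for HBase tables (row key -> contents); this is not
# the HBase client API.
data = defaultdict(dict)         # DATA: entityID -> {attributeName: value}
by_attribute = defaultdict(set)  # index: attributeName -> {entityID}
by_value = defaultdict(set)      # index: value -> {entityID}


def put(entity_id, attribute_name, value):
    # One logical write touches three tables. HBase only guarantees
    # atomicity within a single row, so on a partial failure the indexes
    # can drift out of sync with DATA.
    data[entity_id][attribute_name] = value
    by_attribute[attribute_name].add(entity_id)
    by_value[value].add(entity_id)


def entities_with(attribute_name, value):
    # Intersect the candidates from both indexes, then confirm against DATA:
    # an entity with color=blue and shade=red is in by_attribute["color"] and
    # by_value["red"] without having color=red.
    candidates = by_attribute.get(attribute_name, set()) & by_value.get(value, set())
    return sorted(e for e in candidates if data[e].get(attribute_name) == value)
```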

Is this a reasonable approach, or is it an 'abuse' of HBase concepts?

In this blog post the author says:

HBase allows get operations by primary key and scans (think: cursor) over row ranges. (If you have both scale and need of secondary indexes, don't worry - Lucene to the rescue! But that's another post.)

Do you know how Lucene can help?

-- Yonatan

Secondary indexes would indeed be useful for many potential applications of HBase, and I believe the developers are in fact looking into it. Check out http://www.mail-archive.com/hbase-dev@hadoop.apache.org/msg04801.html.

In the meantime, though, if your application's data storage can be modelled as a star schema (see http://en.wikipedia.org/wiki/Star_schema), you might like to check out the solution that Hypertable proposes for secondary-index-type needs: http://markmail.org/message/rphm4q6cbar2ycgp

I recommend having two different flat tables: one for looking up the attributes and values of a given entityID, and one for looking up the entityIDs that have a given attribute and value.

Table 1 would look like this:

entityID1 {
  attribute1: value1;
  attribute2: value2;
  ...
}

And Table 2:

attribute1_value1 {
  entityID1;
}
attribute2_value2 {
  entityID1;
}
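A minimal sketch of these two tables, with Python dicts standing in for HBase tables and a sorted key list modelling HBase's row-key order; both lookups become a single get. The separator `SEP` and all helper names are assumptions for illustration: the answer joins attribute and value with `_`, which becomes ambiguous if either part can itself contain `_`.

```python
import bisect

# Hypothetical separator for Table 2 row keys, assumed never to occur in an
# attribute name or value (the answer uses "_").
SEP = "\x00"

table1 = {}       # Table 1: entityID -> {attributeName: value}
table2 = {}       # Table 2: attributeName + SEP + value -> {entityID}
table2_keys = []  # Table 2 row keys, kept sorted like HBase row keys


def put(entity_id, attribute_name, value):
    table1.setdefault(entity_id, {})[attribute_name] = value
    key = attribute_name + SEP + value
    if key not in table2:
        bisect.insort(table2_keys, key)
        table2[key] = set()
    table2[key].add(entity_id)


def attributes_of(entity_id):
    # One get on Table 1.
    return dict(table1.get(entity_id, {}))


def entities_with(attribute_name, value):
    # One get on Table 2.
    return sorted(table2.get(attribute_name + SEP + value, set()))


def values_of(attribute_name):
    # Keys sharing the prefix "attribute<SEP>" are contiguous in sorted
    # order, so a single prefix scan lists every value of one attribute.
    prefix = attribute_name + SEP
    i = bisect.bisect_left(table2_keys, prefix)
    found = []
    while i < len(table2_keys) and table2_keys[i].startswith(prefix):
        found.append(table2_keys[i][len(prefix):])
        i += 1
    return found
```

Because Table 2's keys start with the attribute name, related rows sit next to each other and can be read with one scan. Updating or deleting a value would also require removing the stale Table 2 entry, which this sketch leaves out.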

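Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions contact: yoyou2525@163.com.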

 