How to design an HBase schema?

Question

Suppose I have this RDBMS table (entity-attribute-value model):

col1: entityID
col2: attributeName
col3: value

and I want to use HBase because of scaling issues.

I know that the only way to access an HBase table is by primary key (cursor). You can get a cursor for a specific key and iterate over the rows one by one.

The issue is that, in my case, I want to be able to iterate over all 3 columns. For example:

for a given entityID, I want to get all its attributes and values
for a given attributeName and value, I want to get all the entityIDs ...

So one idea I had is to build one HBase table that holds the data (table DATA, with entityID as the primary index), and 2 "index" tables, one with attributeName as the primary key and the other with value.

Each index table will hold a list of pointers (entityIDs) into the DATA table.

Is this a reasonable approach, or is it an 'abuse' of HBase concepts?

In this blog the author says:

HBase allows get operations by primary key and scans (think: cursor) over row ranges. (If you have both scale and need of secondary indexes, don't worry - Lucene to the rescue! But that's another post.)

Do you know how Lucene can help?

-- Yonatan

Answer 1

Secondary indexes would indeed be useful for many potential applications of HBase, and I believe the developers are in fact looking at it. Check out http://www.mail-archive.com/hbase-dev@hadoop.apache.org/msg04801.html .

In the meantime, though, if your application's data storage can be modelled as a star schema (see http://en.wikipedia.org/wiki/Star_schema ), you might like to check out the solution that Hypertable proposes for secondary-index-type needs: http://markmail.org/message/rphm4q6cbar2ycgp

Answer 2

I recommend having two different flat tables: one for looking up attributes+values given an entityID, and one for looking up the entityID given attributes+values.

Table 1 would look like this:

entityID1 {
  attribute1: value1;
  attribute2: value2;
  ...
}

and Table 2:

attribute1_value1 {
  entityID1;
}
attribute2_value2 {
  entityID1;
}
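
To make the two-table layout concrete, here is a minimal in-memory sketch in Python. It is only a model built on plain dicts, not HBase client code: the class name `EavStore`, its method names, and the `SEP` separator are all hypothetical, and values are assumed to be strings. It shows each write going to both tables, and models a prefix scan over Table 2's sorted row keys to answer "which entities have this attribute at all".

```python
from bisect import bisect_left
from collections import defaultdict

# Assumption: a separator that never occurs in attribute names or values.
# (The tables above use "_", which is ambiguous if names contain underscores.)
SEP = "\x00"


class EavStore:
    """In-memory stand-in for the two flat tables; not an HBase client."""

    def __init__(self):
        # Table 1: row key = entityID, one column per attribute.
        self.by_entity = defaultdict(dict)
        # Table 2: row key = attribute + SEP + value, one column per entityID.
        self.by_attr_value = defaultdict(set)

    def put(self, entity_id, attribute, value):
        """Write one (entity, attribute, value) triple to both tables."""
        old = self.by_entity[entity_id].get(attribute)
        if old is not None:
            # Drop the stale index entry so Table 2 stays consistent with Table 1.
            self.by_attr_value[attribute + SEP + old].discard(entity_id)
        self.by_entity[entity_id][attribute] = value
        self.by_attr_value[attribute + SEP + value].add(entity_id)

    def attributes_of(self, entity_id):
        """Point lookup on Table 1: all attributes and values of one entity."""
        return dict(self.by_entity.get(entity_id, {}))

    def entities_with(self, attribute, value):
        """Point lookup on Table 2: all entities with this attribute=value."""
        return sorted(self.by_attr_value.get(attribute + SEP + value, ()))

    def entities_with_attribute(self, attribute):
        """Model of a prefix scan over Table 2's sorted row keys."""
        prefix = attribute + SEP
        keys = sorted(self.by_attr_value)
        found = set()
        # Start at the first key >= prefix and stop once the prefix no longer matches.
        for key in keys[bisect_left(keys, prefix):]:
            if not key.startswith(prefix):
                break
            found |= self.by_attr_value[key]
        return sorted(found)
```

In this model `put` updates both dicts in one call. In a real cluster the two writes would go to separate tables, so the application would have to handle the case where one write succeeds and the other fails.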

Answer 1: score 5, accepted, 2009-02-13 15:54:39

Answer 2: score 0, 2013-06-04 15:00:40
