
Importance of HBase column family design

I am studying HBase but can't find the answer to one question on my own.

Let's consider the following situation. We have five physical (hardware) servers, numbered 0-4. The HMaster is installed on server 0, and four HRegionServers are installed on servers 1-4. We also have one very large table that we need to serve from these five servers.

As I understand it, every region server is responsible for certain regions (each region being some set of rows(!)). This means a row (including ALL of its column families, columns, and cells) is always located in only ONE region server (in our example, on ONE physical server).
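
The row-to-region mapping described above can be sketched as a lookup over sorted row-key ranges. This is a toy model, not the HBase client API (the real client locates regions through the `hbase:meta` catalog table): the split keys, the one-region-per-server assignment, and the names `Region` and `locate` are hypothetical. Note that the lookup takes only the row key, which is why every column family of a row ends up in the same region.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    start_key: bytes  # inclusive start of the row-key range; b"" = start of table
    server: str       # hypothetical region-server name


# Hypothetical layout: the table is split at these row keys and the HMaster
# has assigned one region to each of the four region servers (1-4).
REGIONS = [
    Region(b"", "server1"),
    Region(b"g", "server2"),
    Region(b"n", "server3"),
    Region(b"t", "server4"),
]
_START_KEYS = [r.start_key for r in REGIONS]


def locate(row_key: bytes) -> Region:
    """Return the one region whose [start_key, next start_key) range holds row_key.

    Only the row key is used: column families play no part in placement.
    """
    # bisect_right finds the last start_key <= row_key in the sorted list.
    idx = bisect_right(_START_KEYS, row_key) - 1
    return REGIONS[idx]


print(locate(b"mallory").server)  # server2
```

In real HBase a region server usually hosts many regions, and regions split as they grow; one region per server is only a simplification for this example.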

If what I wrote is right, I can't understand what the use and importance of column families is. Please correct me if I am wrong and/or explain what column families are used for.

It's more about I/O performance when you scan/fetch. Within a region, HBase stores each column family separately on disk (its own store and files). So if you find that your scans/fetches only use columns X, Y, and Z but never A, B, and C, you can split the data into two column families; reading X, Y, and Z then doesn't require reading the data of A, B, and C.
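
A toy model of that idea: one region keeping a separate store per column family, where a scan only visits the stores of the families it asks for. The class `ToyRegion`, the family names `hot` and `cold`, and the value sizes are all hypothetical; this is a sketch of the storage principle, not HBase's API.

```python
from __future__ import annotations

from collections import defaultdict


class ToyRegion:
    """Toy region that keeps one store per column family (loosely like HBase's Stores)."""

    def __init__(self) -> None:
        # family -> row key -> qualifier -> value
        self.stores: dict[str, dict[bytes, dict[str, bytes]]] = defaultdict(
            lambda: defaultdict(dict)
        )

    def put(self, row: bytes, family: str, qualifier: str, value: bytes) -> None:
        self.stores[family][row][qualifier] = value

    def scan(self, families: list[str]) -> tuple[dict[bytes, dict[str, bytes]], int]:
        """Return the requested cells and how many value bytes were read.

        Only the stores of the requested families are visited, so cells in
        other families are never read.
        """
        result: dict[bytes, dict[str, bytes]] = defaultdict(dict)
        bytes_read = 0
        for family in families:
            for row, cells in self.stores.get(family, {}).items():
                for qualifier, value in cells.items():
                    result[row][f"{family}:{qualifier}"] = value
                    bytes_read += len(value)
        return dict(result), bytes_read


# Hypothetical data: X, Y, Z are small and read often; A, B, C are large and rarely read.
region = ToyRegion()
for i in range(3):
    row = f"row{i}".encode()
    for q in ("x", "y", "z"):
        region.put(row, "hot", q, b"1")
    for q in ("a", "b", "c"):
        region.put(row, "cold", q, b"\0" * 100)

cells, read_hot = region.scan(["hot"])      # touches only the "hot" store
_, read_all = region.scan(["hot", "cold"])  # touches both stores
print(read_hot, read_all)  # 9 909
```

If all six columns lived in a single family, a read of X, Y, and Z would go through the one store that also holds the 900 bytes of A, B, and C; that is the I/O the split avoids.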

There is probably also a small benefit for compressed tables, since similar data is physically grouped together and is therefore easier to compress.
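
A rough way to see the compression effect: HBase compresses HFile data block by block, so what matters is how similar the bytes inside each block are. The sketch below uses `zlib` on independent fixed-size blocks as a stand-in for an HBase compression codec; the 4 KB block size, the sample strings, and the random payloads are made-up assumptions, and real HFiles also store keys alongside values.

```python
import random
import zlib

# Hypothetical block size for the demo; HBase's default HFile block size is 64 KB.
BLOCK_SIZE = 4096


def block_compressed_size(data: bytes, block_size: int = BLOCK_SIZE) -> int:
    """Compress data in independent fixed-size blocks and return the total compressed size."""
    return sum(
        len(zlib.compress(data[i:i + block_size]))
        for i in range(0, len(data), block_size)
    )


rng = random.Random(42)
ROWS = 100
# "Like" data: short, highly similar strings (hypothetical first family).
meta = [f"status=active;region=eu-west;v={i % 3}".encode() for i in range(ROWS)]
# Unlike data: incompressible random payloads (hypothetical second family).
blobs = [rng.randbytes(BLOCK_SIZE) for _ in range(ROWS)]

# One family: each row's cells sit together, so the small strings are scattered between blobs.
one_family_size = block_compressed_size(b"".join(m + b for m, b in zip(meta, blobs)))
# Two families: each family's cells are stored contiguously in their own files.
two_family_size = block_compressed_size(b"".join(meta)) + block_compressed_size(b"".join(blobs))

print(one_family_size, two_family_size)
```

The gap here is exaggerated on purpose (random payloads, small blocks); real workloads will generally see a much smaller difference, consistent with the "small benefit" described above.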
