HBase schema design example

I would like some advice about HBase schema design. For example, suppose there are 2,000 patients:
1. Each patient has a name, sex, age, and hospital_ID.
2. Activity data such as heart rate, location, and step count is recorded for each patient every minute.
3. Each patient completes several questionnaires.

How should I organise the HBase table?

Thank you very much for your help.

My current idea is to use patient_ID as the row key, so each patient has exactly one row in the HBase table, and all of that patient's activity data is grouped into a nested table. The activity data will have millions of rows. The table would therefore have three column families: CF1:info, CF2:activity_data, and CF3:questionnaires.

CF1:info contains name, sex, age, and ID.

CF2:activity_data contains the data (as a nested table).

CF3:questionnaires contains questionnaire_ID (as a nested table).
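To make the proposal concrete: HBase has no nested tables, so a "nested table" inside one row would in practice become many column qualifiers in the activity_data family. The sketch below is a hypothetical in-memory model using plain Python dicts (not the HBase client API). It assumes one qualifier per reading, named `timestamp_metric`, and three metrics per minute; the function names and sample values are illustrative only. It also counts how many activity cells a single patient's row would accumulate.

```python
# Hypothetical in-memory model of the proposed single-row design.
# HBase stores cells as row key -> column family -> column qualifier -> value;
# there are no nested tables, so "nesting" becomes many qualifiers in one family.

MINUTES_PER_DAY = 24 * 60


def build_patient_row(patient_id, info, activity, questionnaires):
    """Return {row_key: {family: {qualifier: value}}} for one patient.

    activity: iterable of (timestamp_str, metric, value) tuples (assumed shape).
    questionnaires: {questionnaire_id: {question: answer}} (assumed shape).
    """
    row = {"info": dict(info), "activity_data": {}, "questionnaires": {}}
    for ts, metric, value in activity:
        # One qualifier per reading, e.g. "202001010800_steps".
        row["activity_data"][f"{ts}_{metric}"] = value
    for q_id, answers in questionnaires.items():
        for question, answer in answers.items():
            row["questionnaires"][f"{q_id}_{question}"] = answer
    return {patient_id: row}


def activity_cells_per_row(days, metrics=3):
    """Cells one patient's row accumulates at one reading per metric per minute."""
    return days * MINUTES_PER_DAY * metrics


# Made-up sample input for illustration.
table = build_patient_row(
    "p0001",
    {"name": "Patient A", "sex": "F", "age": 54, "hospital_ID": "H17"},
    [("202001010800", "heart_rate", 71), ("202001010800", "steps", 12)],
    {"q1": {"sleep_quality": 3}},
)
print(sorted(table["p0001"]["activity_data"]))
print(activity_cells_per_row(days=365))
```

At three metrics per minute, one year of readings is 365 × 1,440 × 3 = 1,576,800 cells in a single patient's row, while the info family holds only four cells per patient. That is the column family size imbalance discussed in the answer.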

I don't know whether this is a smart way to design the HBase schema. Please give me some advice.

Thank you very much.

  1. When you design a data model, it is very important to understand how the data will be used, especially which queries you want to run efficiently (that is, without a full table scan) over the data stored in HBase.
  2. activity_data seems to be raw data, whereas the other two parts belong to the "Patient profile". A common recommendation is to keep the column families within one table roughly similar in size: they share the same regions, so a family with billions of cells next to one with a few thousand leaves the small family's data spread across many regions, which makes scanning it inefficient. It is therefore probably better to keep activity_data in a separate table, aggregate it into, say, a daily summary, and store the result in the "Patient profile" table.
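The second point can be sketched as follows, under stated assumptions. A separate activity table uses a row key made of a zero-padded patient ID followed by a minute-resolution timestamp, so all readings for one patient and day are contiguous and can be read with a range scan instead of a full table scan. The day's readings are then aggregated and written as one cell into the profile table. `SortedTable` is a hypothetical toy stand-in that keeps rows sorted by key, as HBase does; in the Java client the equivalent range read would be a `Scan` with `withStartRow`/`withStopRow`. The key format, the family names `m` and `daily_summary`, and `summarize_day` are illustrative choices, not part of the original answer.

```python
import bisect
from collections import defaultdict
from datetime import date, datetime


def activity_row_key(patient_id: int, ts: datetime) -> str:
    # Zero-padded ID so lexicographic order matches numeric order; the
    # minute timestamp keeps one patient's readings contiguous and time-ordered.
    return f"{patient_id:06d}#{ts:%Y%m%d%H%M}"


class SortedTable:
    """Toy stand-in for an HBase table: rows are kept sorted by row key."""

    def __init__(self):
        self._keys = []
        self._rows = {}

    def put(self, row_key, family, qualifier, value):
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
            self._rows[row_key] = defaultdict(dict)
        self._rows[row_key][family][qualifier] = value

    def get(self, row_key):
        return self._rows.get(row_key)

    def scan(self, start, stop):
        # Half-open range [start, stop), analogous to a Scan with start/stop rows.
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        for key in self._keys[lo:hi]:
            yield key, self._rows[key]


def summarize_day(activity, profile, patient_id: int, day: date):
    """Aggregate one patient-day of readings into the profile table."""
    prefix = f"{patient_id:06d}#{day:%Y%m%d}"
    steps, heart = 0, []
    # "~" sorts after every digit, so this stop row bounds exactly this day.
    for _, row in activity.scan(prefix, prefix + "~"):
        cells = row.get("m", {})
        steps += cells.get("steps", 0)
        if "heart_rate" in cells:
            heart.append(cells["heart_rate"])
    summary = {
        "steps": steps,
        "avg_heart_rate": sum(heart) / len(heart) if heart else None,
    }
    # Profile row key is the bare patient ID; one qualifier per day.
    profile.put(f"{patient_id:06d}", "daily_summary", f"{day:%Y%m%d}", summary)
    return summary


activity, profile = SortedTable(), SortedTable()
t0 = datetime(2020, 1, 1, 8, 0)
for minute, (hr, st) in enumerate([(70, 10), (72, 0), (74, 25)]):
    key = activity_row_key(42, t0.replace(minute=minute))
    activity.put(key, "m", "heart_rate", hr)
    activity.put(key, "m", "steps", st)
# Readings the scan must skip: another patient, and the next day.
activity.put(activity_row_key(43, t0), "m", "steps", 1000)
activity.put(activity_row_key(42, datetime(2020, 1, 2, 8, 0)), "m", "steps", 500)

print(summarize_day(activity, profile, 42, t0.date()))
```

Putting the patient ID first makes "one patient over a time window" a narrow key-range read. Queries that cut across all patients for the same minute would still need a different key or an additional table, which is why point 1 stresses knowing the queries first.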

I hope this helps.
