How does a data read happen in HBase?

Original post: 2019-06-21 05:46:11 7 1 hadoop / hbase

We know HBase is deployed on top of Hadoop and HDFS. We also know that reading a file (or record) from HDFS with the HDFS CLI takes a considerable amount of time.

But even though HBase uses HDFS, it can read a key within a couple of milliseconds. How does this happen?

1 answer

I think the reasons include:

Data is split across different Region Servers. The client can look up the right Region Server in the META table and then communicate with that Region Server directly.
Region Servers are collocated with the HDFS DataNodes, which enables data locality (putting the data close to where it is needed) for the data served by the Region Servers.
An HFile contains a multi-layered index, which allows HBase to seek to the data without having to read the whole file.
HBase reads from the BlockCache and MemStore first; if the data can be found there, HBase doesn't need to read HFiles from HDFS.
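
To make the points above concrete, here is a minimal toy model of the read path in Python. It is only a sketch and not HBase code: the class names (`HFile`, `Region`, `Meta`), the single-level index, and the two-row block size are all assumptions made for illustration. Real HBase merges MemStore and HFile data by timestamp, uses a multi-level index, and may consult several HFiles per region.

```python
import bisect


class HFile:
    """Toy immutable sorted file: data blocks plus an index of each block's first key."""

    def __init__(self, rows, block_size=2):
        items = sorted(rows.items())
        self.blocks = [items[i:i + block_size] for i in range(0, len(items), block_size)]
        self.index = [block[0][0] for block in self.blocks]  # first key of each block

    def find_block(self, key):
        """Binary-search the index for the only block that could hold `key`."""
        pos = bisect.bisect_right(self.index, key) - 1
        return pos if pos >= 0 else None


class Region:
    """Toy region: in-memory MemStore and BlockCache in front of one HFile."""

    def __init__(self, start_key, hfile):
        self.start_key = start_key
        self.memstore = {}         # recent writes, held in memory
        self.block_cache = {}      # block number -> block, held in memory
        self.hfile = hfile
        self.disk_block_reads = 0  # counts simulated HDFS block reads

    def put(self, key, value):
        self.memstore[key] = value

    def get(self, key):
        if key in self.memstore:                 # 1. recent writes
            return self.memstore[key]
        block_no = self.hfile.find_block(key)    # 2. index lookup, no full scan
        if block_no is None:
            return None
        block = self.block_cache.get(block_no)   # 3. cached block?
        if block is None:
            block = self.hfile.blocks[block_no]  # 4. stands in for an HDFS read
            self.disk_block_reads += 1
            self.block_cache[block_no] = block
        return dict(block).get(key)


class Meta:
    """Toy META table: maps a row key to the region whose key range contains it."""

    def __init__(self, regions):
        self.regions = sorted(regions, key=lambda r: r.start_key)
        self.start_keys = [r.start_key for r in self.regions]

    def locate(self, key):
        # Assumes the first region starts at "" so every key falls in some region.
        return self.regions[bisect.bisect_right(self.start_keys, key) - 1]


region_a = Region("", HFile({"apple": 1, "banana": 2, "cherry": 3, "date": 4}))
region_b = Region("m", HFile({"mango": 5, "peach": 6, "plum": 7}))
meta = Meta([region_a, region_b])

region = meta.locate("peach")                        # client asks META once
print(region.get("peach"), region.disk_block_reads)  # 6 1 (one block loaded)
print(region.get("mango"), region.disk_block_reads)  # 5 1 (same block, from cache)
```

The second lookup touches no simulated disk at all, and the first one reads a single block rather than the whole file. That is the gist of why a point read in HBase can be much faster than pulling a file through the HDFS CLI.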