
OpenTSDB HBase row key design

The OpenTSDB row key is designed as <metric><timestamp><tags>. I understand that this key design avoids hot spots when writing data. But when reading, if I want to see all the metrics for a particular host, the data is read from random region servers. So how does this design optimize read performance? Was any assumption made about the read pattern when the key was designed?

According to Chris Larsen

The assumption for OpenTSDB was that most dashboards and users will focus on a specific metric or a small set of metrics at a given time, where the metrics are aggregated across hosts or various tags. E.g., what's my average or max CPU usage?

Querying across multiple region servers is actually a huge benefit in that you can fire off queries in parallel. E.g., if you ask for "sys.cpu.busy host=web01" and "sys.if.bytes_out host=web01" and you have multiple region servers, the TSD will send those two metric queries out, likely to two different servers, where they can be processed in parallel instead of both sitting in the HBase queue on the same server and being handled whenever a thread is available.

Additionally, with 2.2 you can enable salting for the row key, so it's now <salt><metric><timestamp><tags>. This helps queries for high-cardinality metrics (e.g. lots of hosts) by splitting each metric query across region servers in parallel.
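To make the key layout concrete, here is a minimal sketch of how such a row key could be assembled. It assumes 3-byte UIDs for the metric, tag keys and tag values, a 4-byte base timestamp aligned to the hour, and an optional 1-byte salt prefix. The UID numbers are made up, and `zlib.crc32` is only a stand-in for whatever hash OpenTSDB actually uses to pick the salt bucket.

```python
import struct
import zlib

UID_WIDTH = 3            # assumed bytes per metric / tagk / tagv UID
ROW_SPAN_SECONDS = 3600  # assumed: one row per series per hour


def uid(n: int) -> bytes:
    """Encode a numeric UID as a fixed-width big-endian byte string."""
    return n.to_bytes(UID_WIDTH, "big")


def row_key(metric_uid: int, timestamp: int, tag_uids: dict, salt_buckets: int = 0) -> bytes:
    """Build [salt] + metric + base_time + sorted (tagk, tagv) pairs."""
    # All points within the same hour share one row, so align the timestamp.
    base_time = timestamp - (timestamp % ROW_SPAN_SECONDS)
    tags = b"".join(uid(k) + uid(v) for k, v in sorted(tag_uids.items()))
    key = uid(metric_uid) + struct.pack(">I", base_time) + tags
    if salt_buckets:
        # Hash only metric + tags (not time) so a series stays in one bucket.
        # crc32 is an illustrative choice, not OpenTSDB's real hash.
        salt = zlib.crc32(uid(metric_uid) + tags) % salt_buckets
        key = bytes([salt]) + key
    return key


plain = row_key(1, 1356998400 + 125, {1: 1})
salted = row_key(1, 1356998400 + 125, {1: 1}, salt_buckets=20)
print(plain.hex())   # metric, base time, tag pair
print(salted.hex())  # same key with a one-byte salt in front
```

Because the salt is derived from the series and not from the time, different hosts of the same metric tend to land in different buckets, which is what lets a single metric query fan out across region servers.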

This schema is efficient for time-range queries over a given set of series (metric + tags). Any other query, such as getting the last values for all metrics collected by a given server, requires a full scan.
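The difference between the two access patterns can be illustrated with a toy model, where a sorted Python list stands in for the HBase table. The key widths, UID values and helper names here are assumptions for illustration only. A metric plus time range maps to one contiguous slice of the sorted keys, while a host-only lookup has no usable key prefix and must examine every row.

```python
import bisect
import struct

T0 = 1356998400  # an hour-aligned base timestamp
HOST_TAGK = 1    # hypothetical UID for the "host" tag key


def key(metric_uid: int, base_time: int, host_uid: int) -> bytes:
    """Unsalted key: metric (3 bytes) + base time (4) + tagk (3) + tagv (3)."""
    return (metric_uid.to_bytes(3, "big") + struct.pack(">I", base_time)
            + HOST_TAGK.to_bytes(3, "big") + host_uid.to_bytes(3, "big"))


# Toy table: 3 metrics x 4 hours x 5 hosts, kept sorted like HBase rows.
rows = sorted(key(m, T0 + h * 3600, host)
              for m in (1, 2, 3) for h in range(4) for host in range(1, 6))


def scan_metric(rows, metric_uid, start, end):
    """Range scan: rows for one metric with start <= base_time < end."""
    lo = metric_uid.to_bytes(3, "big") + struct.pack(">I", start)
    hi = metric_uid.to_bytes(3, "big") + struct.pack(">I", end)
    return rows[bisect.bisect_left(rows, lo):bisect.bisect_left(rows, hi)]


def scan_host(rows, host_uid):
    """No key prefix exists for a host, so every row has to be checked."""
    suffix = HOST_TAGK.to_bytes(3, "big") + host_uid.to_bytes(3, "big")
    return [r for r in rows if r.endswith(suffix)]


by_metric = scan_metric(rows, 2, T0, T0 + 2 * 3600)  # touches only its slice
by_host = scan_host(rows, 3)                         # walks the whole table
print(len(by_metric), len(by_host), len(rows))
```

In the toy table the metric scan reads only the rows it returns, whereas the host scan inspects all rows to find its matches.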
