Naming statsd metrics for short-lived streams

I am trying to model statistics to submit to statsd/graphite. However, what I am monitoring is "session" centric. For example, I have a game that is played in real time. There are multiple instances of a game active on the servers. Each game has multiple (and a variable number of) participants. Each instance of a game has a unique ID, as does each player. I want to track (and graph) each player's stats, but then roll the metric up for the whole instance and then for all the instances of a game. For example, there may be two instances of a game active at a given time. Let's say each has two players in the game:

GameTitle.RealTime.VoiceErrors.game_instance_a.player_id_1 10
GameTitle.RealTime.VoiceErrors.game_instance_a.player_id_2 20
GameTitle.RealTime.VoiceErrors.game_instance_b.player_id_3 50
GameTitle.RealTime.VoiceErrors.game_instance_b.player_id_4 70

where the game_instance and player_id values are 128-bit numbers.
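For reference, statsd receives metrics as plain-text UDP datagrams of the form name:value|type. The sketch below shows how the sample values above might be sent. It assumes a statsd daemon on localhost at its default UDP port 8125 and assumes the voice errors are counters (type c); the helper names are hypothetical.

```python
import socket

# Assumption: a statsd daemon listening on its default UDP port on this host.
STATSD_ADDR = ("127.0.0.1", 8125)


def counter_packet(name: str, value: int) -> bytes:
    """Encode one counter in the statsd line format: <name>:<value>|c"""
    return f"{name}:{value}|c".encode("ascii")


def send_counters(counters: dict) -> None:
    """Send each counter as its own UDP datagram (fire-and-forget, no reply)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for name, value in counters.items():
            sock.sendto(counter_packet(name, value), STATSD_ADDR)


if __name__ == "__main__":
    send_counters({
        "GameTitle.RealTime.VoiceErrors.game_instance_a.player_id_1": 10,
        "GameTitle.RealTime.VoiceErrors.game_instance_a.player_id_2": 20,
        "GameTitle.RealTime.VoiceErrors.game_instance_b.player_id_3": 50,
        "GameTitle.RealTime.VoiceErrors.game_instance_b.player_id_4": 70,
    })
```

Each (instance, player) pair becomes a distinct metric name, which is what drives the naming and scale questions below.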

I want to be able to see that the total of all voice errors for game_instance_a is 30, while the total of all voice errors across the system is 150.

Given this, I have three questions:

  1. What guidance would you give on naming the metrics?
  2. Is it kosher to have metrics that have "dynamic" identifiers as part of the name?
  3. What are the scale limits on this? If I had 100K game instances with as many as 1,000 players in a game, is this going to kill statsd/graphite?

Thanks!

What guidance would you give on naming the metrics?

Graphite recommends that "Volatile path components should be kept as deep into the hierarchy as possible". This essentially means that if you can push the parts of the metric name that are frequently unique to the end of the "bucket" without impacting your grouping queries, you should try to do so.
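To illustrate why the volatile components belong at the end: with the IDs last, one glob per level answers both of the question's roll-ups. In Graphite's render API these would be sumSeries(GameTitle.RealTime.VoiceErrors.game_instance_a.*) and sumSeries(GameTitle.RealTime.VoiceErrors.*.*). The Python below is only a local imitation of Graphite's node-by-node * matching over the question's sample values, not Graphite's implementation (for example, it ignores the {a,b} alternation syntax).

```python
from fnmatch import fnmatchcase

# The question's sample values (one datapoint per series, for simplicity).
SERIES = {
    "GameTitle.RealTime.VoiceErrors.game_instance_a.player_id_1": 10,
    "GameTitle.RealTime.VoiceErrors.game_instance_a.player_id_2": 20,
    "GameTitle.RealTime.VoiceErrors.game_instance_b.player_id_3": 50,
    "GameTitle.RealTime.VoiceErrors.game_instance_b.player_id_4": 70,
}


def matches(pattern: str, name: str) -> bool:
    """Graphite-style glob: '*' matches inside a single dot-separated node only."""
    p_nodes, n_nodes = pattern.split("."), name.split(".")
    return len(p_nodes) == len(n_nodes) and all(
        fnmatchcase(node, pat) for pat, node in zip(p_nodes, n_nodes)
    )


def sum_series(pattern: str, series: dict) -> int:
    """Sum every series whose name matches the pattern (cf. Graphite's sumSeries)."""
    return sum(value for name, value in series.items() if matches(pattern, name))


if __name__ == "__main__":
    print(sum_series("GameTitle.RealTime.VoiceErrors.game_instance_a.*", SERIES))  # 30
    print(sum_series("GameTitle.RealTime.VoiceErrors.*.*", SERIES))  # 150
```

If the player ID came before the instance ID in the path, the per-instance roll-up would need a wildcard in the middle of the name instead of at the end, and every query would have to know the full depth of the hierarchy.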

Here is a great post on using Graphite that includes naming recommendations. And here is another one with additional info from Jason Dixon (an excellent source for Graphite material in general).

Is it kosher to have metrics that have "dynamic" identifiers as part of the name?

I usually try to avoid identifiers in metric names unless there are very few of them (<100). Because Graphite stores a .wsp file for every metric name, you'll have a difficult time resizing or adjusting the storage settings should you decide to change your configuration. Additionally, the Graphite UI will show a "folder" for every metric name, so you can easily make the UI unusable.

In your case, I'd probably graph the total number of game instances, the total number of players, and the number of errors (by type), etc. Additionally, I might try to track players per instance (in aggregate) and perhaps errors per instance (again without naming the actual instance, e.g. GameTitle.RealTime.PerInstance.VoiceErrors), if I had that capability (i.e. state stored per instance in my application).
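A minimal sketch of that idea, assuming the application already keeps per-instance state in memory. The state shape and every metric name other than GameTitle.RealTime.PerInstance.VoiceErrors are hypothetical. The 128-bit IDs are used only as dictionary keys inside the process; only the fixed set of aggregate names would be sent on, here printed as statsd gauge lines (type g).

```python
from statistics import mean


def summary_gauges(instances: dict) -> dict:
    """Collapse per-instance state into a fixed set of metric names.

    instances maps a (hypothetical) instance ID to {"players": int, "voice_errors": int}.
    """
    players = [state["players"] for state in instances.values()]
    errors = [state["voice_errors"] for state in instances.values()]
    return {
        "GameTitle.RealTime.Instances": len(instances),
        "GameTitle.RealTime.Players": sum(players),
        "GameTitle.RealTime.VoiceErrors": sum(errors),
        # Averages across instances; no instance ID appears in these names.
        "GameTitle.RealTime.PerInstance.Players": mean(players) if players else 0,
        "GameTitle.RealTime.PerInstance.VoiceErrors": mean(errors) if errors else 0,
    }


if __name__ == "__main__":
    state = {
        "game_instance_a": {"players": 2, "voice_errors": 30},
        "game_instance_b": {"players": 2, "voice_errors": 120},
    }
    for name, value in summary_gauges(state).items():
        print(f"{name}:{value}|g")  # one statsd gauge line per aggregate
```

However many instances or players exist, this produces five metric names. Whether the average, the maximum, or a percentile is the more useful per-instance figure is a judgment call; the average is used here only for brevity.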

Logstash, Elasticsearch, Kibana

I'd suggest logging this error information with the instance and player IDs, using Logstash to ship your logs to Elasticsearch, and exploring them in Kibana. Then I'd watch Graphite for real-time error and health anomaly detection and use Kibana (with Elasticsearch underneath) to dig deeper.
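A minimal sketch of the application side, using only Python's standard logging and json modules. It writes one JSON object per line, a shape Logstash can parse with its json codec or filter. The field names game_instance, player_id and error_type are hypothetical, and the Logstash input/output configuration for shipping the lines to Elasticsearch is not shown.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as a single-line JSON object."""

    # Hypothetical context fields, passed via logging's `extra` argument.
    FIELDS = ("game_instance", "player_id", "error_type")

    def format(self, record: logging.LogRecord) -> str:
        doc = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "message": record.getMessage(),
        }
        for field in self.FIELDS:
            if hasattr(record, field):
                doc[field] = getattr(record, field)
        return json.dumps(doc)


def make_logger(stream=sys.stdout) -> logging.Logger:
    logger = logging.getLogger("GameTitle.RealTime")
    logger.handlers.clear()  # avoid duplicate handlers if called more than once
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


if __name__ == "__main__":
    log = make_logger()
    log.error(
        "voice error",
        extra={
            "game_instance": "game_instance_a",
            "player_id": "player_id_1",
            "error_type": "VoiceErrors",
        },
    )
```

The high-cardinality IDs live in log fields, which Elasticsearch can index and filter on, rather than in metric names, which would each become a separate Whisper file.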

What are the scale limits on this? If I had 100K game instances with as many as 1,000 players in a game, is this going to kill statsd/graphite?

Statsd should have no problem with this, as it acts as a (mostly) dumb aggregator. While it does maintain some state internally, I don't anticipate a problem.

I don't think you'll have problems with Graphite's Whisper storage itself, as it just uses files and folders. But, as I mentioned above, the Graphite web UI will be unusable, and I think you'll also run the risk of other manageability issues.
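To put a rough number on the on-disk side, a back-of-the-envelope sketch. It assumes the Whisper layout as I understand it (a 16-byte file header, 12 bytes of metadata per archive, 12 bytes per datapoint, with files preallocated at creation by default) and a hypothetical retention schema of 10-second points for 1 day plus 1-minute points for 30 days. It also assumes one Whisper file per (instance, player) pair; statsd can emit more than one Graphite series per counter or timer, which would multiply the total.

```python
HEADER_BYTES = 16        # aggregation type, max retention, xFilesFactor, archive count
ARCHIVE_INFO_BYTES = 12  # offset, seconds per point, point count (per archive)
POINT_BYTES = 12         # uint32 timestamp + float64 value


def whisper_file_bytes(archives) -> int:
    """archives: list of (seconds_per_point, retention_seconds) pairs."""
    points = sum(retention // step for step, retention in archives)
    return HEADER_BYTES + ARCHIVE_INFO_BYTES * len(archives) + POINT_BYTES * points


DAY = 86_400
# Hypothetical schema: 10s resolution for 1 day, 1-minute resolution for 30 days.
RETENTION = [(10, DAY), (60, 30 * DAY)]

per_file = whisper_file_bytes(RETENTION)
series = 100_000 * 1_000  # one metric per (instance, player) at the question's upper bound
total = per_file * series

if __name__ == "__main__":
    print(per_file)                  # 622120 bytes per file
    print(series)                    # 100000000 files
    print(f"{total / 1e12:.1f} TB")  # 62.2 TB
```

Because the player and instance IDs are short-lived, new IDs keep creating new files, so the file count grows over time unless old ones are pruned.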

Summary

Keep the volatile (dynamic) metric buckets at the end of the name and avoid going above a couple hundred of them.
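As a small sketch of the naming rule, here is a hypothetical helper that always places the stable components first and the volatile ones last, and replaces characters that would break a path (Graphite splits names on dots; statsd treats ':' and '|' as delimiters). The allowed character set is an assumption, not the exact sanitization that statsd or Graphite apply, and the helper does nothing to limit how many distinct names you create.

```python
import re
import uuid

# Assumed safe set for one path node; any other character becomes '_'.
_UNSAFE = re.compile(r"[^A-Za-z0-9_-]")


def node(text: str) -> str:
    """Sanitize one path component so it cannot introduce extra dots or delimiters."""
    return _UNSAFE.sub("_", text)


def metric_name(stable, volatile=()) -> str:
    """Stable components first, volatile (per-instance, per-player) components last."""
    return ".".join(node(part) for part in (*stable, *volatile))


if __name__ == "__main__":
    instance_id = uuid.uuid4().hex  # a 128-bit UUID rendered as 32 hex characters
    print(metric_name(["GameTitle", "RealTime", "VoiceErrors"], [instance_id]))
```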

Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.