
ELK stack for storing metering data

In our project we're using an ELK stack to store logs in a centralized place. However, I've noticed that recent versions of Elasticsearch support various aggregations. In addition, Kibana 4 supports nice graphical ways to build graphs, and even recent versions of Grafana can now work with an Elasticsearch 2 data source.

So, does all this mean that the ELK stack can now be used for storing metering information gathered inside the system, or can it still not be considered a serious competitor to existing solutions such as Graphite, InfluxDB, and so forth? If it can, does anyone use ELK for metering in production? Could you please share your experience?

Just to clarify the notions: I consider metering data to be something that can be aggregated and shown in a graph over time, as opposed to regular log messages, where the main use case is searching.

Thanks a lot in advance.

Yes, you can use Elasticsearch to store and analyze time-series data.

To be more precise, it depends on your use case. For example, in my use case (financial instrument price tick history data, in development) I am able to insert 40,000 documents/sec (~125-byte documents with 11 fields each: 1 timestamp, plus strings and decimals, meaning 5 MB/s of useful data) for 14 hrs/day, on a single node (a big modern server with 192 GB of RAM) backed by a corporate SAN (which is itself backed by spinning disks, not SSDs!). I have gone on to store up to 1 TB of data, and I predict 2-4 TB could also work on a single node.

All this is with default config file settings, except for an ES_HEAP_SIZE of 30 GB. I suspect it would be possible to get significantly better write performance on that hardware with some tuning (e.g. I find it strange that iostat reports device utilization at 25-30%, as if Elasticsearch were capping it / conserving I/O bandwidth for reads or merges... but it could also be that %util is an unreliable metric for SAN devices).
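For reference, on Elasticsearch 2.x the heap size mentioned above is set through the ES_HEAP_SIZE environment variable before starting the node; the 30 GB value and install path below are just illustrative:

```shell
# Elasticsearch 2.x reads its JVM heap size from ES_HEAP_SIZE.
# Staying at or below ~30 GB keeps compressed object pointers
# enabled, which is why 30g (not half of 192 GB) is used here.
export ES_HEAP_SIZE=30g
/opt/elasticsearch/bin/elasticsearch -d   # -d = run as a daemon
```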

Query performance is also fine: queries / Kibana graphs return quickly as long as you restrict the result dataset by time and/or other fields.

In this case you would not be using Logstash to load your data, but would instead bulk-insert big batches directly into Elasticsearch: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html
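A minimal sketch of what such a bulk request body looks like: the `_bulk` endpoint expects newline-delimited JSON, with an action line followed by the document source for each document. The index name, type name, and field names below are hypothetical; the HTTP POST itself is left as a comment so the sketch runs without a live cluster:

```python
import json

def build_bulk_body(index, doc_type, docs):
    """Build the newline-delimited JSON body for the Elasticsearch
    _bulk endpoint: one action line plus one source line per document,
    terminated by a trailing newline (required by the API)."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"

# Hypothetical price-tick documents, similar in shape to the use case above.
ticks = [
    {"@timestamp": "2016-03-01T09:30:00.000Z", "symbol": "EURUSD", "price": 1.0871},
    {"@timestamp": "2016-03-01T09:30:00.050Z", "symbol": "EURUSD", "price": 1.0872},
]
body = build_bulk_body("ticks-2016.03.01", "tick", ticks)
# POST this body to http://localhost:9200/_bulk; batching thousands of
# documents per request is what makes the high insert rates possible.
print(body)
```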

You also need to define a mapping ( https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping.html ) to make sure Elasticsearch parses your data the way you want (numbers, dates, etc.) and creates the desired level of indexing.
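A hypothetical mapping for such a tick index might look like the following (Elasticsearch 2.x syntax, where exact-match strings are marked `not_analyzed`); without an explicit mapping, Elasticsearch guesses field types from the first document it sees:

```python
import json

# Mapping for the hypothetical "tick" document type: a date for the
# timestamp, an exact-match (not_analyzed) string for the symbol, and
# a double for the price. Sent as the body of the create-index request.
tick_mapping = {
    "mappings": {
        "tick": {
            "properties": {
                "@timestamp": {"type": "date"},
                "symbol": {"type": "string", "index": "not_analyzed"},
                "price": {"type": "double"},
            }
        }
    }
}
# e.g. PUT http://localhost:9200/ticks-2016.03.01 with this body
print(json.dumps(tick_mapping, indent=2))
```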

Another recommended practice for this use case is to use a separate index for each day (or month/week, depending on your insert rate), and to make sure each index is created with just enough shards to hold one day of data (by default new indexes are created with 5 shards, and shard performance starts degrading once a shard grows past a certain size, usually a few tens of GB, but it might differ for your use case; you need to measure/experiment).
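The per-day index scheme can be sketched as a naming helper plus explicit shard settings passed at index-creation time. The prefix and shard count here are assumptions; as the text says, the right shard count is something you measure for your own data volume:

```python
import datetime

def daily_index(prefix, day):
    """Return the conventional one-index-per-day name, e.g. 'ticks-2016.03.01'."""
    return "%s-%s" % (prefix, day.strftime("%Y.%m.%d"))

# Settings body for the create-index request. One primary shard is a
# reasonable starting point when a day of data stays under a few tens
# of GB; the default of 5 shards would be overkill for that volume.
index_settings = {
    "settings": {
        "number_of_shards": 1,
        "number_of_replicas": 1,
    }
}

name = daily_index("ticks", datetime.date(2016, 3, 1))
print(name)  # ticks-2016.03.01
```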

Using Elasticsearch aliases ( https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-aliases.html ) helps with dealing with multiple indexes, and is a generally recommended best practice.
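With daily indexes, an alias lets queries and Kibana point at one stable name while new indexes roll in underneath. A sketch of the request body for the `_aliases` endpoint, with hypothetical index and alias names:

```python
import json

# Add two daily indexes behind a single "ticks" alias; queries against
# "ticks" then span both days. POST this body to /_aliases.
alias_actions = {
    "actions": [
        {"add": {"index": "ticks-2016.03.01", "alias": "ticks"}},
        {"add": {"index": "ticks-2016.03.02", "alias": "ticks"}},
    ]
}
print(json.dumps(alias_actions, indent=2))
```

A nightly job would typically add the new day's index to the alias (and optionally remove indexes that have aged out) in a single atomic `_aliases` call.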
