
ELK stack for storing metering data

In our project we're using an ELK stack to store logs in a centralized place. However, I've noticed that recent versions of Elasticsearch support various aggregations. In addition, Kibana 4 supports nice graphical ways to build graphs, and even recent versions of Grafana can now work with an Elasticsearch 2 data source.

So, does all this mean that the ELK stack can now be used for storing metering information gathered inside the system, or can it still not be considered a serious competitor to existing solutions such as Graphite, InfluxDB, and so forth? If it can, does anyone use ELK for metering in production? Could you please share your experience?

Just to clarify the notions: I consider metering data to be something that can be aggregated and shown in a graph over time, as opposed to regular log messages, where the main use case is searching.

Thanks a lot in advance.

Yes, you can use Elasticsearch to store and analyze time-series data.

To be more precise, it depends on your use case. For example, in my use case (financial instrument price tick history data, in development) I am able to insert 40,000 documents/sec (~125-byte documents with 11 fields each: 1 timestamp, plus strings and decimals, meaning 5 MB/s of useful data) for 14 hrs/day, on a single node (a big modern server with 192 GB of RAM) backed by a corporate SAN (which is itself backed by spinning disks, not SSDs!). I have gone on to store up to 1 TB of data, and I predict 2-4 TB could also work on a single node.

All this is with default config file settings, except for an ES_HEAP_SIZE of 30 GB. I suspect it would be possible to get significantly better write performance on that hardware with some tuning (e.g. I find it strange that iostat reports device utilization at 25-30%, as if Elasticsearch were capping it / conserving I/O bandwidth for reads or merges... but it could also be that %util is an unreliable metric for SAN devices).
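For reference, on Elasticsearch 2.x the heap size mentioned above is set through the ES_HEAP_SIZE environment variable before starting the node; the 30 GB value and install path below are just illustrative:

```shell
# Elasticsearch 2.x reads its JVM heap size from ES_HEAP_SIZE.
# Staying at or below ~30 GB keeps compressed object pointers
# enabled, which is why 30g (not half of 192 GB) is used here.
export ES_HEAP_SIZE=30g
/opt/elasticsearch/bin/elasticsearch -d   # -d = run as a daemon
```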

Query performance is also fine: queries / Kibana graphs return quickly as long as you restrict the result dataset by time and/or other fields.

In this case you would not be using Logstash to load your data, but would instead bulk-insert big batches directly into Elasticsearch: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html
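A minimal sketch of what such a bulk request body looks like: the `_bulk` endpoint expects newline-delimited JSON, with an action line followed by the document source for each document. The index name, type name, and field names below are hypothetical; the HTTP POST itself is left as a comment so the sketch runs without a live cluster:

```python
import json

def build_bulk_body(index, doc_type, docs):
    """Build the newline-delimited JSON body for the Elasticsearch
    _bulk endpoint: one action line plus one source line per document,
    terminated by a trailing newline (required by the API)."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"

# Hypothetical price-tick documents, similar in shape to the use case above.
ticks = [
    {"@timestamp": "2016-03-01T09:30:00.000Z", "symbol": "EURUSD", "price": 1.0871},
    {"@timestamp": "2016-03-01T09:30:00.050Z", "symbol": "EURUSD", "price": 1.0872},
]
body = build_bulk_body("ticks-2016.03.01", "tick", ticks)
# POST this body to http://localhost:9200/_bulk; batching thousands of
# documents per request is what makes the high insert rates possible.
print(body)
```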

You also need to define a mapping ( https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping.html ) to make sure Elasticsearch parses your data the way you want (numbers, dates, etc.) and creates the desired level of indexing.
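A hypothetical mapping for such a tick index might look like the following (Elasticsearch 2.x syntax, where exact-match strings are marked `not_analyzed`); without an explicit mapping, Elasticsearch guesses field types from the first document it sees:

```python
import json

# Mapping for the hypothetical "tick" document type: a date for the
# timestamp, an exact-match (not_analyzed) string for the symbol, and
# a double for the price. Sent as the body of the create-index request.
tick_mapping = {
    "mappings": {
        "tick": {
            "properties": {
                "@timestamp": {"type": "date"},
                "symbol": {"type": "string", "index": "not_analyzed"},
                "price": {"type": "double"},
            }
        }
    }
}
# e.g. PUT http://localhost:9200/ticks-2016.03.01 with this body
print(json.dumps(tick_mapping, indent=2))
```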

Another recommended practice for this use case is to use a separate index for each day (or month/week, depending on your insert rate), and to make sure each index is created with just enough shards to hold one day of data (by default new indexes are created with 5 shards, and shard performance starts degrading once a shard grows past a certain size, usually a few tens of GB, but it might differ for your use case; you need to measure/experiment).
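The per-day index scheme can be sketched as a naming helper plus explicit shard settings passed at index-creation time. The prefix and shard count here are assumptions; as the text says, the right shard count is something you measure for your own data volume:

```python
import datetime

def daily_index(prefix, day):
    """Return the conventional one-index-per-day name, e.g. 'ticks-2016.03.01'."""
    return "%s-%s" % (prefix, day.strftime("%Y.%m.%d"))

# Settings body for the create-index request. One primary shard is a
# reasonable starting point when a day of data stays under a few tens
# of GB; the default of 5 shards would be overkill for that volume.
index_settings = {
    "settings": {
        "number_of_shards": 1,
        "number_of_replicas": 1,
    }
}

name = daily_index("ticks", datetime.date(2016, 3, 1))
print(name)  # ticks-2016.03.01
```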

Using Elasticsearch aliases ( https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-aliases.html ) helps with dealing with multiple indexes, and is a generally recommended best practice.
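With daily indexes, an alias lets queries and Kibana point at one stable name while new indexes roll in underneath. A sketch of the request body for the `_aliases` endpoint, with hypothetical index and alias names:

```python
import json

# Add two daily indexes behind a single "ticks" alias; queries against
# "ticks" then span both days. POST this body to /_aliases.
alias_actions = {
    "actions": [
        {"add": {"index": "ticks-2016.03.01", "alias": "ticks"}},
        {"add": {"index": "ticks-2016.03.02", "alias": "ticks"}},
    ]
}
print(json.dumps(alias_actions, indent=2))
```

A nightly job would typically add the new day's index to the alias (and optionally remove indexes that have aged out) in a single atomic `_aliases` call.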
