
A network fault-tolerant architecture for the ELK Stack

I've only been acquainted with the ELK Stack for a few days. We're trying to use it in our enterprise applications, but we have some architectural concerns. I've seen and read several ELK use cases and their architectures, especially LinkedIn's, but none of them discuss the potential effect of network errors on the architecture.

In traditional applications, where logs are usually written to local files, the only failure that can break logging is a "disk full" error, which is quite rare. But in a centralized logging system, where logs are sent over the network, network errors are common, so I think the system is highly crash-prone, especially in organizations with unreliable networks.

Furthermore, as I've seen in many ELK use cases, a single instance of a JMS provider, or in other words a pub/sub broker such as Kafka or Redis, is used alongside the ELK Stack. On top of the previous problem, I think that broker is a single point of failure in these architectures, unless it is clustered.

I think we can get rid of both problems if we run a broker like Kafka alongside the shipper(s) on each node, as follows (one Kafka instance per node):

((log-generator)+ (logstash)? Kafka)* -> Logstash -> Elasticsearch -> Kibana
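
To make this concrete, here is a minimal sketch of what the two Logstash pipeline configurations could look like with a reasonably recent Kafka plugin; the hostnames, ports, paths, and the "logs" topic are placeholders, not from a real deployment:

# On each application node: a lightweight Logstash shipper
# reads local log files and produces to the node-local Kafka.
input {
  file {
    path => "/var/log/app/*.log"   # placeholder path
  }
}
output {
  kafka {
    bootstrap_servers => "localhost:9092"   # node-local broker
    topic_id => "logs"
  }
}

# On the central side: an indexing Logstash consumes from the
# per-node brokers and writes to Elasticsearch.
input {
  kafka {
    bootstrap_servers => "node1:9092,node2:9092"   # placeholder node list
    topics => ["logs"]
  }
}
output {
  elasticsearch {
    hosts => ["http://elasticsearch:9200"]
  }
}

With this layout, a network outage between a node and the central side only causes events to accumulate in that node's local Kafka, to be drained once connectivity returns.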

Please let me know whether this architecture makes sense.
If it doesn't, any other fault-tolerant architecture would be welcome :)

The answer depends on how much risk is allowed, where you might expect to encounter such risk, and how long you expect an incident to last.

If you write to local files, you can use Filebeat to ship the files to a remote logstash. If that logstash (or the downstream Elasticsearch cluster) applies back-pressure, filebeat will slow down or stop sending logs. This provides you with a distributed cache on the remote machines (no broker required). The downside is that, if the outage is long-lasting, the log file might be rotated out from under filebeat's glob pattern, and then it will never ship.
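
As a rough illustration, a minimal filebeat.yml for this setup could look like the following (current Filebeat syntax; the path and hostname are placeholders), with the receiving Logstash exposing a plain beats input:

# filebeat.yml -- tail local files and ship them to a remote Logstash
filebeat.inputs:
  - type: log
    paths:
      - /var/log/app/*.log   # the glob pattern mentioned above

output.logstash:
  hosts: ["logstash.example.com:5044"]

# Matching Logstash pipeline: accept the beats protocol
input {
  beats {
    port => 5044
  }
}

When Logstash applies back-pressure, Filebeat simply stops advancing its file offsets, which is what makes the local files act as the buffer.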

With multiple logstash instances, you can configure filebeat to ship to a list of them, thus providing some survivability. If you have "one-time" events (like snmptraps, syslog, etc), you'll want to think about the possible outages a little more.
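
For example, the Filebeat output above could be changed to (hostnames again placeholders):

output.logstash:
  hosts: ["logstash1.example.com:5044", "logstash2.example.com:5044"]
  loadbalance: true   # spread events across reachable hosts; omit for failover-only

Without loadbalance, Filebeat picks one host and only moves on if it becomes unreachable.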

I used to run a separate logstash instance for these types of events, which would feed into redis. The main logstash (when up) would then read from the queue and process the events. This allowed me to launch a new logstash config without fear of losing events. These days, I try to write events to files (with snmptrapd, etc), and not depend on any logstash running 24x7x365.
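
A minimal sketch of that buffered setup, using the stock redis input and output plugins (the key name and ports are illustrative):

# Edge Logstash: catch one-time events and park them in Redis
input {
  syslog {
    port => 5514
  }
}
output {
  redis {
    host => "127.0.0.1"
    data_type => "list"
    key => "logstash"
  }
}

# Main Logstash: drain the Redis list whenever it is running
input {
  redis {
    host => "127.0.0.1"
    data_type => "list"
    key => "logstash"
  }
}
output {
  elasticsearch {
    hosts => ["http://elasticsearch:9200"]
  }
}

Because Redis holds the list durably (subject to its own persistence settings), the main Logstash can be restarted with a new config without dropping the one-time events.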
