
What is the best component stack for building a distributed log aggregator (like Splunk)?

Original post 2010-06-22 18:42:36 3 2 ruby/ logging/ hbase/ hdfs/ splunk

I'm trying to find the best components for building something similar to Splunk, in order to aggregate logs from a large number of servers in a computing grid. It also has to be distributed, because I get gigabytes of logs every day and no single machine will be able to store them.

I'm particularly interested in something that works with Ruby and runs on Windows and the latest Solaris (yeah, I've got a zoo).

I see the architecture as:

Log crawler (Ruby script).
Distributed log storage.
Distributed search engine.
Lightweight front end.

The log crawler and the distributed search engine are not in question: logs will be parsed by a Ruby script, and ElasticSearch will be used to index log messages. The front end is also an easy choice: Sinatra.
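To make the crawler step concrete, here is a minimal sketch of the parsing half only: it turns raw log lines into JSON documents that could later be handed to an indexer. The line format ("timestamp LEVEL message"), the field names, and the method names parse_log_line and each_document are all assumptions made for illustration, not something from the original setup.

```ruby
require 'json'
require 'time'

# Assumed (hypothetical) line format: "<ISO 8601 timestamp> <LEVEL> <message>"
LINE_PATTERN = /\A(?<timestamp>\S+)\s+(?<level>[A-Z]+)\s+(?<message>.*)\z/

# Turn one raw log line into a hash that can be serialized as JSON.
# Returns nil for lines that do not match the assumed format.
def parse_log_line(line, host)
  match = LINE_PATTERN.match(line.chomp)
  return nil unless match

  {
    'host'      => host,
    'timestamp' => Time.iso8601(match[:timestamp]).utc.iso8601,
    'level'     => match[:level],
    'message'   => match[:message]
  }
rescue ArgumentError
  # Time.iso8601 raises ArgumentError on a malformed timestamp.
  nil
end

# Read an IO object line by line and yield each parsed document as a JSON
# string. Lines that cannot be parsed are skipped.
def each_document(io, host)
  return enum_for(:each_document, io, host) unless block_given?

  io.each_line do |line|
    doc = parse_log_line(line, host)
    yield JSON.generate(doc) if doc
  end
end
```

A crawler could then call something like File.open(path) { |f| each_document(f, hostname) { |json| ... } } and send each JSON string to whatever indexing client it uses; that part is left out here on purpose.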

My main problem is distributed log storage. I looked at MongoDB, CouchDB, HDFS, Cassandra and HBase.

MongoDB was rejected because it doesn't work on Solaris.
CouchDB doesn't support sharding (smartproxy is required to make it work, but that is something I don't even want to try).
Cassandra works great, but it is a disk space hog and it requires running autobalance every day to spread the load between Cassandra nodes.
HDFS looked promising, but the FileSystem API is Java-only and JRuby was a pain.
HBase looked like the best solution around, but deploying and monitoring it is a disaster: to start HBase I need to start HDFS first and check that it started without problems, then start HBase and check it too, and then start the REST service and check that as well.
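The start-then-check sequence for HDFS, HBase and the REST service could be partly scripted. The sketch below only tests whether each service accepts a TCP connection, in order, and reports the first one that does not; it does not start anything. The hosts and ports in SERVICES and the method names are placeholders chosen for illustration, so substitute the values from your own configuration.

```ruby
require 'socket'

# Services in the order they must come up. Hosts and ports are placeholders
# (assumptions); take the real values from your Hadoop and HBase configuration.
SERVICES = [
  { name: 'HDFS NameNode', host: 'localhost', port: 8020 },
  { name: 'HBase Master',  host: 'localhost', port: 60000 },
  { name: 'HBase REST',    host: 'localhost', port: 8080 }
].freeze

# True if a TCP connection to host:port succeeds within `timeout` seconds.
def port_open?(host, port, timeout = 2)
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue SystemCallError, SocketError, IOError
  # Connection refused, unreachable host, failed name lookup, or timeout.
  false
end

# Walk the list in order and stop at the first service that is not reachable.
# Returns that service's name, or nil if every service accepted a connection.
def first_unreachable(services)
  failed = services.find { |s| !port_open?(s[:host], s[:port]) }
  failed && failed[:name]
end
```

Calling first_unreachable(SERVICES) from a cron job or a startup wrapper gives a single answer about where the chain broke. An open port only shows that a process is listening, not that the service is healthy, so this is a first-pass check at best.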

So I'm stuck. Something tells me HDFS or HBase is the best thing to use for log storage, but HDFS only works smoothly with Java and HBase is a deployment and monitoring nightmare.

Can anyone share their thoughts or experience building similar systems, either with the components I described above or with something completely different?

2 answers

I'd recommend using Flume to aggregate your data into HBase. You could also use the Elastic Search Sink for Flume to keep a search index up to date in real time.

For more, see my answer to a similar question on Quora.

Regarding Java and HDFS: with a tool such as BeanShell, you can interact with HDFS storage through JavaScript.




solution1
2 ACCEPTED 2010-10-15 10:51:00

solution2
0 2010-06-22 19:46:52
