I'm trying to find the best components I could use to build something similar to Splunk in order to aggregate logs from a big number of servers in computing grid. Also it should be distributed because I have gigs of logs everyday and no single machine will be able to store logs.
I'm particularly interested in something that will work with Ruby and will work on Windows and latest Solaris (yeah, I got a zoo).
I see architecture as:
Log crawler and distributed search engine are out of questions - logs will be parsed by Ruby script and ElasticSearch will be used to index log messages. Front end is also very easy to choose - Sinatra.
My main problem is distributed log storage. I looked at MongoDB, CouchDB, HDFS, Cassandra and HBase.
So I'm stuck. Something tells me HDFS or HBase are the best thing to use as a log storage, but HDFS only works smoothly with Java and HBase is just a deploying/monitoring nightmare.
Can anyone share its thoughts or experience building similar systems using components I described above or with something completely different?
I'd recommend using Flume to aggregate your data into HBase . You could also use the Elastic Search Sink for Flume to keep a search index up to date in real time.
For more, see my answer to a similar question on Quora .
关于Java和HDFS-使用BeanShell之类的工具,您可以通过Javascript与HDFS存储进行交互。
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.