
Cassandra (Data Replication From Database For BI)

We have multiple databases that we query to generate reports. Because the queries are complex and involve many joins, would it be a good idea to load the data into Cassandra, Hadoop, or Elasticsearch (with daily load jobs or incremental updates) and run all reporting queries against that consolidated store?

Which would be the preferred choice: Cassandra, Hadoop, Elasticsearch, or MongoDB?

We also want to build a Web UI for reporting and analytics on the consolidated database.

I cannot recommend MongoDB. It is subpar for big-data analysis: its Map-Reduce implementation is slow and single-threaded. Cassandra + Hadoop or HDFS + Hadoop is the better choice. With Hadoop you are not limited to one storage type; you can flush your data to HDFS (or store it there from the start) and iterate over it with MapReduce.

If you need durability, look at Cassandra. It is easy to maintain and very reliable; I believe it is the most reliable NoSQL database around. It scales horizontally, with no name nodes and no master/slave roles: all nodes are peers with equal rights.
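To make the "all nodes are peers" point concrete, here is a minimal sketch of a consistent-hashing token ring, the general idea Cassandra uses to spread partitions across equal nodes. It is a simplification under stated assumptions: the `Ring` class, the node names, and the MD5-based `token` function are hypothetical and are not Cassandra's actual partitioner or API.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Map a key to a position on the ring.

    Assumption: MD5 is used only for illustration; it is not what
    Cassandra's default partitioner does internally.
    """
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class Ring:
    """A toy token ring: every node is a peer, there is no coordinator."""

    def __init__(self, nodes):
        # Each node owns one token; sorting gives the ring order.
        self._entries = sorted((token(n), n) for n in nodes)
        self._tokens = [t for t, _ in self._entries]

    def replicas(self, partition_key, replication_factor=2):
        """Return the nodes holding a key: the first node clockwise from
        the key's token, plus the next nodes on the ring."""
        size = len(self._entries)
        start = bisect.bisect_right(self._tokens, token(partition_key)) % size
        count = min(replication_factor, size)
        return [self._entries[(start + i) % size][1] for i in range(count)]


ring = Ring(["node-a", "node-b", "node-c"])
owners = ring.replicas("customer:42", replication_factor=2)
print(owners)
```

Any client can compute the owners of a key from the ring alone, which is why no name node or master is needed to route a read or write.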

Elasticsearch is primarily a search engine. If you have a lot of data and need analytics, look towards Hadoop and MapReduce.

With Hadoop you can start with Hive or Pig, the most powerful MapReduce abstractions I have seen. With Hadoop you can also start thinking about migrating to Spark/Shark.

Cassandra would be the best option if your choice is limited to those three, because writing joins as MapReduce programs takes a lot of effort: a single join often means chaining several MapReduce jobs and getting each one right. If your options are open, Apache Hive can be used for non-interactive or reporting applications, since it supports many SQL features such as joins, GROUP BY, and ORDER BY. Hive queries are SQL-like, so they will not feel much different from traditional SQL.
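To illustrate the effort involved, here is a rough sketch of a reduce-side join written in plain Python, with the map, shuffle, and reduce phases simulated in memory. The `orders` and `customers` records and all function names are made up for the example; a real Hadoop job would also need driver code, input formats, and serialization.

```python
from collections import defaultdict

# Hypothetical sample data standing in for two source tables.
orders = [
    {"order_id": 1, "customer_id": 10, "total": 25.0},
    {"order_id": 2, "customer_id": 11, "total": 40.0},
    {"order_id": 3, "customer_id": 10, "total": 15.0},
]
customers = [
    {"customer_id": 10, "name": "Alice"},
    {"customer_id": 11, "name": "Bob"},
]


def map_phase(orders, customers):
    """Emit (join_key, tagged_record) pairs so the reducer can tell
    which table each record came from."""
    for customer in customers:
        yield customer["customer_id"], ("customer", customer)
    for order in orders:
        yield order["customer_id"], ("order", order)


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Join the two sides per key and aggregate the order totals."""
    for key in sorted(groups):
        names = [rec["name"] for tag, rec in groups[key] if tag == "customer"]
        totals = [rec["total"] for tag, rec in groups[key] if tag == "order"]
        if not totals:
            continue  # inner join: skip customers without orders
        for name in names:
            yield {
                "customer_id": key,
                "name": name,
                "order_count": len(totals),
                "revenue": sum(totals),
            }


report = list(reduce_phase(shuffle(map_phase(orders, customers))))
for row in report:
    print(row)
```

In Hive the same report would be roughly one statement, along the lines of `SELECT c.customer_id, c.name, COUNT(*), SUM(o.total) FROM orders o JOIN customers c ON o.customer_id = c.customer_id GROUP BY c.customer_id, c.name`, which is the convenience the answer is pointing at.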

You could also consider Apache Drill, Hortonworks Stinger, and Cloudera Impala for interactive reporting applications.


 