
Apache Drill for a Homogeneous Data Store

I'm just beginning to explore Apache Drill as a data engine for a reporting app.

We're a Postgres shop: our transactional data is all in an RDBMS.

Moving to any NoSQL (MongoDB) is a distant dream for us and there's no pressing need for us to spend money on that as of today.

Our data size is big (but still all in Postgres). We have a few tables with up to a few hundred million rows (say 150M).

Performance is key for us. We want our reports to be generated for the end user as fast as possible, in real time.

I have a basic question here for my use case:

If the time cost of a native (direct) Postgres query is, say, P, then by going through Drill I would imagine the cost is going to be P + D, where D is the extra cost of Drill?

At the end of the day, if Postgres proves to be the bottleneck (say, missing indices etc.), then Drill can't help make the situation better, right, no matter how many drillbits I add horizontally?
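To make the P + D reasoning concrete, here is a toy model in Python. It is only a sketch of the assumption stated above, not a measurement: the function name, the split of D into a fixed part and a parallelisable part, and all the numbers are hypothetical, and real Drill work does not scale perfectly linearly with the number of drillbits.

```python
def drill_query_time(p_seconds, drill_work_seconds, drillbits,
                     fixed_overhead_seconds=0.0):
    """Toy model of total time = P + D.

    p_seconds:              time Postgres itself needs for the query (P)
    drill_work_seconds:     Drill-side work, assumed to split evenly over drillbits
    fixed_overhead_seconds: Drill-side cost assumed not to shrink (planning etc.)
    """
    if drillbits < 1:
        raise ValueError("need at least one drillbit")
    d = fixed_overhead_seconds + drill_work_seconds / drillbits
    return p_seconds + d


# Hypothetical numbers: Postgres takes 10 s, Drill adds 2 s of divisible work.
for n in (1, 2, 8):
    print(n, drill_query_time(10.0, 2.0, n))
```

In this model the total never drops below P, however many drillbits are added, which is exactly the concern raised in the question.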

So, in what way would using Drill for my use case help more than optimising Postgres and querying it directly?

Apache Drill is usually used to consolidate access and to join across different database systems, e.g. a PostgreSQL and a MongoDB.

My first question here would be: why change a working and proven database system which, in its newer versions, is fully capable of handling JSON data? What is the main success factor you see that creates the wish to move to MongoDB?

If you have only one database system, I'd concentrate on getting the most performance out of that. If you use Apache Drill to consolidate different systems, you have to keep a few facts in mind when designing the Drill layer:

  • You need ZooKeeper nodes for Drill if you set up several drillbits
  • You need a few drillbit servers with plenty of compute power and memory
  • You need to understand how Drill uses the underlying databases when queries are sent: Drill tries to push as much work as possible down to the database systems to minimize the processing it has to do itself (e.g. joins and LIKE statements happen in the database system). Because of that, the underlying database infrastructure has to be powerful
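One way to see how much of a query Drill hands to the underlying database is to ask Drill for the plan with EXPLAIN PLAN FOR. Below is a minimal Python sketch that does this through Drill's REST endpoint. It assumes a drillbit with the web interface on the default port 8047 at localhost and no authentication; the storage plugin name `pg` and the table `orders` in the usage comment are hypothetical.

```python
import json
import urllib.request

# Assumption: a local drillbit with the default web port and no auth.
DRILL_URL = "http://localhost:8047/query.json"


def build_explain_request(sql, url=DRILL_URL):
    """Build a POST request asking Drill to explain (not run) the given SQL."""
    payload = {"queryType": "SQL", "query": "EXPLAIN PLAN FOR " + sql}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_plan(sql, url=DRILL_URL):
    """Send the request and return Drill's JSON response as a dict."""
    with urllib.request.urlopen(build_explain_request(sql, url)) as resp:
        return json.load(resp)


# Usage (needs a running drillbit; `pg` and `orders` are made-up names):
#   plan = fetch_plan("SELECT status, COUNT(*) FROM pg.public.orders GROUP BY status")
#   print(json.dumps(plan, indent=2))
```

If the filter, join, or aggregation shows up inside the SQL that the plan sends to the JDBC source, Postgres is doing that work; anything planned as a separate Drill operator on top of the scan runs on the drillbits instead.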


 