Database with automatic horizontal scaling out of the box

I am looking for a DBMS that:

  1. scales horizontally out of the box with little or no hand-written "glue" code
  2. allows querying records by any one of a few indexes
  3. is easy to maintain and scale (i.e. we just add a new server and the DBMS redistributes the data on its own)

The goal is to redesign and ultimately migrate from the current solution (based on Oracle RAC). The problem with the old solution is its poor design and code quality, not Oracle itself.

About our data: we have two types of records, nodes and events. Both are added to the database and never deleted. There are about 2e9 nodes and 5e11 events. Every event is bound to a single node. The queries we need are:

  1. query nodes by a few of their properties: n1, n2, n3
  2. query nodes by node_id
  3. query events by time interval and their main property e1
  4. query events by node_id

And of course we need to insert new nodes and events. We run a few thousand of queries 1-4 a day, and that number will not grow much, but all data needs to stay accessible. The number of new events per day is roughly equal to the number of nodes. The number of new nodes per day is a few hundred at most.
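As a rough sanity check on these figures, here is a back-of-the-envelope sketch. It assumes inserts are spread evenly over the day, which real traffic may well not be, and the variable names are only illustrative.

```python
# Figures taken from the description above.
NODES = 2e9                    # total nodes
EVENTS = 5e11                  # total events stored so far
NEW_EVENTS_PER_DAY = NODES     # "roughly equal to the number of nodes"
SECONDS_PER_DAY = 24 * 60 * 60

# Average sustained insert rate, assuming a perfectly even spread (assumption).
inserts_per_second = NEW_EVENTS_PER_DAY / SECONDS_PER_DAY

# How many days of inserts at the current rate the stored events correspond to.
days_of_history = EVENTS / NEW_EVENTS_PER_DAY

print(f"average inserts per second: {inserts_per_second:,.0f}")
print(f"days of history at this rate: {days_of_history:,.0f}")
```

So the write path is on the order of tens of thousands of inserts per second on average, while reads are only a few thousand per day.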

We do not need transactions or joins for consistency, as data is always consistent after insertion and never deleted. We could implement this with separate Postgres servers (dispatching queries manually), but is there a better way? We would consider any open-source database (SQL or NoSQL) suitable for our task. We are also not bound to any particular language. The priority is ease of scaling while sustaining decent query speed.
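To make the "manual dispatching" option concrete, here is a minimal sketch of the kind of glue code it would take. The host names and function names are hypothetical, and hashing on node_id is just one possible choice, not something we have built.

```python
import zlib

# Hypothetical shard hosts; placeholders, not real servers.
SHARDS = [
    "pg-0.example.internal",
    "pg-1.example.internal",
    "pg-2.example.internal",
]

def shard_for_node(node_id, shards=SHARDS):
    """Pick the shard that holds a node and all events bound to it."""
    digest = zlib.crc32(str(node_id).encode("ascii"))
    return shards[digest % len(shards)]

def shards_for_query(node_id=None, shards=SHARDS):
    """Queries 2 and 4 carry a node_id and hit one shard.
    Queries 1 and 3 do not, so they must fan out to every shard."""
    if node_id is not None:
        return [shard_for_node(node_id, shards)]
    return list(shards)

print(shards_for_query(node_id=42))   # a single shard
print(shards_for_query())             # all shards
```

With plain modulo hashing like this, adding a server changes the shard for most existing keys, so data would have to be moved by hand. That is exactly the maintenance burden requirement 3 is meant to avoid.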

You may want to have a look at Riak. It's a key-value store (KVS) known for reliability, ease of scaling, and high availability for both reads and writes. It happens to be written in Erlang, but you don't need to know any Erlang to use it. You can talk to it over HTTP or Protocol Buffers.

Because it's a KVS, you can solve (2) and (4) just by storing your data by id. To get (1) and (3) you'd probably need to use secondary indexes (aka 2i). You didn't mention the performance requirements for (1) and (3), but my understanding is that a 2i query is much slower than a regular read, so that's probably where you should focus your performance testing.
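To give a feel for what that looks like over HTTP, here is a small sketch that only builds the requests and does not send them. The bucket names, index names (`n1_bin`, `time_int`) and the local address are assumptions for illustration; the URL shapes follow Riak's HTTP API as I understand it, so double-check them against the docs for your version. Note that 2i needs a storage backend that supports it (LevelDB rather than the default Bitcask).

```python
from urllib.parse import quote

# Assumption: a local Riak node on the default HTTP port.
BASE = "http://127.0.0.1:8098"

def key_url(bucket, key):
    """URL for a plain key read or write, as in queries (2) and (4)."""
    return f"{BASE}/buckets/{quote(bucket, safe='')}/keys/{quote(str(key), safe='')}"

def put_request(bucket, key, indexes):
    """Return (method, url, headers) for storing a JSON object with 2i entries.
    Index names end in _bin (string values) or _int (integer values)."""
    headers = {"Content-Type": "application/json"}
    for name, value in indexes.items():
        headers[f"x-riak-index-{name}"] = str(value)
    return "PUT", key_url(bucket, key), headers

def index_query_url(bucket, index, start, end=None):
    """URL for a 2i query: exact match if end is None, otherwise a range."""
    parts = [quote(str(start), safe="")]
    if end is not None:
        parts.append(quote(str(end), safe=""))
    return f"{BASE}/buckets/{quote(bucket, safe='')}/index/{index}/" + "/".join(parts)

print(key_url("nodes", 42))                               # query (2)
print(index_query_url("nodes", "n1_bin", "foo"))          # query (1), one property
print(index_query_url("events", "time_int", 1700000000, 1700086400))  # time range
```

A 2i query works on one index at a time, so (3), which combines a time interval with e1, would need something extra, such as a combined index value or filtering on the client. That is worth including in your performance testing.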

Anyway, have a look at the use cases and see if Riak could be a possible fit. Additionally, there are a lot of great stories about Riak in production on their Vimeo channel.

