
Alternatives for aggregating a lot of data fast

I'm using InfiniDB to aggregate a lot of rows (about 100-500 million) down to fewer than 5,000 groups. (In most queries the 100-500 million rows are filtered first, so the aggregation actually works on fewer rows.)

It is used as a prototype of a travel search engine for a website, and you can think of it as "give me the best price per accommodation for all combinations of rooms for a specific number of persons".
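
To make the shape of the workload concrete, the single-table case is roughly this (table and column names are only illustrative, not my real schema):

    -- hypothetical schema, just to show the shape of the data
    CREATE TABLE room_offer (
        accommodation_id INT,
        room_type_id     INT,
        persons          INT,            -- how many persons this room sleeps
        travel_date      DATE,
        price            DECIMAL(10,2)
    );

    -- "best price per accommodation" for one date and party size,
    -- aggregating the filtered rows down to a few thousand groups
    SELECT accommodation_id, MIN(price) AS best_price
    FROM room_offer
    WHERE travel_date = '2013-08-01'
      AND persons = 2
    GROUP BY accommodation_id;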

It works fine until I have to self-join the table several times to find the best-price combination (the data is already reduced with logical filters, so the number of combinations per join is reduced too).
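
The combination step is roughly a self-join like the one below (again with illustrative names); with three or four rooms per combination there are several of these joins stacked on top of each other, and that is where it gets slow:

    -- combine two rooms of the same accommodation so that together
    -- they sleep 4 persons, then keep the cheapest combination
    SELECT a.accommodation_id,
           MIN(a.price + b.price) AS best_combined_price
    FROM room_offer a
    JOIN room_offer b
      ON  b.accommodation_id = a.accommodation_id
      AND b.travel_date      = a.travel_date
      AND b.room_type_id    >= a.room_type_id   -- don't count (a,b) and (b,a) twice
    WHERE a.travel_date = '2013-08-01'
      AND a.persons + b.persons = 4
    GROUP BY a.accommodation_id;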

It is possible for me to split the content of the table into several tables, and that works with acceptable performance, but now I'm asking myself whether InfiniDB (or column-oriented databases in general) is the best solution for this problem.

What are the alternatives? I think any map/reduce mechanism (MongoDB, Hadoop) would be much slower, or is there a point I'm missing about them?

It should not require more than 2-5 servers.

To make it clear: I don't expect a "this would be perfect!" answer, just good hints at alternatives. I also suspect that InfiniDB is a bad solution for my scenario.

Thanks for your thoughts!

I used InfiniDB 3 scaled out over 9 machines with tables of more than 30 billion rows without any problems, even with self-joins.

Give me an example DDL + DQL. Maybe I can help you improve the query.

Before InfiniDB we tried HBase / Cassandra / MongoDB and we didn't like the technology. For 500 million rows you can use plain MySQL if you only need to do this 2-3 times per day.
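
A minimal sketch of what I mean, assuming a simple offers table like the one in the question: rebuild a small summary table a few times per day (from cron, for example) and let the website read only the pre-aggregated groups.

    -- small result table, refreshed 2-3 times per day
    CREATE TABLE best_price_summary (
        accommodation_id INT,
        persons          INT,
        travel_date      DATE,
        best_price       DECIMAL(10,2),
        PRIMARY KEY (accommodation_id, persons, travel_date)
    );

    -- periodic refresh: the heavy GROUP BY runs offline, not per request
    TRUNCATE TABLE best_price_summary;

    INSERT INTO best_price_summary (accommodation_id, persons, travel_date, best_price)
    SELECT accommodation_id, persons, travel_date, MIN(price)
    FROM room_offer
    GROUP BY accommodation_id, persons, travel_date;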
