简体   繁体   中英

Which Database engine for large dataset

I'm working on a analysis assignment, we got a partial data-set from the university Library containing almost 300.000.000 rows.

Each row contains:

  • ID
  • Date
  • Owner
  • Deadline
  • Checkout_date
  • Checkin_date

I put all this inside a MySQL table, then I started querying that for my analysis assignment, however simple query ( SELECT * FROM table WHERE ID = something ) where taking 9-10 minutes to complete. So I created an index for all the columns, which made it noticeable faster ~ 30 sec.

So I started reading similar issues, and people recommended switching to a "Wide column store" or "Search engine" instead of "Relational".

So my question is, what would be the best database engine to use for this data?

Using a search engine to search is IMO the best option.

Elasticsearch of course!

Disclaimer: I work at elastic. :)

The answer is, of course, "it depends". In your example, you're counting the number of records in the database with a given ID. I find it hard to believe that it would take 30 seconds in MySQL, unless you're on some sluggish laptop.

MySQL has powered an incredible number of systems because it is full-featured, stable, and has pretty good performance. It's bad (or has been bad) at some things, like text search, clustering, etc.

Systems like Elasticsearch are good with globs of text, but still may not be a good fit for your system, depending on usage. From your schema, you have one text field ("owner"), and you wouldn't need Elasticsearch's text searching capabilities on a field like that (who ever needed to stem a user name?). Elasticsearch is also used widely for log files, which also don't need a text engine. It is, however, good with blocks of text and with with clustering.

If this is a class assignment, I'd stick with MySQL.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM