
Building a search engine for a large database

I'm building a fairly large database where I will have a lot of tables with various data.

But each table has similar fields, for example video title or track title.

Now the problem I'm facing is how to build a query that looks for a keyword match across five or more tables. Keep in mind that each table can have anywhere from 100k to 1 million rows, or in some cases even a couple of million rows.

I think using joins or separate queries for each table would be very slow, so my idea is to create one separate table where I would store the search data.

For example, I think it could have fields like these:

id ---- username ---- title ---- body ---- date ---- belongs_to ---- post_id

This way I think searches would be a lot faster, or am I totally wrong?

The only problem with this approach that I can think of is that this table would be hard to manage: if the original record is deleted from one of the tables, I would also need to delete the matching record from the 'search' table.
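One way to handle the delete problem is to let the database keep the 'search' table in sync with triggers. The following is only a minimal sketch under some assumptions: it uses SQLite through Python's standard `sqlite3` module so that it runs anywhere (the question does not say which database is used, and trigger syntax differs between systems), and the `videos` table, the trigger names, and the `make_db` and `search_titles` helpers are hypothetical names invented for illustration.

```python
import sqlite3


def make_db():
    """Create an in-memory database with one source table and the search table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    -- Hypothetical source table; real tables would have more columns.
    CREATE TABLE videos (
        id INTEGER PRIMARY KEY,
        username TEXT,
        title TEXT,
        body TEXT
    );

    -- Shared search table, following the fields listed in the question.
    CREATE TABLE search (
        id INTEGER PRIMARY KEY,
        username TEXT,
        title TEXT,
        body TEXT,
        belongs_to TEXT,  -- name of the source table
        post_id INTEGER   -- id of the row in the source table
    );
    CREATE INDEX search_source ON search (belongs_to, post_id);

    -- Copy every new video into the search table.
    CREATE TRIGGER videos_after_insert AFTER INSERT ON videos BEGIN
        INSERT INTO search (username, title, body, belongs_to, post_id)
        VALUES (NEW.username, NEW.title, NEW.body, 'videos', NEW.id);
    END;

    -- Remove the search row when the original video is deleted.
    CREATE TRIGGER videos_after_delete AFTER DELETE ON videos BEGIN
        DELETE FROM search WHERE belongs_to = 'videos' AND post_id = OLD.id;
    END;
    """)
    return conn


def search_titles(conn, keyword):
    """Return (belongs_to, post_id, title) for titles containing the keyword."""
    rows = conn.execute(
        "SELECT belongs_to, post_id, title FROM search WHERE title LIKE ?",
        ("%" + keyword + "%",),
    )
    return rows.fetchall()


# Small demonstration: insert, search, delete, search again.
demo = make_db()
demo.execute(
    "INSERT INTO videos (username, title, body) VALUES (?, ?, ?)",
    ("alice", "Cooking pasta", "How to cook pasta"),
)
print(search_titles(demo, "pasta"))
demo.execute("DELETE FROM videos WHERE title = ?", ("Cooking pasta",))
print(search_titles(demo, "pasta"))
```

Each additional source table would need its own pair of triggers (plus one for updates, which is left out here). Note that a `LIKE '%keyword%'` pattern cannot use an ordinary B-tree index, so a single table removes the joins but still scans every row for each search.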

Don't use MySQL joins across a large number of tables. I would suggest you take a look at Apache Solr alongside your RDBMS.

Take a look at some information retrieval systems. They also require their own indices, so you need to index the data after each update (or at regular intervals) to keep the search index up to date. But they offer the following advantages:

  • much faster, because they use special algorithms and data structures designed specifically for that purpose
  • ability to search for documents based on a set of terms (and maybe also a set of negative terms that must not appear in the result)
  • search for phrases (i.e. terms that appear after each other in a specific order)
  • automatic stemming (i.e. stripping the endings of words like "s", "ed", "ing" ...)
  • detection of spelling mistakes (i.e. "Did you mean ...?")
  • stopwords to avoid indexing really common meaningless words ("a", "the", etc.)
  • wildcard queries
  • advanced ranking strategies (i.e. rank by relevance, based on the number and the position of each occurrence of the search terms)
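To give a feel for why such systems are fast, here is a toy inverted index that illustrates three of the points above: stopwords, stemming, and searching with required and negative terms. It is only a sketch and not how Xapian, Lucene or any real engine is implemented. The suffix stripping is a crude stand-in for a real stemmer, and every name in it (`STOPWORDS`, `stem`, `tokenize`, `InvertedIndex`) is made up for this example.

```python
import re
from collections import defaultdict

STOPWORDS = {"a", "an", "the", "of", "and"}
SUFFIXES = ("ing", "ed", "s")  # crude stand-in for a real stemming algorithm


def stem(word):
    """Strip one common suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    """Lowercase, split into words, drop stopwords, and stem the rest."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [stem(w) for w in words if w not in STOPWORDS]


class InvertedIndex:
    """Maps each term to the set of document ids that contain it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, required, excluded=""):
        """Return ids of documents with all required terms and no excluded ones."""
        required_terms = tokenize(required)
        if not required_terms:
            return set()
        # Intersect the posting sets of all required terms.
        result = set.intersection(
            *(self.postings.get(term, set()) for term in required_terms)
        )
        # Remove documents that contain any negative term.
        for term in tokenize(excluded):
            result -= self.postings.get(term, set())
        return result


index = InvertedIndex()
index.add(1, "Learning the guitar")
index.add(2, "Guitar lessons for beginners")
index.add(3, "Cooking pasta")
print(index.search("guitars"))
print(index.search("guitar", excluded="lesson"))
```

A lookup only touches the posting sets of the query terms instead of scanning every row, which is the main reason a dedicated index beats a `LIKE` scan on large tables.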

I have used Xapian in the past for my projects and I was quite happy with it. Lucene, Solr and Elasticsearch are some other really popular projects that might fit your needs.

