
What could be the best design approach for faster search?

I am working on the architecture design of an application built with PHP Yii that will hold a large number of records (around a million in the future). The DB structure is as below:

[Image: DB structure diagram, not reproduced here]

Requirements:

  1. Fast keyword search for Profiles, Articles and Forums. A keyword can be a combination of columns, e.g. BizName + City, City + Speciality, ServiceName + City, Article Title, etc.
  2. Keyword suggestions for the user.
  3. Show search results in tabs, for example: Profiles, Articles, Forums, etc.

Approach 1:

  1. Have a relational DB. Write SQL queries on multiple columns using OR and pattern matching.

Cons:

Poor performance
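A minimal sketch of what Approach 1 tends to look like, using Python's built-in sqlite3 module in place of the real database. The `profile` table, its columns and the sample rows are hypothetical, chosen only to mirror the column names mentioned in the requirements. The leading `%` wildcard is what usually hurts: an ordinary B-tree index cannot be used for it, so every search scans the table.

```python
import sqlite3

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE profile ("
    "id INTEGER PRIMARY KEY, biz_name TEXT, city TEXT, speciality TEXT)"
)
conn.executemany(
    "INSERT INTO profile (biz_name, city, speciality) VALUES (?, ?, ?)",
    [
        ("Smile Dental", "Pune", "Dentist"),
        ("City Heart Clinic", "Mumbai", "Cardiology"),
        ("Pune Eye Care", "Pune", "Ophthalmology"),
    ],
)


def search_profiles(conn, term):
    """Approach 1: match the term anywhere in any searchable column."""
    pattern = f"%{term}%"
    rows = conn.execute(
        "SELECT id, biz_name FROM profile "
        "WHERE biz_name LIKE ? OR city LIKE ? OR speciality LIKE ? "
        "ORDER BY id",
        (pattern, pattern, pattern),
    )
    return rows.fetchall()
```

Calling `search_profiles(conn, "pune")` matches both rows whose city is Pune, because SQLite's `LIKE` is case-insensitive for ASCII by default; other databases may behave differently.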

Approach 2:

  1. Create a keyword table. Create the combinations of columns that are searchable and save them in KeywordTab.
  2. Create mapping tables: Keyword-Profile, Keyword-Article, Keyword-Forum, etc.
  3. Query the keyword table for autosuggestions. Once the user hits the search button, query the mapping tables and extract articleId, ProfileId, ForumId, etc.

Cons:

Creating/updating keywords and mappings on every update.
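The three steps of Approach 2 could be sketched roughly as follows, again with sqlite3 standing in for the real database. The table names (`keyword`, `keyword_profile`), the choice of keyword combinations and the helper functions are all assumptions made for illustration; only the Profile mapping is shown. `add_profile` makes the listed drawback visible: every write has to maintain the keyword and mapping rows too.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE profile (id INTEGER PRIMARY KEY, biz_name TEXT, city TEXT);
CREATE TABLE keyword (id INTEGER PRIMARY KEY, text TEXT UNIQUE);
CREATE TABLE keyword_profile (
    keyword_id INTEGER REFERENCES keyword(id),
    profile_id INTEGER REFERENCES profile(id),
    PRIMARY KEY (keyword_id, profile_id)
);
""")


def add_profile(conn, biz_name, city):
    """Insert a profile plus its keyword and mapping rows (the maintenance cost)."""
    cur = conn.execute(
        "INSERT INTO profile (biz_name, city) VALUES (?, ?)", (biz_name, city)
    )
    profile_id = cur.lastrowid
    # Hypothetical choice of searchable combinations: BizName, City, BizName + City.
    for kw in (biz_name, city, f"{biz_name} {city}"):
        kw = kw.lower()
        conn.execute("INSERT OR IGNORE INTO keyword (text) VALUES (?)", (kw,))
        (keyword_id,) = conn.execute(
            "SELECT id FROM keyword WHERE text = ?", (kw,)
        ).fetchone()
        conn.execute(
            "INSERT OR IGNORE INTO keyword_profile VALUES (?, ?)",
            (keyword_id, profile_id),
        )
    return profile_id


def suggest(conn, prefix, limit=5):
    """Autosuggest as a prefix range query, which can use the index on keyword.text."""
    prefix = prefix.lower()
    rows = conn.execute(
        "SELECT text FROM keyword WHERE text >= ? AND text < ? "
        "ORDER BY text LIMIT ?",
        (prefix, prefix + "\uffff", limit),
    )
    return [row[0] for row in rows]


def search(conn, keyword):
    """Resolve a chosen keyword to profiles through the mapping table."""
    rows = conn.execute(
        "SELECT p.id, p.biz_name FROM keyword k "
        "JOIN keyword_profile kp ON kp.keyword_id = k.id "
        "JOIN profile p ON p.id = kp.profile_id "
        "WHERE k.text = ? ORDER BY p.id",
        (keyword.lower(),),
    )
    return rows.fetchall()


add_profile(conn, "Smile Dental", "Pune")
add_profile(conn, "Pune Eye Care", "Pune")
add_profile(conn, "City Heart Clinic", "Mumbai")
```

The range condition in `suggest` is one way to express "starts with" so that an index can serve it; the `"\uffff"` upper bound is a simplification that covers typical text but not every possible character.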

Approach 3:

  1. Have a relational DB with FULLTEXT indices on searchable columns.

Questions:

  1. Not sure whether autosuggest for the search box will work.
  2. How will the performance in this case compare with the other approaches?

Approach 4:

Use a NoSQL DB like MongoDB/Solr/Lucene in combination with a relational DB. Use the NoSQL store to find the articleId, ProfileId, ForumId, etc., and the relational DB to display the results.

Cons:

  1. Creating/updating the NoSQL store on every update.
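The division of labour in Approach 4 might look like the sketch below. `TinySearchIndex` is a toy in-memory inverted index standing in for the external search system; it is not the Solr, Lucene or MongoDB API, and all table and function names are hypothetical. The search side returns only IDs grouped by type (one group per result tab), and the relational DB supplies the rows to display.

```python
import sqlite3
from collections import defaultdict


class TinySearchIndex:
    """Toy inverted index standing in for a search server. NOT a real Solr/Lucene API."""

    def __init__(self):
        self._postings = defaultdict(set)  # token -> {(doc_type, doc_id)}

    def index(self, doc_type, doc_id, text):
        for token in text.lower().split():
            self._postings[token].add((doc_type, doc_id))

    def search(self, query):
        """Return sorted (doc_type, doc_id) pairs that contain every query token."""
        tokens = query.lower().split()
        if not tokens:
            return []
        hits = set.intersection(*(self._postings.get(t, set()) for t in tokens))
        return sorted(hits)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE profile (id INTEGER PRIMARY KEY, biz_name TEXT, city TEXT)")
conn.execute("CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT)")
index = TinySearchIndex()

# One display query per result type; the placeholder list is filled in per search.
DISPLAY_SQL = {
    "profile": "SELECT id, biz_name FROM profile WHERE id IN ({}) ORDER BY id",
    "article": "SELECT id, title FROM article WHERE id IN ({}) ORDER BY id",
}


def save_profile(biz_name, city):
    cur = conn.execute(
        "INSERT INTO profile (biz_name, city) VALUES (?, ?)", (biz_name, city)
    )
    # The extra write listed under "Cons": keep the search index in sync.
    index.index("profile", cur.lastrowid, f"{biz_name} {city}")
    return cur.lastrowid


def save_article(title):
    cur = conn.execute("INSERT INTO article (title) VALUES (?)", (title,))
    index.index("article", cur.lastrowid, title)
    return cur.lastrowid


def search_tabs(query):
    """Find IDs in the search index, then load display rows from the relational DB."""
    ids_by_type = defaultdict(list)
    for doc_type, doc_id in index.search(query):
        ids_by_type[doc_type].append(doc_id)
    tabs = {}
    for doc_type, ids in ids_by_type.items():
        placeholders = ",".join("?" * len(ids))
        sql = DISPLAY_SQL[doc_type].format(placeholders)
        tabs[doc_type] = conn.execute(sql, ids).fetchall()
    return tabs


save_profile("Smile Dental", "Pune")
save_profile("City Heart Clinic", "Mumbai")
save_article("Dental care tips")
save_article("Heart health in Pune")
```

With this sample data, `search_tabs("dental")` returns one entry for the profile tab and one for the article tab. A real search server would add tokenization, ranking and suggestions on top of this basic ID lookup.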

Any other approaches, please? Which approach is scalable and will give good performance?

If you put it like that, approach 4 is the most scalable and has the best performance, hands down.

However, it's not clear what the content will actually be or how large the dataset will be. 'Around a million rows' is hardly an indication, as it doesn't say what the rows contain or whether those rows are in a single table, so it's not really possible to give accurate advice. Approach 4 may be the fastest anyway, but is it the most efficient? A million rows in a single table with about 4 columns, each containing about 250 bytes of data (just a guess here, your mileage may vary), comes to roughly 1 GB of raw data, which is actually not all that much. Choose the indexes well and optimize the queries, and an RDBMS may be all you need.

My suggestion is: build up a dataset to test with and try the various approaches.
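One way to start on such a test is a small harness that fills a table with synthetic rows and times each candidate query. The sketch below uses sqlite3 and made-up data purely to show the shape of the experiment; the schema, the value lists and the two candidate queries are assumptions, and timings from SQLite say nothing about how MySQL or a search server would behave on the real data.

```python
import random
import sqlite3
import time

# Made-up value lists for the synthetic rows.
CITIES = ["Pune", "Mumbai", "Delhi", "Chennai"]
SPECIALITIES = ["Dentist", "Cardiology", "Ophthalmology"]

# Candidate queries to compare: name -> (SQL, parameters).
CANDIDATES = {
    "like_any_column": (
        "SELECT id FROM profile "
        "WHERE biz_name LIKE ? OR city LIKE ? OR speciality LIKE ?",
        ("%Pune%",) * 3,
    ),
    "indexed_equality": ("SELECT id FROM profile WHERE city = ?", ("Pune",)),
}


def build_dataset(n_rows, seed=0):
    """Create an in-memory table with n_rows reproducible synthetic profiles."""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE profile ("
        "id INTEGER PRIMARY KEY, biz_name TEXT, city TEXT, speciality TEXT)"
    )
    conn.executemany(
        "INSERT INTO profile (biz_name, city, speciality) VALUES (?, ?, ?)",
        (
            (f"Biz {i}", rng.choice(CITIES), rng.choice(SPECIALITIES))
            for i in range(n_rows)
        ),
    )
    conn.execute("CREATE INDEX idx_profile_city ON profile (city)")
    return conn


def timed(conn, sql, params, repeats=5):
    """Run one query several times; return (row_count, best_seconds)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        rows = conn.execute(sql, params).fetchall()
        best = min(best, time.perf_counter() - start)
    return len(rows), best


def run_benchmark(n_rows=10_000):
    """Time every candidate query against the same synthetic dataset."""
    conn = build_dataset(n_rows)
    return {
        name: timed(conn, sql, params)
        for name, (sql, params) in CANDIDATES.items()
    }
```

Running `run_benchmark(1_000_000)` gives a row count and a best-of-five time per candidate. The same structure extends to further candidates, and swapping the connection for the production database engine makes the numbers more meaningful.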

When you want to search quickly by multiple columns in multiple tables in an SQL database, you need to place indexes on almost everything. That's a good way to drive the write performance of your relational database to record lows.

For that reason I would recommend using an independent system for searching. Of the technologies you mentioned, I would recommend the dedicated search server Apache Solr (which is part of the Lucene project, not a separate technology) over MongoDB: MongoDB is an interesting database technology with lots of great features, but its text search is not a core feature and is rather tacked on.

But technology choices are always subjective, so evaluate all the options, see how they line up with your specific requirements and make your own decision.
