

Simple search function with fixed tags in PHP for a MySQL database

I'm about to build a simple search function in PHP. Speed matters most: I want it to be as lightweight as possible.

Users will search for articles by predefined tags. There will be between 1k and 5k articles and just a few tags. The search function will be used very frequently.

For example, if the user selects "color:blue" and "size:large", only articles that carry both of these tags should be returned.

Since there are just a few tags and many more articles, I guess it's faster to have a table with the tags and the article IDs.

So my first thought was to store a string of article IDs with every tag. From what I know and have heard, this is not good practice. Still, it feels like it should be lighter and faster?

Article
- id
- name
- etc.

Tags
- id
- name
- all related article IDs

I have also seen examples that use a third table, like this:

Article
- id
- name
- etc.

ArticleTagRelation
- foreign key: article ID
- foreign key: tag ID

Tags
- id
- name
- etc.

The third alternative I can think of is storing a string of tags with every article (just one table).

Article
- id
- name
- tags
- etc.

In my case, which approach would give the best performance?

You should use the three-table solution in every circumstance! Storing multiple tags in one column makes it impossible to ever get a fast lookup.

To understand this behaviour, it's important to know how databases organize data internally. They create sorted indexes, which make very fast lookups possible. However, this only works if the data is split into the correct columns.

If you use either of the other solutions, the database system has to inspect one of the columns, split the value up, and search every substring. As this can't be done in SQL (at least not without dirty hacks), you'll have to program it in PHP, which leads to incredibly bad performance, because your code will query the RDBMS for every row.
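The difference can be made visible by asking the database for its query plan. The following is a minimal sketch, not a measurement from this answer: it uses Python's built-in `sqlite3` module as a stand-in for MySQL, and the table `articles_flat`, the index `idx_flat_tags`, and the helper `plan` are names invented for the illustration. It compares a substring search on a packed tag column (one of the "dirty hacks") with a lookup on the relation table.

```python
import sqlite3

# Assumption: SQLite (in memory) stands in for MySQL; plan wording differs
# between engines, but the scan-versus-index-lookup contrast is the point.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical one-table variant: all tags packed into one string column
    CREATE TABLE articles_flat (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        tags TEXT NOT NULL          -- e.g. 'color:blue,size:large'
    );
    -- Even an index on the packed column cannot serve a substring search
    CREATE INDEX idx_flat_tags ON articles_flat (tags);

    -- Relation table as in the three-table solution
    CREATE TABLE articles_tags (
        article INTEGER NOT NULL,
        tag     INTEGER NOT NULL,
        PRIMARY KEY (tag, article)
    );
""")

def plan(sql, params=()):
    """Return the 'detail' column of EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)

# Substring match on the packed column: every row has to be inspected
flat_plan = plan(
    "SELECT id FROM articles_flat WHERE tags LIKE ?", ("%color:blue%",)
)

# Equality match on the relation table: resolved through the primary key
relation_plan = plan(
    "SELECT article FROM articles_tags WHERE tag = ?", (1,)
)

print("packed column :", flat_plan)
print("relation table:", relation_plan)
```

In SQLite's output, a plan that starts with `SCAN` walks all rows, while `SEARCH ... USING ... INDEX` is an index lookup. MySQL's `EXPLAIN` reports the same distinction in its own format.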

To give a few measurements, I just set up the following three tables:

CREATE TABLE tags (
    `id` int(11) NOT NULL AUTO_INCREMENT,
    `name` varchar(20) NOT NULL,
    PRIMARY KEY (`id`)
) ENGINE=`InnoDB`;

CREATE TABLE articles_tags (
    `article` int(11) NOT NULL,
    `tag` int(11) NOT NULL,
    PRIMARY KEY (`tag`,`article`)
) ENGINE=`InnoDB`;

CREATE TABLE articles (
    `id` int(11) NOT NULL AUTO_INCREMENT,
    `name` varchar(32) NOT NULL,
    `content` text NOT NULL,
    PRIMARY KEY (`id`)
) ENGINE=`InnoDB`;

  • tags: I inserted 15 tags consisting of 1-2 words each
  • articles: I inserted 2000 articles with large random texts
  • articles_tags: I inserted ~19k entries linking random articles with random tags

Then I executed the following query repeatedly on a standard MySQL configuration (128MB innodb_buffer_pool_size) on a desktop-class 2.2GHz CPU:

SELECT articles.name FROM articles
INNER JOIN articles_tags ON articles.id = articles_tags.article
INNER JOIN tags ON tags.id = articles_tags.tag
WHERE tags.name = '/*one of the tags, random*/'

This took 19.308 seconds for 10000 queries, which means an average of 0.0019s/query, so I think higher loads should be no problem with a properly tuned server. Also note that these measurements were taken without any query caching. The tags are chosen completely at random, so it's not the same query over and over. In a good server environment, you should have no problem serving a few hundred thousand searches per second.
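The benchmark query matches a single tag, while the question asks for articles that carry every selected tag ("color:blue" and "size:large"). One common way to express that on the same three tables is to group by article and count the matched tags. The sketch below is an illustration rather than part of the measured setup: it runs on Python's built-in `sqlite3` as a stand-in for MySQL, and the sample rows and the function name `articles_with_all_tags` are made up for the example.

```python
import sqlite3

# Assumption: SQLite (in memory) stands in for MySQL; the SELECT itself is
# plain SQL and should carry over to MySQL unchanged.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE articles (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE articles_tags (
        article INTEGER NOT NULL,
        tag     INTEGER NOT NULL,
        PRIMARY KEY (tag, article)
    );
    -- Hypothetical sample data
    INSERT INTO tags VALUES
        (1, 'color:blue'), (2, 'size:large'), (3, 'color:red');
    INSERT INTO articles VALUES
        (1, 'Blue large shirt'), (2, 'Blue small shirt'), (3, 'Red large shirt');
    INSERT INTO articles_tags (article, tag) VALUES
        (1, 1), (1, 2), (2, 1), (3, 3), (3, 2);
""")

def articles_with_all_tags(conn, tag_names):
    """Return the names of articles that carry every tag in tag_names."""
    wanted = sorted(set(tag_names))      # duplicates would break the count
    if not wanted:
        return []
    placeholders = ", ".join("?" for _ in wanted)
    sql = f"""
        SELECT articles.name
        FROM articles
        INNER JOIN articles_tags ON articles.id = articles_tags.article
        INNER JOIN tags ON tags.id = articles_tags.tag
        WHERE tags.name IN ({placeholders})
        GROUP BY articles.id, articles.name
        HAVING COUNT(DISTINCT tags.id) = ?   -- all requested tags matched
        ORDER BY articles.id
    """
    rows = conn.execute(sql, wanted + [len(wanted)]).fetchall()
    return [row[0] for row in rows]

print(articles_with_all_tags(conn, ["color:blue", "size:large"]))
# ['Blue large shirt']
```

The tag names are passed as bound parameters rather than concatenated into the SQL string. From PHP, the same statement could be issued through a prepared statement in the same way.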

An important note on the primary key: the order of the columns in a primary key or additional index matters! If you often need lookups in the reverse direction (finding the tags that belong to a given article), you should add an additional index for that direction!
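A small sketch of that reverse index follows, again using Python's built-in `sqlite3` as a stand-in for MySQL. The index name `idx_articles_tags_article`, the sample rows, and the helper `plan` are assumptions made for the illustration. It prints the query plan for "tags of a given article" before and after the extra index is created.

```python
import sqlite3

# Assumption: SQLite (in memory) stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE articles_tags (
        article INTEGER NOT NULL,
        tag     INTEGER NOT NULL,
        PRIMARY KEY (tag, article)      -- sorted by tag first: serves tag -> articles
    );
    -- Hypothetical sample data
    INSERT INTO articles_tags (article, tag) VALUES (1, 1), (1, 2), (2, 1), (3, 2);
""")

def plan(sql, params=()):
    """Return the 'detail' column of EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)

# Reverse direction: which tags belong to a given article?
reverse_lookup = "SELECT tag FROM articles_tags WHERE article = ? ORDER BY tag"

plan_before = plan(reverse_lookup, (1,))

# Additional index with the columns in the opposite order
conn.execute(
    "CREATE INDEX idx_articles_tags_article ON articles_tags (article, tag)"
)

plan_after = plan(reverse_lookup, (1,))
tags_of_article_1 = [row[0] for row in conn.execute(reverse_lookup, (1,))]

print("before:", plan_before)
print("after :", plan_after)
print("tags of article 1:", tags_of_article_1)
```

In MySQL the equivalent step would be something like `ALTER TABLE articles_tags ADD INDEX (article, tag)`. Whether the extra index pays off depends on how often the reverse lookup actually runs, since every index adds some cost to inserts.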


 