In MySQL, if the movies table has a description field, how can I search on that description?

Say, using MySQL, the movies table has 20,000 records, and each record has a field holding the movie's description, up to 2 KB long. How can we search for movies with the word "nature" in their description? If possible, it should be fast, rather than going through all 20,000 records (in other situations, such as books, n could be 200,000 or more).

I wouldn't process the description column directly; per-row functions in SELECT statements rarely scale well. One of the guidelines I subscribe to is to never have to process things inside columns, such as descriptions in your case, parts of comma-separated-value columns, or even the components of names (first/last) and addresses (street/town/state). If you're doing that, there's usually a more efficient way.

What I would do is create insert, update and delete triggers on the table. For the insert/update triggers, I would populate another table along the lines of DescLookup below:

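-- MySQL DDL; column types and sizes are illustrative
CREATE TABLE Movies (
    Id          INT PRIMARY KEY,
    Title       VARCHAR(255),
    Description TEXT
);
CREATE TABLE DescLookup (
    Word    VARCHAR(64) NOT NULL,
    MovieId INT NOT NULL,
    Count   INT NOT NULL,
    PRIMARY KEY (Word, MovieId),
    INDEX (MovieId),
    FOREIGN KEY (MovieId) REFERENCES Movies(Id)
);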

Basically, for every non-noise word in the description (i.e., ignoring words like "and", "or" and "by", punctuation, single-letter words and so on), you get an entry in this table containing the lower-cased word.
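As a rough illustration of that word-extraction step, here is a Python sketch. The noise-word list and the splitting rule (runs of ASCII letters and digits) are assumptions for illustration, not something the answer specifies; the result maps each lower-cased word to its number of occurrences, which is presumably what the Count column holds.

```python
import re
from collections import Counter

# Hypothetical noise-word list; a real one would be longer (and configurable).
NOISE_WORDS = {"and", "or", "by", "the", "an", "of", "in", "to", "at", "its"}

def description_words(description: str) -> Counter:
    """Map each lower-cased, non-noise word to how often it occurs."""
    # Lower-case, then keep runs of ASCII letters/digits; punctuation is discarded.
    tokens = re.findall(r"[a-z0-9]+", description.lower())
    # Drop single-letter words and noise words.
    return Counter(t for t in tokens if len(t) > 1 and t not in NOISE_WORDS)

print(description_words("A nature documentary: Nature, by and large, at its best."))
```

Each (word, count) pair would become one DescLookup row for the movie being inserted or updated.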

Make sure that the trigger deletes all current rows for that MovieId before re-populating, lest you be left with stale entries for words that are no longer in the description.
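A minimal sketch of the delete-then-repopulate step, assuming the word counts have already been extracted. It runs in Python against SQLite (via the standard sqlite3 module) as a stand-in for MySQL, and does in application code what the answer puts in the insert/update trigger; the function name reindex_movie is hypothetical.

```python
import sqlite3
from collections import Counter

def reindex_movie(conn: sqlite3.Connection, movie_id: int, counts: Counter) -> None:
    """Mimic the insert/update trigger: replace all DescLookup rows for one movie."""
    with conn:  # one transaction: the delete and the inserts succeed or fail together
        # Delete first, otherwise words removed from the description would linger.
        conn.execute("DELETE FROM DescLookup WHERE MovieId = ?", (movie_id,))
        conn.executemany(
            "INSERT INTO DescLookup (Word, MovieId, Count) VALUES (?, ?, ?)",
            [(word, movie_id, n) for word, n in counts.items()],
        )

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE DescLookup (
    Word TEXT NOT NULL, MovieId INTEGER NOT NULL, Count INTEGER NOT NULL,
    PRIMARY KEY (Word, MovieId))""")
reindex_movie(conn, 1, Counter({"nature": 2, "ocean": 1}))
reindex_movie(conn, 1, Counter({"nature": 1, "desert": 1}))  # description edited
print(conn.execute("SELECT Word, Count FROM DescLookup ORDER BY Word").fetchall())
```

Without the DELETE, the second call would leave the stale ("ocean", 1) row behind and would also collide with the existing ("nature", 1) primary key.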

Then you use that table to run your queries. This moves the "cost" of finding the words to the insert/update rather than to every single select, amortising that cost considerably. This works well because the vast majority of databases are read far more often than they are written, so moving the cost to the write side is a good idea.

Keep in mind that this requires extra storage but, if you look at the large number of questions people ask about databases, "How can I do this fast?" far outweighs "How can I use less disk space?".

And the delete trigger simply removes all entries in the DescLookup table with the relevant MovieId.
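Here is a sketch of that delete trigger, again using SQLite through Python's sqlite3 module as a stand-in; SQLite's trigger syntax (BEGIN ... END) differs slightly from MySQL's, and the index and trigger names are hypothetical. It fires BEFORE DELETE so the lookup rows are gone before the movie row is removed, which matters if the foreign key is enforced.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Movies (Id INTEGER PRIMARY KEY, Title TEXT, Description TEXT);
CREATE TABLE DescLookup (
    Word    TEXT    NOT NULL,
    MovieId INTEGER NOT NULL REFERENCES Movies(Id),
    Count   INTEGER NOT NULL,
    PRIMARY KEY (Word, MovieId)
);
-- The MovieId index lets the trigger's DELETE avoid a full scan.
CREATE INDEX DescLookup_MovieId ON DescLookup (MovieId);
-- SQLite trigger syntax; MySQL uses CREATE TRIGGER ... FOR EACH ROW <statement>.
CREATE TRIGGER Movies_delete BEFORE DELETE ON Movies
BEGIN
    DELETE FROM DescLookup WHERE MovieId = OLD.Id;
END;
""")
conn.executemany("INSERT INTO Movies VALUES (?, ?, ?)",
                 [(1, "Title A", "nature"), (2, "Title B", "nature")])
conn.executemany("INSERT INTO DescLookup VALUES (?, ?, ?)",
                 [("nature", 1, 1), ("nature", 2, 1)])
conn.execute("DELETE FROM Movies WHERE Id = 1")  # fires the trigger
print(conn.execute("SELECT MovieId FROM DescLookup").fetchall())  # [(2,)]
```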

Because the Word column is indexed (it is the leading column of the primary key) and, as you requested, you will not be searching every single description field, searches on it will be blindingly fast. Put simply:

select MovieId from DescLookup where Word = 'nature';

will blow:

select Id from Movies where lower(Description) like '%nature%';

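out of the water, because a LIKE pattern with a leading wildcard (and the lower() call on the column) prevents MySQL from using an index, so every description has to be scanned.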
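To see both queries side by side, here is a toy SQLite version (again a stand-in for MySQL, with simplified word extraction and no noise-word filtering). With three rows it cannot show the speed difference, but it does show that the two queries are not equivalent: the LIKE pattern matches "nature" anywhere inside a word, so "good-natured" is a hit, whereas the lookup table only matches whole words.

```python
import re
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Movies (Id INTEGER PRIMARY KEY, Description TEXT);
CREATE TABLE DescLookup (
    Word TEXT NOT NULL, MovieId INTEGER NOT NULL, Count INTEGER NOT NULL,
    PRIMARY KEY (Word, MovieId)
);
""")

movies = {
    1: "A nature documentary.",
    2: "A good-natured comedy.",
    3: "Nature strikes back!",
}
for movie_id, text in movies.items():
    conn.execute("INSERT INTO Movies VALUES (?, ?)", (movie_id, text))
    # Simplified extraction: lower-case runs of letters/digits, no noise-word filter.
    for word, n in Counter(re.findall(r"[a-z0-9]+", text.lower())).items():
        conn.execute("INSERT INTO DescLookup VALUES (?, ?, ?)", (word, movie_id, n))

# Whole-word lookup: can use the (Word, MovieId) primary-key index.
indexed = conn.execute(
    "SELECT MovieId FROM DescLookup WHERE Word = 'nature' ORDER BY MovieId"
).fetchall()
# Substring scan: has to examine every description.
scanned = conn.execute(
    "SELECT Id FROM Movies WHERE lower(Description) LIKE '%nature%' ORDER BY Id"
).fetchall()

print(indexed)  # [(1,), (3,)]
print(scanned)  # [(1,), (2,), (3,)]
```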

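You want to use a full-text search index in this case. Be aware that there are some catches, though, such as the minimum word length, stop words, etc. For example, InnoDB does not index words shorter than innodb_ft_min_token_size (3 characters by default; MyISAM's ft_min_word_len defaults to 4), and words on the stop-word list are ignored.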

The syntax for full-text search (FTS) looks like this:

ALTER TABLE Movies ADD FULLTEXT (Description);
SELECT Id FROM Movies WHERE MATCH (Description) AGAINST ('nature');

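Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.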

© 2020-2024 STACKOOM.COM