
MySQL: Search multiple rows for a string, and order the results based on how often the string appears

EDIT: As can be seen, I decided to go with MySQL's MATCH. That said, if someone knows of a clean method to do what I wanted within a SELECT statement, I would appreciate the information (knowledge for knowledge's sake and all that).

I'm currently developing a local search engine for a website I'm designing, and one of the ways I determine the relevance of an article is the number of times the search terms appear in the article itself. I'm therefore looking for an SQL query that will let me pull the rows (articles) containing the search term, and then order them by how many times the search term appears in each row.

In other words, I need something like this...

SELECT article_id FROM articles_table WHERE article_content LIKE '%Search Terms%' ORDER BY COUNT(number of times string appears in article_content);

So if a user were to search for "The Empire" and pulled up the following three articles...

  1. The Empire is The Empire.
  2. The Empire is the name of a position in baseball.
  3. The Empire The Empire The Empire.

It would sort them like so:

  1. The Empire The Empire The Empire.
  2. The Empire is The Empire.
  3. The Empire is the name of a position in baseball.

I am working in PHP, and although ideally I would like to do this with nothing more than one SQL query, I'm open to PHP solutions if that is not possible.

Any and all help is greatly appreciated.

You should really consider a full-text search solution. Either use MyISAM tables and MySQL's native full-text search, or go the external route and use something like Sphinx or Lucene.

I totally agree with the other answers. Theoretically you could do something like this

select (char_length('The Empire The Empire The Empire') - 
       char_length(replace(lower('The Empire The Empire The Empire'),lower('empire'),''))) / char_length('empire') as occurrences

to find how often a search term occurs in your string, but this is a terrible method.
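The length-difference arithmetic in that expression may be easier to follow outside SQL. Here is a minimal sketch in Python (the question mentions PHP; Python is used here only as an assumption for illustration, and `count_occurrences` is a hypothetical helper, not something from the answers). It removes every non-overlapping, case-insensitive occurrence of the term and divides the number of characters removed by the length of the term.

```python
def count_occurrences(text: str, term: str) -> int:
    """Count non-overlapping, case-insensitive occurrences of term in text."""
    haystack = text.lower()
    needle = term.lower()
    if not needle:
        # An empty term would mean dividing by zero below.
        raise ValueError("search term must be non-empty")
    # Characters removed by the replace, divided by the term length,
    # gives the number of occurrences that were removed.
    removed = len(haystack) - len(haystack.replace(needle, ""))
    return removed // len(needle)


articles = [
    "The Empire is The Empire.",
    "The Empire is the name of a position in baseball.",
    "The Empire The Empire The Empire.",
]

# Rank the sample articles from the question, most occurrences first.
ranked = sorted(articles, key=lambda a: count_occurrences(a, "The Empire"), reverse=True)
for article in ranked:
    print(count_occurrences(article, "The Empire"), article)
```

Note that this counts substring matches, so a search for "empire" would also count "Empires"; that is one reason the approach is a poor substitute for real full-text search.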

Not strictly an answer, but have you considered a full-text search engine such as Lucene?

Rather than building your own, which will not be as good, I mean.

Here is a clever approach that does not use FULLTEXT searching:

USE test;
DROP TABLE IF EXISTS articles_table;
CREATE TABLE articles_table
(
    article_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    article_content TEXT
) ENGINE=MyISAM;
INSERT INTO articles_table (article_content) VALUES
('The Empire is The Empire'),
('The Empire is the name of a position in baseball.'),
('The Empire The Empire The Empire');
SELECT * FROM articles_table;

lwdba@localhost (DB test) :: SELECT * FROM articles_table;
+------------+---------------------------------------------------+
| article_id | article_content                                   |
+------------+---------------------------------------------------+
|          1 | The Empire is The Empire                          |
|          2 | The Empire is the name of a position in baseball. |
|          3 | The Empire The Empire The Empire                  |
+------------+---------------------------------------------------+
3 rows in set (0.00 sec)

SELECT article_content,
REPLACE(article_content,'The Empire','') newstring,
LENGTH(article_content) origlen,
LENGTH(REPLACE(article_content,'The Empire','')) newlen,
FLOOR((LENGTH(article_content) - LENGTH(REPLACE(article_content,'The Empire','')))/(LENGTH('The Empire'))) score
FROM articles_table;

+---------------------------------------------------+-----------------------------------------+---------+--------+-------+
| article_content                                   | newstring                               | origlen | newlen | score |
+---------------------------------------------------+-----------------------------------------+---------+--------+-------+
| The Empire is The Empire                          |  is                                     |      24 |      4 |     2 |
| The Empire is the name of a position in baseball. |  is the name of a position in baseball. |      49 |     39 |     1 |
| The Empire The Empire The Empire                  |                                         |      32 |      2 |     3 |
+---------------------------------------------------+-----------------------------------------+---------+--------+-------+

The score is the number of times the search string was deleted from the original string: the number of characters removed, divided by the length of the search string.

Trim the query down to show only the original text and the score, ordered by score:

SELECT * FROM (SELECT article_content,FLOOR((LENGTH(article_content) - LENGTH(REPLACE(article_content,'The Empire','')))/(LENGTH('The Empire'))) score FROM articles_table) AA ORDER BY score DESC;

Here is the final product:

lwdba@localhost (DB test) :: SELECT * FROM (SELECT article_content,FLOOR((LENGTH(article_content) - LENGTH(REPLACE(article_content,'The Empire','')))/(LENGTH('The Empire'))) score FROM articles_table) AA ORDER BY score DESC;
+---------------------------------------------------+-------+
| article_content                                   | score |
+---------------------------------------------------+-------+
| The Empire The Empire The Empire                  |     3 |
| The Empire is The Empire                          |     2 |
| The Empire is the name of a position in baseball. |     1 |
+---------------------------------------------------+-------+
3 rows in set (0.06 sec)

Just insert any desired search string into the two places in the query where 'The Empire' appears!!!
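Rather than pasting the search string into the query text by hand, the two places can be filled from a bound parameter. The following is only a sketch under stated assumptions: it uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for MySQL so that it runs on its own, `rank_articles` is a hypothetical helper name, SQLite's integer division takes the place of FLOOR, and the INSTR filter (to skip articles with no match) is an addition that is not in the query above. The same idea should carry over to a prepared statement in PHP.

```python
import sqlite3


def rank_articles(conn: sqlite3.Connection, term: str) -> list:
    """Return (article_content, score) rows, highest score first."""
    if not term:
        raise ValueError("search term must be non-empty")
    sql = """
        SELECT article_content,
               (LENGTH(article_content)
                - LENGTH(REPLACE(article_content, :term, ''))) / LENGTH(:term) AS score
        FROM articles_table
        WHERE INSTR(article_content, :term) > 0
        ORDER BY score DESC
    """
    # The same named parameter fills every place the search string is needed.
    return conn.execute(sql, {"term": term}).fetchall()


# Stand-in for the sample table from the answer (SQLite, not MySQL).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE articles_table ("
    "article_id INTEGER PRIMARY KEY, article_content TEXT)"
)
conn.executemany(
    "INSERT INTO articles_table (article_content) VALUES (?)",
    [
        ("The Empire is The Empire",),
        ("The Empire is the name of a position in baseball.",),
        ("The Empire The Empire The Empire",),
    ],
)

ranked = rank_articles(conn, "The Empire")
for content, score in ranked:
    print(score, content)
```

Binding the term also avoids quoting problems when the search string contains an apostrophe. As in the original query, the match is case-sensitive and counts plain substrings.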

Give it a try!!!

UPDATE: Oh well, I tried!!!

