
Count occurrences of a word in a row in MySQL

I'm making a search function for my website, which finds relevant results from a database. I'm looking for a way to count occurrences of a word, but I need to ensure that there are word boundaries on both sides of the word (so I don't end up matching "triple" when I want "rip").

Does anyone have any ideas?


People have misunderstood my question:

How can I count the number of such occurrences within a single row?

This is not the sort of thing that relational databases are very good at, unless you can use fulltext indexing, and you have already stated that you cannot, since you're using InnoDB. I'd suggest selecting your relevant rows and doing the word count in your application code.
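As a rough illustration of counting in application code, here is a minimal Python sketch. It assumes the row's text has already been fetched into a string; the function name count_word and the sample text are made up for this example and are not from the original answer.

```python
import re

def count_word(text: str, word: str) -> int:
    """Count whole-word, case-insensitive occurrences of `word` in `text`."""
    if not word:
        return 0
    # re.escape keeps any punctuation in the search term literal;
    # \b requires a word boundary on both sides of the term.
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return len(pattern.findall(text))

# Hypothetical row content: "triple" contains "rip" but is not counted.
row_text = "Rip the triple-wrapped tape, then rip it again."
print(count_word(row_text, "rip"))  # prints 2
```

The word-boundary check is what keeps "triple" from being counted as "rip", which is the requirement stated in the question.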

Create a user-defined function like this and use it in your query:

DELIMITER $$

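CREATE FUNCTION `getCount`(myStr VARCHAR(1000), myword VARCHAR(100))
    RETURNS INT
    DETERMINISTIC
    BEGIN
    -- Counts non-overlapping substring matches of myword in myStr.
    -- It does not check word boundaries.
    DECLARE cnt INT DEFAULT 0;
    DECLARE result INT DEFAULT 1;

    -- An empty search word would never shorten myStr, so stop early.
    IF myword IS NULL OR CHAR_LENGTH(myword) = 0 THEN
        RETURN 0;
    END IF;

    WHILE (result > 0) DO
    SET result = INSTR(myStr, myword);
    IF(result > 0) THEN 
        SET cnt = cnt + 1;
        -- CHAR_LENGTH counts characters, matching SUBSTRING's character positions.
        SET myStr = SUBSTRING(myStr, result + CHAR_LENGTH(myword));
    END IF;
    END WHILE;
    RETURN cnt;    

    END$$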

DELIMITER ;

Hope it helps. Refer to this.

You can try this perverted way:

SELECT 
(LENGTH(field) - LENGTH(REPLACE(field, 'word', ''))) / LENGTH('word') AS `count`
FROM yourtable -- placeholder table name
ORDER BY `count` DESC
  • This query can be very slow
  • It looks pretty ugly
  • REPLACE() is case-sensitive

You can overcome the issue of MySQL's case-sensitive REPLACE() function by using LOWER().
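To make the LENGTH/REPLACE arithmetic easier to follow, here is a small Python sketch that mirrors it outside the database. The helper name count_by_replace and the sample string are assumptions for illustration only; the ignore_case flag stands in for wrapping both arguments in LOWER().

```python
def count_by_replace(text: str, word: str, ignore_case: bool = False) -> int:
    """Mirror (LENGTH(field) - LENGTH(REPLACE(field, word, ''))) / LENGTH(word)."""
    if not word:
        return 0
    if ignore_case:
        # Same idea as applying LOWER() to both the column and the term.
        text, word = text.lower(), word.lower()
    # Every removed occurrence shortens the text by len(word) characters.
    removed = len(text) - len(text.replace(word, ""))
    return removed // len(word)

sample = "Rip it; a triple rip."
print(count_by_replace(sample, "rip"))                    # prints 2
print(count_by_replace(sample, "rip", ignore_case=True))  # prints 3
```

Note that the case-sensitive count of 2 includes the "rip" inside "triple": this technique counts substrings, so on its own it does not enforce the word boundaries the question asks for.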

It's sloppy, but on my end this query runs pretty fast.

To speed things along, I retrieve the result set in a SELECT that I have declared as a derived table in my 'outer' query. Since MySQL already has the results at this point, the replace method works pretty quickly.

I created a query similar to the one below to search for multiple terms in multiple tables and multiple columns. I obtain a 'relevance' number equal to the sum of the counts of all occurrences of all found search terms in all columns searched.

SELECT DISTINCT ( 
((length(x.ent_title) - length(replace(LOWER(x.ent_title),LOWER('there'),''))) / length('there')) 
+ ((length(x.ent_content) - length(replace(LOWER(x.ent_content),LOWER('there'),''))) / length('there'))
 + ((length(x.ent_title) - length(replace(LOWER(x.ent_title),LOWER('another'),''))) / length('another')) 
+ ((length(x.ent_content) - length(replace(LOWER(x.ent_content),LOWER('another'),''))) / length('another')) 
) as relevance, 
x.ent_type, 
x.ent_id, 
x.this_id as anchor,
page.page_name
FROM ( 
(SELECT 
'Foo' as ent_type, 
sp.sp_id as ent_id, 
sp.page_id as this_id, 
sp.title as ent_title, 
sp.content as ent_content,
sp.page_id as page_id
FROM sp
WHERE (sp.title LIKE '%there%' OR sp.content LIKE '%there%' OR sp.title LIKE '%another%' OR sp.content LIKE '%another%' ) AND (sp.title NOT LIKE '%goes%' AND sp.content NOT LIKE '%goes%')
) UNION (
  [search a different table here.....]
)
) as x
JOIN page ON page.page_id = x.page_id 
WHERE page.rstatus = 'ACTIVE'
ORDER BY relevance DESC, ent_title;
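The relevance idea in the query above (sum the hits of every term over every searched column, then sort descending) can also be sketched in application code. This is only an illustration under assumptions: the rows are hypothetical dictionaries standing in for fetched results, the relevance function is made up for this example, and unlike the SQL it counts whole words only.

```python
import re

def relevance(columns: list[str], terms: list[str]) -> int:
    """Sum whole-word, case-insensitive hits of every term over every column."""
    total = 0
    for term in terms:
        if not term:
            continue
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for text in columns:
            total += len(pattern.findall(text))
    return total

# Hypothetical rows, as if already selected from the database.
rows = [
    {"ent_id": 2, "ent_title": "Therefore", "ent_content": "Nothing to see."},
    {"ent_id": 1, "ent_title": "There and back", "ent_content": "Another day, there again."},
]
terms = ["there", "another"]

# Highest relevance first, like ORDER BY relevance DESC.
ranked = sorted(
    rows,
    key=lambda r: relevance([r["ent_title"], r["ent_content"]], terms),
    reverse=True,
)
print([r["ent_id"] for r in ranked])  # prints [1, 2]
```

The row titled "Therefore" would pass a LIKE '%there%' filter, but it scores 0 here because "there" does not appear as a separate word.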

Hope this helps someone.

-- Seacrest out

Something like this should work:

select count(*) from table where fieldname REGEXP '[[:<:]]word[[:>:]]';

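The gory details are in the MySQL manual, section 11.4.2. Note that [[:<:]] and [[:>:]] belong to MySQL's older regex library; MySQL 8.0.4 and later use ICU, where the word-boundary marker is \b (written '\\bword\\b' in a string literal).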

Something like LIKE or REGEXP will not scale (unless it's a leftmost prefix match).

Consider instead using a fulltext index for what you want to do.

select count(*) from yourtable where match(title, body) against ('some_word');

I have used the technique described in the link below. The method uses MySQL's LENGTH and REPLACE functions.

Keyword Relevance

If you want a search, I would advise something like Sphinx or Lucene. I find Sphinx (as an independent full-text indexer) a lot easier to set up and run. It runs fast and generates the indexes very fast. Even if you were using MyISAM I would suggest using it; it has a lot more power than a full-text index from MyISAM.

It can also integrate (somewhat) with MySQL.

It depends on which DBMS you are using; some allow you to write UDFs that can do this.
