
PostgreSQL SQL query to find number of occurrences of substring in string

I'm trying to wrap my head around a problem, but I'm drawing a blank. I know SQL quite well, but I'm not sure how to approach this.

My problem:

Given a string and a table of possible substrings, I need to find the total number of occurrences of those substrings in the string.

The search table consists of a single column:

searchtable

| pattern TEXT PRIMARY KEY|
|-------------------------|
| my                      |
| quick                   |
| Earth                   |

Given the string "Earth is my home planet and where my friends live", the expected outcome is 3 (2x "my" and 1x "Earth").
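To reproduce the example, a minimal setup might look like the following. The DDL is an assumption based on the table sketched above; only the table name, the column name, and the three rows come from the question.

```sql
-- Assumed setup: one TEXT column that is also the primary key,
-- as shown in the table sketch above.
CREATE TABLE searchtable (
    pattern TEXT PRIMARY KEY
);

-- The three sample patterns from the question.
INSERT INTO searchtable (pattern) VALUES
    ('my'),
    ('quick'),
    ('Earth');
```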

In my function, I have a variable bodytext, which is the string to examine.

I know I can do IN (SELECT pattern FROM searchtable) to get the list of substrings, and I could possibly use a LIKE ANY clause to get matches, but how can I count the occurrences of the table's substrings within the search string?

This is easily done without a custom function:

-- Split the string on spaces, then join each word to the pattern table.
select count(*)
from (values ('Earth is my home planet and where my friends live')) v(str)
     cross join lateral regexp_split_to_table(v.str, ' ') word
     join searchtable p
       on word = p.pattern;

Just break the original string into "words", then match on the words.
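Splitting on a single space treats "planet," or "live." as words that include the punctuation. As a hedged variation (not part of the original answer), the split can use the regex class `\W+` (one or more non-word characters), and grouping by pattern gives a per-pattern breakdown. This sketch assumes the `searchtable` table from the question and the default `standard_conforming_strings = on`; the punctuation in the sample string was added here for illustration.

```sql
-- Split on runs of non-word characters so punctuation does not stick to words,
-- then count how often each pattern appears as a whole word.
SELECT p.pattern, count(*) AS occurrences
FROM (VALUES ('Earth is my home planet, and where my friends live.')) AS v(str)
CROSS JOIN LATERAL regexp_split_to_table(v.str, '\W+') AS word
JOIN searchtable AS p
  ON word = p.pattern
GROUP BY p.pattern;
```

With the sample patterns this should list "Earth" once and "my" twice. Patterns with no match (here "quick") do not appear in the output, and the comparison is case-sensitive.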

Another method uses regular expression matching:

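-- Combine all patterns into one alternation, e.g. 'my|quick|Earth',
-- then count every match of that expression in the string.
select (select count(*) from regexp_matches(v.str, p.rpattern, 'g'))
from (values ('Earth is my home planet and where my friends live')) v(str) cross join
     (select string_agg(pattern, '|') as rpattern
      from searchtable
     ) p;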

This stuffs all the patterns into a single regular expression. Note that this version does not take word breaks into account, so a pattern also matches when it appears inside a longer word.
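If whole-word matching is wanted with the regex approach, one possible sketch (an assumption, not the original answer's code) wraps the alternation in PostgreSQL's word-boundary escapes `\m` (start of word) and `\M` (end of word). It also escapes regex metacharacters in each pattern so that a pattern such as `a.b` is matched literally. It assumes the `searchtable` table from the question and `standard_conforming_strings = on`.

```sql
SELECT (SELECT count(*)
        FROM regexp_matches(v.str, p.rpattern, 'g')) AS occurrences
FROM (VALUES ('Earth is my home planet and where my friends live')) AS v(str)
CROSS JOIN (
    -- Builds something like '\m(?:my|quick|Earth)\M'.
    -- regexp_replace puts a backslash in front of each regex metacharacter.
    SELECT '\m(?:'
           || string_agg(
                  regexp_replace(pattern, '([!$()*+.:<=>?[\\\]^{|}-])', '\\\1', 'g'),
                  '|')
           || ')\M' AS rpattern
    FROM searchtable
) AS p;
```

For the sample data this should return 3. On PostgreSQL 15 or later, `regexp_count(v.str, p.rpattern)` could replace the inner `count(*)` subquery.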


I solved the problem with the following code:

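CREATE OR REPLACE FUNCTION count_matches(body TEXT, OUT matches INTEGER) AS $$
DECLARE
    results INTEGER := 0;
    matchlist RECORD;
BEGIN
    -- Skip empty patterns: LENGTH('') = 0 would cause a division by zero.
    FOR matchlist IN SELECT pattern FROM searchtable WHERE pattern <> ''
    LOOP
        -- Removing every occurrence shortens the text by
        -- (occurrences * pattern length); dividing recovers the count.
        results := results
            + (LENGTH(body) - LENGTH(REPLACE(body, matchlist.pattern, '')))
              / LENGTH(matchlist.pattern);
    END LOOP;
    matches := results;
END;
$$ LANGUAGE plpgsql;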
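As a usage sketch, the function can be called directly. The second query is a set-based equivalent of the same length-difference idea without a loop; it is an illustration, not part of the original answer. Both assume the `searchtable` table from the question.

```sql
-- Call the function defined above with the sample sentence.
SELECT count_matches('Earth is my home planet and where my friends live');

-- Set-based sketch of the same technique: for each pattern,
-- (length before - length after removing the pattern) / pattern length.
SELECT COALESCE(
           SUM((length(v.str) - length(replace(v.str, s.pattern, '')))
               / length(s.pattern)),
           0) AS occurrences
FROM (VALUES ('Earth is my home planet and where my friends live')) AS v(str)
CROSS JOIN searchtable AS s
WHERE s.pattern <> '';  -- avoid division by zero on an empty pattern
```

With the three sample patterns, both queries should return 3. This technique counts non-overlapping, case-sensitive substring occurrences, so "my" would also be counted inside a word such as "mythology".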


