How can you count how many times each distinct word in a column appears?
Below is an example with the expected output:
+--------+------------------------------+
| PERIOD | STRING                       |
+--------+------------------------------+
| 1      | this is some text            |
| 2      | more text                    |
| 3      | this could be some more text |
+--------+------------------------------+
+-------+-------+
| WORD  | COUNT |
+-------+-------+
| this  | 2     |
| is    | 1     |
| some  | 2     |
| text  | 3     |
| more  | 2     |
| could | 1     |
| be    | 1     |
+-------+-------+
Thanks,
You can use a hierarchical query such as:

    WITH t2 AS
    (
      SELECT REGEXP_SUBSTR(LOWER(string), '[^[:space:]]+', 1, level) AS word
        FROM t
      CONNECT BY level <= REGEXP_COUNT(string, '[[:space:]]+') + 1
             AND PRIOR SYS_GUID() IS NOT NULL
             AND PRIOR period = period
    )
    SELECT word, COUNT(*) AS count
      FROM t2
     WHERE word IS NOT NULL
     GROUP BY word;

PS: The LOWER() function is applied to avoid case-sensitivity problems, so that, for example, "Text" and "text" are counted as the same word.
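To see what the CONNECT BY clause produces before the aggregation, a variant of the inner query can be run on its own. This is a minimal sketch, assuming a table named t with columns period and string as in the answer above, and assuming period is unique per row; the alias n is only illustrative.

```sql
-- Sketch: list every word next to its source row and its position in the string.
-- Assumes table t(period, string); the alias n is hypothetical.
SELECT period,
       level AS n,
       REGEXP_SUBSTR(LOWER(string), '[^[:space:]]+', 1, level) AS word
  FROM t
CONNECT BY level <= REGEXP_COUNT(string, '[[:space:]]+') + 1
       AND PRIOR SYS_GUID() IS NOT NULL   -- non-deterministic value, so Oracle does not report a cycle
       AND PRIOR period = period          -- keeps each hierarchy within its own source row
 ORDER BY period, n;
```

For the sample data this should return one row per word; for instance, period 2 yields "more" at n = 1 and "text" at n = 2. Grouping these rows by word then gives the counts.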
The trick is splitting the string into words. One method uses a recursive CTE:
    with words(word, string, n) as (
          select regexp_substr(string, '[^ ]+', 1, 1) as word, string, 1 as n
          from t
          union all
          select regexp_substr(string, '[^ ]+', 1, n + 1), string, n + 1
          from words
          where regexp_substr(string, '[^ ]+', 1, n + 1) is not null
         )
    select word, count(*)
    from words
    group by word;
Here is a db<>fiddle.
You can do it without (slow) regular expressions using simple string functions:
    WITH word_bounds ( string, start_pos, end_pos ) AS (
      SELECT string,
             1,
             INSTR( string, ' ', 1 )
      FROM   table_name
    UNION ALL
      SELECT string,
             end_pos + 1,
             INSTR( string, ' ', end_pos + 1 )
      FROM   word_bounds
      WHERE  end_pos > 0
    ),
    words ( word ) AS (
      SELECT CASE end_pos
             WHEN 0
             THEN SUBSTR( string, start_pos )
             ELSE SUBSTR( string, start_pos, end_pos - start_pos )
             END
      FROM   word_bounds
    )
    SELECT word,
           COUNT(*) AS frequency
    FROM   words
    GROUP BY
           word
    ORDER BY
           frequency DESC, word;
Which, for the sample data:
    CREATE TABLE table_name ( PERIOD, STRING ) AS
      SELECT 1, 'this is some text' FROM DUAL UNION ALL
      SELECT 2, 'more text' FROM DUAL UNION ALL
      SELECT 3, 'this could be some more text' FROM DUAL;
Outputs:
WORD  | FREQUENCY
:---- | --------:
text  |         3
more  |         2
some  |         2
this  |         2
be    |         1
could |         1
is    |         1
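One caveat, assuming the input may contain consecutive, leading, or trailing spaces (the sample data does not): the INSTR-based split then produces zero-length substrings, which Oracle returns as NULL, and these would show up as a NULL word in the result. A minimal sketch of a variant that filters them out follows; the WHERE clause is an addition and not part of the query above.

```sql
-- Sketch: same INSTR/SUBSTR split, but empty (NULL) words are skipped.
-- Assumes table_name(period, string) as created for the sample data.
WITH word_bounds ( string, start_pos, end_pos ) AS (
  SELECT string, 1, INSTR( string, ' ', 1 )
  FROM   table_name
UNION ALL
  SELECT string, end_pos + 1, INSTR( string, ' ', end_pos + 1 )
  FROM   word_bounds
  WHERE  end_pos > 0
),
words ( word ) AS (
  SELECT CASE end_pos
         WHEN 0 THEN SUBSTR( string, start_pos )                       -- last word in the string
         ELSE SUBSTR( string, start_pos, end_pos - start_pos )         -- NULL when the length is 0
         END
  FROM   word_bounds
)
SELECT word,
       COUNT(*) AS frequency
FROM   words
WHERE  word IS NOT NULL          -- drops the empty pieces between repeated spaces
GROUP BY word
ORDER BY frequency DESC, word;
```

On the sample data the filter removes nothing, so the result should match the output shown above.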
There is a discussion of the performance of different ways of splitting delimited strings here.
db<>fiddle here