For example, I have a table called 'Table1' with a column called 'country'. I want to count the distinct words in a string. Below is my data for column 'country':
country:
"japan singapore japan chinese chinese chinese"
Expected output: in the data above, japan appears twice, singapore once, and chinese three times. I want each distinct word to be counted only once, so japan counts as one, singapore as one, and chinese as one. Hence the output should be 3. Please help.
ValueOfWord: 3
Firstly, it is bad design to store multiple values in a single column as a delimited string. You should consider normalizing the data as a permanent solution.
With the denormalized data, you could do it in a single SQL statement using REGEXP_SUBSTR:
SELECT COUNT(DISTINCT(regexp_substr(country, '[^ ]+', 1, LEVEL))) as "COUNT"
FROM table_name
CONNECT BY LEVEL <= regexp_count(country, ' ')+1
/
Demo:
SQL> WITH sample_data AS
  ( SELECT 'japan singapore japan chinese chinese chinese' str FROM dual
  )
  -- end of sample_data mocking real table
  SELECT COUNT(DISTINCT(regexp_substr(str, '[^ ]+', 1, LEVEL))) as "COUNT"
  FROM sample_data
  CONNECT BY LEVEL <= regexp_count(str, ' ')+1
/
COUNT
----------
3
See "Split single comma delimited string into rows in Oracle" to understand how the query works.
UPDATE
When the table holds several rows of delimited strings, you need to control the number of rows generated by the CONNECT BY clause.
See "Split comma delimited strings in a table in Oracle" for more ways of doing the same task.
Setup
Let's say you have a table with 3 rows like this:
SQL> CREATE TABLE t(country VARCHAR2(200));
Table created.
SQL> INSERT INTO t VALUES('japan singapore japan chinese chinese chinese');
1 row created.
SQL> INSERT INTO t VALUES('singapore indian malaysia');
1 row created.
SQL> INSERT INTO t VALUES('french french french');
1 row created.
SQL> COMMIT;
Commit complete.
SQL> SELECT * FROM t;
COUNTRY
---------------------------------------------------------------------------
japan singapore japan chinese chinese chinese
singapore indian malaysia
french french french
We expect the output to be 6, since there are 6 unique strings across all rows.
SQL> SELECT COUNT(DISTINCT(regexp_substr(t.country, '[^ ]+', 1, lines.column_value))) count
  FROM t,
  TABLE (CAST (MULTISET
  (SELECT LEVEL FROM dual
  CONNECT BY LEVEL <= regexp_count(t.country, ' ')+1
  ) AS sys.odciNumberList ) ) lines
  ORDER BY lines.column_value
/
COUNT
----------
6
There are many other ways to achieve the desired output, for example with XMLTABLE or with the MODEL clause:
SQL> SELECT COUNT(DISTINCT(country)) COUNT
  FROM
  (SELECT trim(COLUMN_VALUE) country
  FROM t,
  xmltable(('"'
  || REPLACE(country, ' ', '","')
  || '"'))
  )
/
COUNT
----------
6
SQL> WITH
  model_param AS
  (
  SELECT country AS orig_str ,
  ' '
  || country
  || ' ' AS mod_str ,
  1 AS start_pos ,
  Length(country) AS end_pos ,
  (LENGTH(country) -
  LENGTH(REPLACE(country, ' '))) + 1 AS element_count ,
  0 AS element_no ,
  ROWNUM AS rn
  FROM t )
  SELECT COUNT(DISTINCT(Substr(mod_str, start_pos, end_pos-start_pos))) count
  FROM (
  SELECT *
  FROM model_param
  MODEL PARTITION BY (rn, orig_str, mod_str)
  DIMENSION BY (element_no)
  MEASURES (start_pos, end_pos, element_count)
  RULES ITERATE (2000)
  UNTIL (ITERATION_NUMBER+1 = element_count[0])
  ( start_pos[ITERATION_NUMBER+1] =
  instr(cv(mod_str), ' ', 1, cv(element_no)) + 1,
  end_pos[ITERATION_NUMBER+1] =
  instr(cv(mod_str), ' ', 1, cv(element_no) + 1) )
  )
  WHERE element_no != 0
  ORDER BY mod_str , element_no
/
COUNT
----------
6
Is the whole string stored in a single column value, or is each word stored in its own row?
If each word is in its own row, try:
SELECT COUNT(*)
FROM (SELECT DISTINCT T.country FROM Table1 T)
If the whole string is in a single value, I would write an external program, for example in Java, to parse the string and return the result you want.
Create a set of strings.
Use JDBC to retrieve the records, and use split to break each string into tokens on the ' ' delimiter. For every token, add it to the set if it is not already there.
When parsing finishes, the size of the set is the value you want.
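The steps above can be sketched roughly as follows. This is only an illustration: the class and method names (DistinctWordCounter, countDistinctWords) are made up for this example, and the column values are assumed to have already been read from the 'country' column (for instance via JDBC) into a list of strings, so no database connection is shown.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of any library.
class DistinctWordCounter {

    // Counts the distinct space-separated words across all given column values.
    // Assumption: each element of rows is one value of the 'country' column.
    static int countDistinctWords(List<String> rows) {
        Set<String> seen = new HashSet<>();
        for (String row : rows) {
            if (row == null) {
                continue; // skip NULL column values
            }
            // Split on runs of whitespace; a Set ignores duplicates on add.
            for (String token : row.trim().split("\\s+")) {
                if (!token.isEmpty()) {
                    seen.add(token);
                }
            }
        }
        return seen.size();
    }

    public static void main(String[] args) {
        List<String> rows = List.of(
            "japan singapore japan chinese chinese chinese",
            "singapore indian malaysia",
            "french french french");
        System.out.println(countDistinctWords(rows));
    }
}
```

With the three sample rows used earlier in this thread, the set ends up holding japan, singapore, chinese, indian, malaysia and french. Passing only the first row gives the count for the single string in the question.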
Break the string on the space delimiter and count the distinct words:
SELECT COUNT(DISTINCT regexp_substr(col, '[^ ]+', 1, LEVEL))
FROM T
CONNECT BY LEVEL <= regexp_count(col, ' ')+1
To count the distinct words for each value of col:
SELECT col,
COUNT(DISTINCT regexp_substr(col, '[^ ]+', 1, LEVEL))
FROM T
CONNECT BY LEVEL <= regexp_count(col, ' ')+1
GROUP BY col