I'm trying to extract hashtags from a column using REGEXP in SQL. Right now I'm running this query:
SELECT caption FROM posts WHERE caption REGEXP "#[a-zA-Z0-9_]+"
But I want to extract only the specific word that matched the pattern.
For example, if I have the following rows in my database:
id  caption          user
1   #hi i'm here     2
2   hello #hi there  3
3   i'm x #hi        4
4   I'm #Driving     2
5   I #love #food    6
Right now my query is returning
caption
#hi i'm here
hello #hi there
i'm x #hi
I'm #Driving
I #love #food
But I want
tag
#hi
#Driving
#love
#food
How can I achieve this?
Thanks for your help.
Create table/insert data
CREATE TABLE Table1
(`id` INT, `caption` VARCHAR(255), `user` INT)
;
INSERT INTO Table1
(`id`, `caption`, `user`)
VALUES
(1, '#hi i''m here', 2),
(2, 'hello #hi there', 3),
(3, 'i''m x #hi', 4),
(4, 'I''m #Driving', 2),
(5, 'I #love #food', 6)
;
You can split the words in caption with SUBSTRING_INDEX(SUBSTRING_INDEX(caption, ' ', 1), ' ', -1)
to get the first word and SUBSTRING_INDEX(SUBSTRING_INDEX(caption, ' ', 2), ' ', -1)
to get the second word.
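To see what the nested call does, here is a small sketch run against a string literal instead of the table. The column aliases are just illustrative names, not anything from the question:

```sql
SELECT
  -- inner call: everything before the 2nd space
  SUBSTRING_INDEX('I #love #food', ' ', 2) AS first_two_words,                          -- 'I #love'
  -- outer call with -1: everything after the last space of that result
  SUBSTRING_INDEX(SUBSTRING_INDEX('I #love #food', ' ', 2), ' ', -1) AS second_word,    -- '#love'
  -- a count larger than the number of words returns the whole string,
  -- so the outer call yields the last word again
  SUBSTRING_INDEX(SUBSTRING_INDEX('I #love #food', ' ', 5), ' ', -1) AS past_the_end;   -- '#food'
```

Note the last column: when the count is larger than the number of words, the inner call returns the whole string, so the outer call keeps returning the last word.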
But how do you make this dynamic, so that a larger number of words can be split?
First, build a number generator in SQL. This query generates a list of numbers from 1 to 100.
Query
SELECT
@number := @number + 1 AS number
FROM (
(SELECT 0 UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) row1
CROSS JOIN
(SELECT 0 UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) row2
CROSS JOIN
(SELECT @number:=0) AS init_user_params
)
Result
number
--------
1
2
3
4
5
6
7
8
9
10
...
...
90
91
92
93
94
95
96
97
98
99
100
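As an aside, if you are on MySQL 8.0 or newer (an assumption, the question does not say which version is in use), a recursive CTE is another way to build the same list without user variables. This is a swapped-in alternative, not the generator used in the rest of this answer:

```sql
-- Assumes MySQL 8.0+, where WITH RECURSIVE is available.
WITH RECURSIVE numbers (number) AS (
  SELECT 1                          -- anchor row
  UNION ALL
  SELECT number + 1                 -- add one row per step
  FROM numbers
  WHERE number < 100                -- stop once 100 has been produced
)
SELECT number
FROM numbers;
```

The upper bound of 100 mirrors the 10 x 10 cross join above; it only needs to be at least the largest number of words you expect in a caption.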
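Now we can CROSS JOIN the generated number list with our table (Table1 in this example). This produces (number of rows in Table1) * 100 rows. Many of them are duplicates, because once the number exceeds the word count of a caption the expression simply returns the last word again. Each generated number is used as the word offset in SUBSTRING_INDEX(SUBSTRING_INDEX(caption, ' ', [word offset]), ' ', -1),
like so: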
Query
SELECT
SUBSTRING_INDEX(SUBSTRING_INDEX(caption, ' ', numbers.number), ' ', -1) AS tag
FROM (
SELECT
@number := @number + 1 AS number
FROM (
(SELECT 0 UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) row1
CROSS JOIN
(SELECT 0 UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) row2
CROSS JOIN
(SELECT @number:=0) AS init_user_params
)
)
AS numbers
CROSS JOIN Table1
WHERE
SUBSTRING_INDEX(SUBSTRING_INDEX(caption, ' ', numbers.number), ' ', -1) LIKE '#%' #we only want words starting with #
GROUP BY
tag #remove duplicates; DISTINCT plus ORDER BY on a non-selected column is rejected under ONLY_FULL_GROUP_BY
ORDER BY
MIN(Table1.id) ASC, MIN(numbers.number) ASC #first occurrence: by post id, then by word position
Result
tag
----------
#hi
#Driving
#love
#food
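Since the question started from REGEXP: on MySQL 8.0 or newer (again an assumption about your version) REGEXP_SUBSTR can return the matched text directly, and its fourth argument selects the n-th match. A rough sketch, using a recursive CTE for the numbers and assuming no caption contains more than 10 hashtags:

```sql
-- Assumes MySQL 8.0+ (REGEXP_SUBSTR and WITH RECURSIVE) and the Table1 definition above.
WITH RECURSIVE numbers (number) AS (
  SELECT 1
  UNION ALL
  SELECT number + 1 FROM numbers WHERE number < 10   -- assumed upper bound on tags per caption
)
SELECT
  -- 3rd argument: start position, 4th argument: which occurrence of the match to return
  REGEXP_SUBSTR(caption, '#[a-zA-Z0-9_]+', 1, numbers.number) AS tag
FROM Table1
CROSS JOIN numbers
WHERE
  -- REGEXP_SUBSTR returns NULL when the caption has no n-th match
  REGEXP_SUBSTR(caption, '#[a-zA-Z0-9_]+', 1, numbers.number) IS NOT NULL
GROUP BY tag                                          -- remove duplicates
ORDER BY MIN(Table1.id), MIN(numbers.number);         -- first occurrence order
```

Unlike the LIKE '#%' filter, this matches on the pattern itself, so a hashtag followed by punctuation such as '#food!' would come back as #food rather than with the punctuation attached.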