How to select the word matched by a REGEXP in SQL?

I'm trying to extract hashtags from a column using REGEXP in SQL. Right now I'm running this query:

SELECT caption FROM posts WHERE caption REGEXP "#[a-zA-Z0-9_]+"

But I want to extract the specific word that was matched by this pattern.

For example, if I have the following entries in my database:

id caption         user
1  #hi i'm here    2
2  hello #hi there 3
3  i'm x #hi       4
4  I'm #Driving    2
5  I #love #food   6

Right now my query returns

caption
#hi i'm here
hello #hi there
i'm x #hi
I'm #Driving
I #love #food

But I want

tag
#hi
#Driving
#love
#food

How can I achieve this?

Thanks for your help.

Create table/insert data

CREATE TABLE Table1
    (`id` INT, `caption` VARCHAR(255), `user` INT)
;

INSERT INTO Table1
    (`id`, `caption`, `user`)
VALUES
    (1, '#hi i''m here', 2),
    (2, 'hello #hi there', 3),
    (3, 'i''m x #hi', 4),
    (4, 'I''m #Driving', 2),
    (5, 'I #love #food', 6)
;

You can split the words in caption with SUBSTRING_INDEX: SUBSTRING_INDEX(SUBSTRING_INDEX(caption, ' ', 1), ' ', -1) returns the first word, and SUBSTRING_INDEX(SUBSTRING_INDEX(caption, ' ', 2), ' ', -1) returns the second word.
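To see how the nested calls behave, here is a small sketch that runs them against a literal string taken from the sample data rather than the table. The column aliases word1 to word4 are only illustrative names.

```sql
-- The inner call keeps everything up to the n-th space;
-- the outer call (count -1) keeps what follows the last remaining space.
SELECT
  SUBSTRING_INDEX(SUBSTRING_INDEX('I #love #food', ' ', 1), ' ', -1) AS word1,  -- 'I'
  SUBSTRING_INDEX(SUBSTRING_INDEX('I #love #food', ' ', 2), ' ', -1) AS word2,  -- '#love'
  SUBSTRING_INDEX(SUBSTRING_INDEX('I #love #food', ' ', 3), ' ', -1) AS word3,  -- '#food'
  SUBSTRING_INDEX(SUBSTRING_INDEX('I #love #food', ' ', 4), ' ', -1) AS word4;  -- '#food' again
```

Note the last column: once the offset exceeds the number of words, the inner call returns the whole string, so the last word simply repeats.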

But how do you make this dynamic so that a larger number of words can be split?

First, build a number generator in SQL. This query generates the numbers 1 to 100:

Query

SELECT
  @number := @number + 1 AS number  # running counter: 1, 2, 3, ...
FROM (
  # two 10-row lists cross joined give 10 * 10 = 100 rows
  (SELECT 0 UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) row1
  CROSS JOIN
  (SELECT 0 UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) row2
  CROSS JOIN
  (SELECT @number:=0) AS init_user_params
)

Result

number  
--------
       1
       2
       3
       4
       5
       6
       7
       8
       9
      10
      ...
      ...
      90
      91
      92
      93
      94
      95
      96
      97
      98
      99
     100
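As an alternative sketch, assuming MySQL 8.0 or later (the answer does not state a version), the same 1 to 100 list can be produced with a recursive common table expression instead of user variables. The CTE name numbers is chosen here to match the alias used in the answer.

```sql
-- Recursive CTE: start at 1 and keep adding 1 until 100 is reached.
WITH RECURSIVE numbers (number) AS (
  SELECT 1
  UNION ALL
  SELECT number + 1 FROM numbers WHERE number < 100
)
SELECT number FROM numbers;
```

This avoids assigning user variables inside a SELECT, but it will not run on MySQL versions before 8.0.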

Now we can CROSS JOIN the generated number list with Table1 (in our example). This produces (row count of Table1) * 100 rows, many of them duplicates, because for a number beyond a caption's word count the expression simply returns the last word again. We then use the generated number as the word offset in SUBSTRING_INDEX(SUBSTRING_INDEX(caption, ' ', [word offset]), ' ', -1), like so:

Query

SELECT
  DISTINCT #remove duplicates
    SUBSTRING_INDEX(SUBSTRING_INDEX(caption, ' ', numbers.number), ' ', -1) AS tag
FROM (

  SELECT
    @number := @number + 1 AS number
  FROM (
    (SELECT 0 UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) row1
     CROSS JOIN
    (SELECT 0 UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) row2
     CROSS JOIN
    (SELECT @number:=0) AS init_user_params
  )
)
 AS numbers
CROSS JOIN Table1
WHERE
 SUBSTRING_INDEX(SUBSTRING_INDEX(caption, ' ', numbers.number), ' ', -1) LIKE '#%' #we only want words starting with #
ORDER BY
 Table1.id ASC #with ONLY_FULL_GROUP_BY enabled (the default since MySQL 5.7.5) this is rejected, because id is not in the DISTINCT select list; drop the ORDER BY there

Result

tag       
----------
#hi       
#Driving  
#love     
#food     
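Since the question asks for the text matched by the pattern itself, here is a hedged alternative sketch. It assumes MySQL 8.0.4 or later, where REGEXP_SUBSTR accepts an occurrence argument (the original answer does not rely on this, and other engines may differ). It reuses the question's pattern and the Table1 sample table, and builds the number list with a recursive CTE.

```sql
WITH RECURSIVE numbers (number) AS (
  SELECT 1
  UNION ALL
  SELECT number + 1 FROM numbers WHERE number < 100
)
SELECT DISTINCT
  -- arguments: subject, pattern, start position, occurrence
  REGEXP_SUBSTR(caption, '#[a-zA-Z0-9_]+', 1, numbers.number) AS tag
FROM Table1
CROSS JOIN numbers
WHERE
  -- REGEXP_SUBSTR returns NULL when the n-th occurrence does not exist
  REGEXP_SUBSTR(caption, '#[a-zA-Z0-9_]+', 1, numbers.number) IS NOT NULL;
```

Against the sample data this should return the same four tags, in no guaranteed order. Unlike splitting on spaces, it also picks up a hashtag that is followed directly by punctuation, since only the characters allowed by the pattern are returned.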

Notes

  1. This query only works when a caption has 100 words or fewer.
  2. This query is pretty fast on smaller tables. On larger tables it won't scale well because of the CROSS JOIN.
  3. You really should have a separate table where you store the hashtags.
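A minimal sketch of what note 3 could look like. The table and column names (hashtags, post_hashtags, post_id, hashtag_id) are hypothetical and not from the original answer. The idea is to parse the hashtags once, when a post is saved, and store them in their own tables.

```sql
-- One row per distinct hashtag.
CREATE TABLE hashtags (
  id  INT AUTO_INCREMENT PRIMARY KEY,
  tag VARCHAR(100) NOT NULL UNIQUE
);

-- Link table: which post uses which hashtag.
-- post_id is meant to hold Table1.id; no foreign key is declared on it
-- because Table1.id has no index in the sample schema.
CREATE TABLE post_hashtags (
  post_id    INT NOT NULL,
  hashtag_id INT NOT NULL,
  PRIMARY KEY (post_id, hashtag_id),
  FOREIGN KEY (hashtag_id) REFERENCES hashtags (id)
);

-- Listing the tags of one post is then a plain join, with no string splitting.
SELECT h.tag
FROM post_hashtags ph
JOIN hashtags h ON h.id = ph.hashtag_id
WHERE ph.post_id = 5;
```

With this layout, the list of all distinct tags is simply SELECT tag FROM hashtags.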
