Select values between multiple parentheses in a string via SQL query

Question

I am using an Oracle database and I'm trying to select the values between parentheses. Here is my table, which has ID and Roads as columns. I have read-only access to this database, so I can only use SELECT:

ID   Roads
--   -----
1    #Chaussée de Waterloo (Ixelles)#
2    #Rue Reper-Vreven (Bruxelles)#
3    #Rue des Fraises (Anderlecht)#
4    #Chaussée de Roodebeek (Woluwe-Saint-Lambert)#
5    #Square Jean Absil (Etterbeek)#Avenue Hansen-Soulie (Etterbeek)#Avenue Le Marinel (Etterbeek)#

Basically, from the Roads column, I only want to keep the values between parentheses. As the final query has other tables in it, I want a select distinct. The desired output is:

 ID    Roads
------------------
  1      Ixelles  
  2      Bruxelles 
  3      Anderlecht
  4      Woluwe-Saint-Lambert
  5      Etterbeek, Etterbeek, Etterbeek

I tried the following query, which works fine when there is only one set of parentheses, but this doesn't work when there are several (like for ID 5), as it only gives back the values in the first set of parentheses:

select distinct substr(roads, instr(roads,'(') + 1, instr(roads,')') - instr(roads,'(') - 1) as roads 
from table

Does anyone know where I'm going wrong?

Answer 1

One solution - based in part on the one referenced by przemo_pl:

SELECT SUBSTR( with_parentheses,2,length(with_parentheses)-2) between_parenthesis
FROM 
  (select REGEXP_SUBSTR(dat, '\([^()]*\)+',1,level) AS with_parentheses 
     from (select '#Square Jean Absil (Etterbeek)#Avenue Hansen-Soulie (Anderlecht)#Avenue Le Marinel (Ixelles)#' as dat from dual )
   connect by LEVEL <= ( LENGTH(dat) - LENGTH(REPLACE(dat, '(', '')))    
  )

returns:

between_parenthesis
---------------------
"Etterbeek"
"Anderlecht"
"Ixelles"

If you want this re-assembled into a single row then that adds another wrinkle.
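
For the single-string case, that re-assembly can be sketched with LISTAGG. This is only an illustration under a few assumptions: the aliases lvl and val are made up, the capture-group form of REGEXP_SUBSTR replaces the outer SUBSTR, and REGEXP_COUNT replaces the LENGTH/REPLACE arithmetic (both need Oracle 11g or later, LISTAGG needs 11gR2).

```sql
-- Sketch: extract each parenthesised value, then rebuild one row.
-- lvl and val are illustrative aliases, not names from the original query.
SELECT LISTAGG(val, ', ') WITHIN GROUP (ORDER BY lvl) AS between_parentheses
FROM (
  SELECT LEVEL AS lvl,
         -- last argument 1 = return only sub-expression 1, i.e. the text inside the parentheses
         REGEXP_SUBSTR(dat, '\(([^()]*)\)', 1, LEVEL, NULL, 1) AS val
  FROM (SELECT '#Square Jean Absil (Etterbeek)#Avenue Hansen-Soulie (Anderlecht)#Avenue Le Marinel (Ixelles)#' AS dat
        FROM dual)
  -- one generated row per opening parenthesis in the string
  CONNECT BY LEVEL <= REGEXP_COUNT(dat, '\(')
);
```

For the sample string this should return the single value Etterbeek, Anderlecht, Ixelles. Note that the CONNECT BY here is only safe because the source is a single row; run against a multi-row table it would need a per-row generator.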

Answer 2

Please see my other answer for a much simpler solution. I am leaving this one here because it is still an interesting approach, and a lesson in how over-thinking can make a solution far too complicated; sometimes you just have to step back and approach the problem a different way. :-)

Ok, you need to loop through the rows and through the parentheses within the rows:

with tbl(ID, Roads) as (
  select 1, '#Chaussée de Waterloo (Ixelles)#' from dual
  union
  select 2, '#Rue Reper-Vreven (Bruxelles)#' from dual
  union
  select 3, '#Rue des Fraises (Anderlecht)#' from dual
  union
  select 4, '#Chaussée de Roodebeek (Woluwe-Saint-Lambert)#' from dual
  union
  select 5, '#Square Jean Absil (Etterbeek)#Avenue Hansen-Soulie (Etterbeek)#Avenue Le Marinel (Etterbeek)#' from dual
)
SELECT ID, Roads,
       COLUMN_VALUE AS match_nbr,
       REGEXP_SUBSTR( Roads ,'\(([^\)]*)\)', 1, COLUMN_VALUE, NULL, 1 ) AS match_value
FROM   tbl,
       TABLE(
         CAST(
           MULTISET(
             SELECT LEVEL
             FROM   DUAL
             CONNECT BY LEVEL <= REGEXP_COUNT( Roads ,'\(' )
           ) AS SYS.ODCINUMBERLIST
         )
       );

See here for a similar post, which links to another post which provides more info. I don't claim to understand it fully. :-)
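
One way to get a feel for the TABLE(CAST(MULTISET(...))) part is to run the row generator on its own, without any regular expression. The sketch below is an illustration only: the table tbl and its columns id and n are invented, with n standing in for the REGEXP_COUNT result.

```sql
-- Sketch: the per-row number generator in isolation.
-- tbl, id and n are made-up names; n plays the role of REGEXP_COUNT(Roads, '\(').
WITH tbl (id, n) AS (
  SELECT 1, 1 FROM dual
  UNION ALL
  SELECT 2, 3 FROM dual
)
SELECT t.id,
       g.COLUMN_VALUE AS match_nbr
FROM   tbl t,
       TABLE(
         CAST(
           MULTISET(
             -- re-evaluated for every row of tbl, producing 1 .. t.n
             SELECT LEVEL FROM dual CONNECT BY LEVEL <= t.n
           ) AS SYS.ODCINUMBERLIST
         )
       ) g
ORDER BY t.id, match_nbr;
```

The row with n = 1 should come back once and the row with n = 3 three times (match_nbr 1, 2, 3). In the query above, that match_nbr is what gets passed to REGEXP_SUBSTR as the occurrence to extract.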

EDIT: Updated to get list of roads on one line using listagg( ):

SQL> with tbl(ID, Roads) as (
     select 1, '#Chaussée de Waterloo (Ixelles)#' from dual
     union
     select 2, '#Rue Reper-Vreven (Bruxelles)#' from dual
     union
     select 3, '#Rue des Fraises (Anderlecht)#' from dual
     union
     select 4, '#Chaussée de Roodebeek (Woluwe-Saint-Lambert)#' from dual
     union
     select 5, '#Square Jean Absil (Etterbeek)#Avenue Hansen-Soulie (Etterbeek)#Avenue Le Marinel (Etterbeek)#' from dual
       )
       select id,
       listagg(match_value, ', ') within group (order by match_nbr) road_list
       from (
       SELECT ID, Roads, COLUMN_VALUE AS match_nbr,
             REGEXP_SUBSTR( Roads ,'\(([^\)]*)\)', 1, COLUMN_VALUE, NULL, 1 ) AS match_value
      FROM   tbl,
             TABLE(
               CAST(
                 MULTISET(
                   SELECT LEVEL
                   FROM   DUAL
                   CONNECT BY LEVEL <= REGEXP_COUNT( Roads ,'\(' )
                 ) AS SYS.ODCINUMBERLIST
               )
             )
       )
       group by id
       order by id
       ;

        ID ROAD_LIST
---------- --------------------------------------------------
         1 Ixelles
         2 Bruxelles
         3 Anderlecht
         4 Woluwe-Saint-Lambert
         5 Etterbeek, Etterbeek, Etterbeek

SQL>

Answer 3

I am adding this as a new answer because it is so different from my previous one, which was a classic example of over-complicated thinking getting worse with each iteration (and so is still worth keeping as an example of that). Sometimes you just need to sense when you are getting too complicated, and not be afraid to start over on a different tack!

Ok, check this out. I went back to square one and studied the string for a pattern. Whether there is one road or several, each road is surrounded by pound signs. (Storing multiple values in one column violates basic data modeling tenets and should be reworked, but who hasn't had to deal with a poor design they have no control over?) My thought was to walk through the string with a regular expression, replacing everything from a pound sign through the closing parenthesis with what is inside the parentheses. Granted, this leaves a pound sign at the end, but we'll clean that up later. Note that REGEXP_REPLACE replaces all occurrences of the pattern by default, so it effectively loops through all the roads, and it is WAY easier to maintain than a mass of nested INSTR() and SUBSTR() calls:

SQL> with tbl(ID, Roads) as (
     select 1, '#Chaussée de Waterloo (Ixelles)#' from dual
     union
     select 2, '#Rue Reper-Vreven (Bruxelles)#' from dual
     union
     select 3, '#Rue des Fraises (Anderlecht)#' from dual
     union
     select 4, '#Chaussée de Roodebeek (Woluwe-Saint-Lambert)#' from dual
     union
     select 5, '#Square Jean Absil (Etterbeek)#Avenue Hansen-Soulie (Etterbeek)#Avenue Le Marinel (Etterbeek)#' from dual
   )
   select ID, rtrim(regexp_replace(Roads, '#.+?\((.+)\)', '\1, '), ', #') Roads
   from tbl;

        ID ROADS
---------- ----------------------------------------
         1 Ixelles
         2 Bruxelles
         3 Anderlecht
         4 Woluwe-Saint-Lambert
         5 Etterbeek, Etterbeek, Etterbeek

SQL>

The regular expression pattern explained:

#    Look for a literal pound sign
.    followed by any character
+    one or more times (the "any character" is repeated)
?    make the preceding + non-greedy (match as few characters as possible)
\(   a literal left paren
(    start remembered group 1
.    any character
+    one or more times
)    end remembered group 1
\)   followed by a literal right paren

If the above string is found, replace with the "replace-with" string:

\1       The first remembered group which is what is inside the parentheses.
,<space> followed by a comma and a space

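Then, and it's a tad quick and dirty, just use RTRIM to remove the trailing comma, space and pound sign. RTRIM treats ', #' as a set of characters and strips any of them from the right-hand end until it reaches a character that is not in the set. Voilà! Whew.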
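
To see the two stages separately, the replace and the trim can be selected side by side. This is a sketch on a shortened, made-up string ('#A (X)#B (Y)#'); the column aliases are illustrative too.

```sql
-- Sketch: show the value before and after the RTRIM clean-up.
-- The input string and the aliases before_trim / after_trim are made up for illustration.
SELECT regexp_replace('#A (X)#B (Y)#', '#.+?\((.+)\)', '\1, ')               AS before_trim,
       rtrim(regexp_replace('#A (X)#B (Y)#', '#.+?\((.+)\)', '\1, '), ', #') AS after_trim
FROM dual;
```

Going by the description above, before_trim should still carry the leftover ", #" tail after the last value, and after_trim should be just the comma-separated values. Because RTRIM works on a character set, this clean-up relies on no real value ending in a comma, space or pound sign.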
