
Oracle SQL REGEX extracting text from the middle of a varchar2

I'm looking to extract a bit of text from the middle of a varchar2 column.

Here are a few examples:

 TEST DATA - SCOTLAND 1A
 TEST DATA - ENGLAND 6A
 TEST DATA - WALES 3A
 TEST DATA - IRELAND 2A

The data I'm looking to return would be:

 SCOTLAND
 ENGLAND
 WALES
 IRELAND

Many thanks

Lee

This query seems to be working:

SELECT
    input,
    -- \D+ also captures the space before the digit, so TRIM removes it
    TRIM(REGEXP_REPLACE(input, '.*- (\D+).*', '\1')) AS country
FROM yourTable;



Here we use REGEXP_REPLACE with the pattern:

.*- (\D+).*

This captures the run of non-digit characters that follows the last "- " (a hyphen, not an em dash, followed by a space): the country name plus the space just before the digit. The whole string is then replaced with that captured group.
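To see exactly what the capture group holds, here is a small Python sketch (not Oracle code) that mirrors the same pattern with Python's `re` module. It assumes Python treats this simple pattern the same way Oracle does, which should hold for these inputs; `raw_capture` and `extract_country` are hypothetical helper names.

```python
import re

# Python stand-in for the Oracle pattern '.*- (\D+).*' (illustration only).
PATTERN = re.compile(r".*- (\D+).*")


def raw_capture(value: str) -> str:
    # Like REGEXP_REPLACE(value, '.*- (\D+).*', '\1'): the whole matched
    # string is replaced by capture group 1.
    return PATTERN.sub(r"\1", value)


def extract_country(value: str) -> str:
    # \D+ stops only at the first digit, so it also swallows the space
    # before it; strip() plays the role of TRIM in SQL.
    return raw_capture(value).strip()


rows = [
    "TEST DATA - SCOTLAND 1A",
    "TEST DATA - ENGLAND 6A",
    "TEST DATA - WALES 3A",
    "TEST DATA - IRELAND 2A",
    "TEST DATA - NORTHERN IRELAND 2A",
]
for row in rows:
    print(repr(raw_capture(row)), "->", extract_country(row))
```

For the first row the raw capture is 'SCOTLAND ' with a trailing space, which is easy to miss in a result grid.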

This could be one way, assuming you need the part of the string between the (unique) '-' and the first digit after it:

with testTable(string) as (
    select 'TEST DATA - SCOTLAND 1A' from dual union all
    select 'TEST DATA - ENGLAND 6A' from dual union all
    select 'TEST DATA - WALES 3A' from dual union all
    select 'TEST DATA - IRELAND 2A' from dual union all
    select 'TEST DATA - NORTHERN IRELAND 2A' from dual
)
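select string,
       -- the 6th argument (1) returns only the parenthesised subexpression (Oracle 11g and later);
       -- trim() drops the space that [^0-9]* also captures before the digit
       trim(regexp_substr(string, '\- ([^0-9]*)[0-9]', 1, 1, 'i', 1)) result
from testTable;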



STRING                          RESULT                         
------------------------------- -------------------------------
TEST DATA - SCOTLAND 1A         SCOTLAND                       
TEST DATA - ENGLAND 6A          ENGLAND                        
TEST DATA - WALES 3A            WALES                          
TEST DATA - IRELAND 2A          IRELAND                        
TEST DATA - NORTHERN IRELAND 2A NORTHERN IRELAND               

5 rows selected.
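The subexpression argument is what makes this a one-step extraction: only the text inside the parentheses is returned, not the "- " or the digit. A rough Python equivalent, assuming `re.search` plus `match.group(n)` is a fair stand-in for REGEXP_SUBSTR with occurrence 1 and a subexpression number (`regexp_substr_group` is a hypothetical helper, not an Oracle or library API):

```python
import re
from typing import Optional

# Same pattern as the SQL above; '\-' is simply an escaped hyphen.
PATTERN = r"\- ([^0-9]*)[0-9]"


def regexp_substr_group(value: str, pattern: str, group: int = 1) -> Optional[str]:
    # Approximates REGEXP_SUBSTR(value, pattern, 1, 1, 'i', group):
    # first match, case-insensitive, return only the given subexpression.
    # Oracle returns NULL when nothing matches; None stands in for that here.
    match = re.search(pattern, value, re.IGNORECASE)
    return match.group(group) if match else None


rows = [
    "TEST DATA - SCOTLAND 1A",
    "TEST DATA - NORTHERN IRELAND 2A",
    "NO HYPHEN HERE",
]
for row in rows:
    print(row, "->", repr(regexp_substr_group(row, PATTERN)))
```

Note that `[^0-9]*` also matches the space before the digit, so the raw group is "IRELAND " rather than "IRELAND"; the padded SQL*Plus output above hides that trailing space.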

Thanks to Aleksej for the CTE.

How about good old SUBSTR + INSTR?

with testTable(string) as (
    select 'TEST DATA - SCOTLAND 1A' from dual union all
    select 'TEST DATA - ENGLAND 6A' from dual union all
    select 'TEST DATA - WALES 3A' from dual union all
    select 'TEST DATA - IRELAND 2A' from dual union all
    select 'TEST DATA - NORTHERN IRELAND 2A' from dual
)
select
  string,
  -- start just after the first '-' and stop at the last space (inclusive);
  -- trim() removes the surrounding spaces
  trim(substr(string,
              instr(string, '-') + 1,
              instr(string, ' ', -1) - instr(string, '-')
       )) result
from testtable;

STRING                          RESULT
------------------------------- -------------------------------
TEST DATA - SCOTLAND 1A         SCOTLAND
TEST DATA - ENGLAND 6A          ENGLAND
TEST DATA - WALES 3A            WALES
TEST DATA - IRELAND 2A          IRELAND
TEST DATA - NORTHERN IRELAND 2A NORTHERN IRELAND

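The position arithmetic is easy to get off by one, so here is a Python sketch of it. `instr` and `substr` are hypothetical helpers that only approximate Oracle's 1-based INSTR (position 1 or -1) and SUBSTR (positive start and length); they are not full reimplementations.

```python
def instr(s: str, sub: str, position: int = 1) -> int:
    # Approximates Oracle INSTR for position=1 (first match) and
    # position=-1 (last match): 1-based result, 0 when not found.
    if position == -1:
        return s.rfind(sub) + 1
    if position == 1:
        return s.find(sub) + 1
    raise ValueError("only position 1 and -1 are sketched here")


def substr(s: str, start: int, length: int) -> str:
    # Approximates Oracle SUBSTR for positive start and length (1-based start).
    return s[start - 1:start - 1 + length]


def extract_country(s: str) -> str:
    dash = instr(s, "-")
    last_space = instr(s, " ", -1)
    # Start just after the hyphen; the length ends exactly on the last space.
    # strip() removes the leading and trailing spaces, like TRIM.
    return substr(s, dash + 1, last_space - dash).strip()


for row in ["TEST DATA - SCOTLAND 1A", "TEST DATA - NORTHERN IRELAND 2A"]:
    print(row, "->", extract_country(row))
```

For "TEST DATA - SCOTLAND 1A" the hyphen is at position 11 and the last space at 21, so SUBSTR takes 10 characters starting at 12, which is " SCOTLAND " before trimming. This relies on the code (1A, 6A, ...) being the last space-separated token.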

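One option is to apply regexp_replace() first, with a pattern that removes the hyphen and everything from the digit onwards, so that the country name becomes the third word of the string, and then use regexp_substr() to extract just that third word. This assumes the country name is a single word.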

with cte as
(
 select 'TEST DATA - SCOTLAND 1A' as str from dual union all 
 select 'TEST DATA - ENGLAND 6A'         from dual union all 
 select 'TEST DATA - WALES 3A'           from dual union all 
 select 'TEST DATA - IRELAND 2A'         from dual
)    
select regexp_substr(  
                     regexp_replace(str, '-(.*)\d.*', '\1' ) 
                     , '[^ ]+', 1 , 3 )
       as "Country",
       trim(regexp_substr( str, '[^-]+(\s)', 1 , 2 ))
       as "Country2"
  from cte;

Country    Country2
--------   --------
SCOTLAND   SCOTLAND
ENGLAND    ENGLAND
WALES      WALES
IRELAND    IRELAND
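The two steps behind the "Country" column can be traced in Python. This is a sketch that assumes `re.sub` and `re.findall` behave like REGEXP_REPLACE and successive REGEXP_SUBSTR occurrences for these simple patterns; `country_two_step` is a hypothetical name.

```python
import re
from typing import Optional


def country_two_step(s: str) -> Optional[str]:
    # Step 1, like REGEXP_REPLACE(str, '-(.*)\d.*', '\1'):
    # "TEST DATA - SCOTLAND 1A" becomes "TEST DATA  SCOTLAND ".
    stripped = re.sub(r"-(.*)\d.*", r"\1", s)
    # Step 2, like REGEXP_SUBSTR(stripped, '[^ ]+', 1, 3):
    # the third run of non-space characters (None if there is none).
    words = re.findall(r"[^ ]+", stripped)
    return words[2] if len(words) >= 3 else None


for row in ["TEST DATA - WALES 3A", "TEST DATA - NORTHERN IRELAND 2A"]:
    print(row, "->", country_two_step(row))
```

Because step 2 picks a single word, a two-word name such as NORTHERN IRELAND comes back as just "NORTHERN"; the sample data in this answer has only single-word names.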


PS: the "Country2" expression may be a smarter alternative, since it needs no regexp_replace() call and also keeps multi-word names such as NORTHERN IRELAND intact.
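The "Country2" expression relies on the occurrence argument: each match is a run of non-hyphen characters ending in whitespace, and the second such run is the country name. A Python sketch of that idea, assuming `re.finditer` yields the same successive matches Oracle counts as occurrences 1, 2, ... (`country_second_occurrence` is a hypothetical name):

```python
import re
from typing import Optional


def country_second_occurrence(s: str) -> Optional[str]:
    # Like TRIM(REGEXP_SUBSTR(str, '[^-]+(\s)', 1, 2)).
    # For "TEST DATA - SCOTLAND 1A" the matches are "TEST DATA " and
    # " SCOTLAND "; "1A" has no trailing whitespace, so it never matches.
    matches = [m.group(0) for m in re.finditer(r"[^-]+(\s)", s)]
    return matches[1].strip() if len(matches) >= 2 else None


for row in ["TEST DATA - SCOTLAND 1A", "TEST DATA - NORTHERN IRELAND 2A"]:
    print(row, "->", country_second_occurrence(row))
```

Since the second run stretches from the hyphen to the last whitespace before the code, spaces inside the country name are kept.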

