简体   繁体   中英

How do I dynamically extract substring from string?

I'm trying to dynamically extract a substring from a very long URL. For example, I may have the following URLs:

  1. https://www.google.com/ABCDEF Version=“0.0.00.0” GHIJK
  2. https://www.google.com/ABCDEFGH Version=“0.0.0.0” IJKLM
  3. https://www.google.com/ABC Version=“0.0.0.00” 12345

I am trying to extract the version code only (0.0.0.0).

This is what I have so far:

SELECT SUBSTR(col, INSTR(col, ‘Version=“‘)+9) 
FROM table

This query returns the following result:

0.0.00.0” GHIJK … (url continues on)

So, I attempt to find “Version” in the link, so I can start from the same position in each row. This works fine, however I'm having a hard time dynamically locating the ending quote (“). I tried using INSTR in the third parameter of my SUBSTR function, like so:

SELECT SUBSTR(col, INSTR(col, ‘Version=“‘)+9, INSTR(col, ‘“‘)) 
FROM table

I figured that this would find the position of the ending quote, and then use that number for the length, but it returns a strange output. I've also used POSITION, CHARINDEX, LENGTH, and LOCATE. None of these functions work in Oracle.

I think maybe when I put +9 after the first INSTR function, it's setting the query to a fixed position instead of a dynamic one, but I'm not sure how else to remove 'Version=“'.

Here's one option (which, actually, selects what's between double quotes - that's version in your example; if there were some other similar substring , you'd get a wrong result).

with test (col) as
  (select 'https://www.google.com/ABCDEF Version="0.0.00.0" GHIJK'  from dual union all
   select 'https://www.google.com/ABCDEFGH Version="0.0.0.0" IJKLM' from dual union all
   select 'https://www.google.com/ABC Version="0.0.0.00" 12345'     from dual
  )
select col,
  replace(regexp_substr(col, '".+"'), '"') version
from test;

which results in

https://www.google.com/ABCDEF Version="0.0.00.0" GHIJK      0.0.00.0
https://www.google.com/ABCDEFGH Version="0.0.0.0" IJKLM     0.0.0.0
https://www.google.com/ABC Version="0.0.0.00" 12345         0.0.0.00

You can use REGEXP_SUBSTR() with Version=(\d.*\d?) pattern in order to extract the piece between Version=" and " (your quotes are presumed to be regular double quotes " " )

SELECT REGEXP_SUBSTR(url,'Version="(\d.*\d)"',1,1,null,1) AS version        
  FROM t

where

  • the third argument( 1 ) is position , the fourth argument( 1 ) is occurence , and especially important to use the last one as being capture group ( 1 )
  • indeed using '"(\d.*\d)"' pattern is enough for the current data set

or

REGEXP_REPLACE() with capture group \2 as

SELECT REGEXP_REPLACE(url,'^(.*Version=")([^"]*).*','\2') AS version        
  FROM t

Demo

You can still use use INSTR to locate the second " in the string, then subtract the location of the first " to get the length that you need to get. Below is an example query:

SELECT col,
       SUBSTR (col, INSTR (col, '"') + 1, INSTR (col, '"', 1, 2) - INSTR (col, '"') - 1)    version
  FROM test;

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM