
Oracle SQL regexp_replace stops at OR group

I am trying to use Oracle SQL's regexp_replace to replace the domain name in a list of URLs. The problem seems to be that some of the URLs have ports and some do not.

In the following example, the-super.hosting.com should be replaced with HOSTNAME (but not hard-coded in the regular expression, as it could be anything):

WITH strings AS (   
  SELECT 'http://wwww11.the-super.hosting.com:9999/aPath/servlet?config=abcLoginNr=%1' s FROM dual union all   
  SELECT 'http://wwww22.the-super.hosting.com:6666/aPath/servlet?config=abcLoginNr=%2' s FROM dual union all   
  SELECT 'http://wwww22.the-super.hosting.com:6666/aPath/servlet?config=abcLoginNr=%2' s FROM dual union all   
  SELECT 'http://wwww04.the-super.hosting.com/aPath/servlet?config#here' s FROM dual   
)  
  SELECT regexp_replace(s,'([[:alpha:]]+://[[:alpha:]]{4}[[:digit:]]{2}\.)(.+)(:9999/|:6666/|/?)(.+)', '\1HOSTNAME\3\4') "MODIFIED_STRING", s "STRING"
  FROM strings;

It seems it can't handle the ports being optional when the path starts directly after the host name.
Is it possible to match the domain part differently, so that what is left over is always the path with the optional port?
Is there a way to do the replacement in one statement?

I think you're making it more complicated than it needs to be. You only really need three parts: the initial protocol (anything followed by ://) together with the www??. prefix (assuming that is actually always present); the rest of the domain name, which is the part to replace; and everything that's left, which may or may not include a port, but you don't really care. So:

([^.]+\.)([^/:]+)(.*)

where

  • ([^.]+\.) is the protocol and everything up to and including the first dot in the domain name;
  • ([^/:]+) is anything up to either a slash or a colon;
  • (.*) is the rest.
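One way to check how the pattern splits a URL is to pull out each capture group on its own. The query below is only a sketch: it assumes Oracle 11g or later, where REGEXP_SUBSTR accepts a sixth argument selecting the subexpression, and the column aliases are illustrative names, not part of the original answer.

```sql
-- Show what each of the three capture groups matches for one sample URL.
-- REGEXP_SUBSTR arguments: source, pattern, start position, occurrence,
-- match parameter, subexpression number.
WITH strings AS (
  SELECT 'http://wwww11.the-super.hosting.com:9999/aPath/servlet' s FROM dual
)
SELECT regexp_substr(s, '([^.]+\.)([^/:]+)(.*)', 1, 1, NULL, 1) AS prefix_part, -- protocol, first label and dot
       regexp_substr(s, '([^.]+\.)([^/:]+)(.*)', 1, 1, NULL, 2) AS domain_part, -- the part that gets replaced
       regexp_substr(s, '([^.]+\.)([^/:]+)(.*)', 1, 1, NULL, 3) AS rest_part    -- optional port and path
FROM strings;
```

For this sample the three columns should come back as http://wwww11. then the-super.hosting.com then :9999/aPath/servlet, which is why replacing only the second group leaves the port and path untouched.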

For the replacement, you keep the first and third parts as they are and replace the second part with your fixed HOSTNAME.

So you get:

WITH strings AS (
  SELECT 'http://wwww11.the-super.hosting.com:9999/aPath/servlet?config=abcLoginNr=%1' s FROM dual union all
  SELECT 'http://wwww22.the-super.hosting.com:6666/aPath/servlet?config=abcLoginNr=%2' s FROM dual union all
  SELECT 'http://wwww22.the-super.hosting.com:6666/aPath/servlet?config=abcLoginNr=%2' s FROM dual union all
  SELECT 'http://wwww04.the-super.hosting.com/aPath/servlet?config#here' s FROM dual union all
  SELECT 'http://wwww04.the-super.hosting.com' s FROM dual union all
  SELECT 'http://wwww04.the-super.hosting.com/' s FROM dual union all
  SELECT 'http://wwww04.the-super.hosting.com/aPath' s FROM dual union all
  SELECT 'http://wwww04.the-super.hosting.com:1234' s FROM dual union all
  SELECT 'http://wwww04.the-super.hosting.com:1234/' s FROM dual union all
  SELECT 'http://wwww04.the-super.hosting.com:1234/aPath' s FROM dual
)  
SELECT regexp_replace(s, '([^.]+\.)([^/:]+)(.*)', '\1HOSTNAME\3') "MODIFIED_STRING", s "STRING"
FROM strings;

MODIFIED_STRING                                                STRING                                                                     
-------------------------------------------------------------- ---------------------------------------------------------------------------
http://wwww11.HOSTNAME:9999/aPath/servlet?config=abcLoginNr=%1 http://wwww11.the-super.hosting.com:9999/aPath/servlet?config=abcLoginNr=%1
http://wwww22.HOSTNAME:6666/aPath/servlet?config=abcLoginNr=%2 http://wwww22.the-super.hosting.com:6666/aPath/servlet?config=abcLoginNr=%2
http://wwww22.HOSTNAME:6666/aPath/servlet?config=abcLoginNr=%2 http://wwww22.the-super.hosting.com:6666/aPath/servlet?config=abcLoginNr=%2
http://wwww04.HOSTNAME/aPath/servlet?config#here               http://wwww04.the-super.hosting.com/aPath/servlet?config#here              
http://wwww04.HOSTNAME                                         http://wwww04.the-super.hosting.com                                        
http://wwww04.HOSTNAME/                                        http://wwww04.the-super.hosting.com/                                       
http://wwww04.HOSTNAME/aPath                                   http://wwww04.the-super.hosting.com/aPath                                  
http://wwww04.HOSTNAME:1234                                    http://wwww04.the-super.hosting.com:1234                                   
http://wwww04.HOSTNAME:1234/                                   http://wwww04.the-super.hosting.com:1234/                                  
http://wwww04.HOSTNAME:1234/aPath                              http://wwww04.the-super.hosting.com:1234/aPath                             

You can be more explicit about the protocol format etc. but I'm not sure there's much point.
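If you did want to be stricter, a variant along the following lines might do. It is a sketch rather than a tested recommendation: the anchors and the explicit scheme check are my additions, and the https row is a made-up sample to show that any alphabetic scheme is accepted.

```sql
-- Stricter variant: require an alphabetic scheme followed by ://,
-- and anchor the pattern to the whole string.
WITH strings AS (
  SELECT 'http://wwww04.the-super.hosting.com:1234/aPath' s FROM dual union all
  SELECT 'https://wwww04.the-super.hosting.com' s FROM dual
)
SELECT regexp_replace(s,
                      '^([[:alpha:]]+://[^./]+\.)([^/:]+)(.*)$',
                      '\1HOSTNAME\3') "MODIFIED_STRING",
       s "STRING"
FROM strings;
```

With a stricter pattern, a string that does not look like a URL fails to match at all, and regexp_replace returns it unchanged instead of rewriting part of it.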


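The problem with your original pattern is a mix of greediness and the optional slash as the final 'or' alternative alongside the port numbers. The greedy (.+) for the domain swallows as much of the string as it can, and because /? is allowed to match nothing, the third group can succeed on an empty match, so nothing forces the domain group to stop at the port or at the first slash. You can tweak it to make it work, at least for your sample data, e.g.: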

WITH strings AS (   
  SELECT 'http://wwww11.the-super.hosting.com:9999/aPath/servlet?config=abcLoginNr=%1' s FROM dual union all   
  SELECT 'http://wwww22.the-super.hosting.com:6666/aPath/servlet?config=abcLoginNr=%2' s FROM dual union all   
  SELECT 'http://wwww22.the-super.hosting.com:6666/aPath/servlet?config=abcLoginNr=%2' s FROM dual union all   
  SELECT 'http://wwww04.the-super.hosting.com/aPath/servlet?config#here' s FROM dual   
)  
SELECT regexp_replace(s,'([[:alpha:]]+://[[:alpha:]]{4}[[:digit:]]{2}\.)(.+?)(:9999/|:6666/|/)(.+)$', '\1HOSTNAME\3\4') "MODIFIED_STRING", s "STRING"
--                                                                         ^               ^^^    ^
FROM strings;

MODIFIED_STRING                                                STRING                                                                     
-------------------------------------------------------------- ---------------------------------------------------------------------------
http://wwww11.HOSTNAME:9999/aPath/servlet?config=abcLoginNr=%1 http://wwww11.the-super.hosting.com:9999/aPath/servlet?config=abcLoginNr=%1
http://wwww22.HOSTNAME:6666/aPath/servlet?config=abcLoginNr=%2 http://wwww22.the-super.hosting.com:6666/aPath/servlet?config=abcLoginNr=%2
http://wwww22.HOSTNAME:6666/aPath/servlet?config=abcLoginNr=%2 http://wwww22.the-super.hosting.com:6666/aPath/servlet?config=abcLoginNr=%2
http://wwww04.HOSTNAME/aPath/servlet?config#here               http://wwww04.the-super.hosting.com/aPath/servlet?config#here              

but it seems like overkill.
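To see the greediness issue in the original pattern for yourself, you could inspect what its groups capture before doing any replacement. This is a diagnostic sketch only, again assuming Oracle 11g or later for the subexpression argument of REGEXP_SUBSTR; the pat column and the aliases are hypothetical names introduced for the example.

```sql
-- Diagnostic: show groups 2, 3 and 4 of the ORIGINAL pattern from the question.
-- The pattern is kept in a column (pat) so it is only written once.
WITH strings AS (
  SELECT 'http://wwww11.the-super.hosting.com:9999/aPath/servlet?config=abcLoginNr=%1' s,
         '([[:alpha:]]+://[[:alpha:]]{4}[[:digit:]]{2}\.)(.+)(:9999/|:6666/|/?)(.+)' pat
  FROM dual
)
SELECT regexp_substr(s, pat, 1, 1, NULL, 2) AS domain_group, -- greedy (.+)
       regexp_substr(s, pat, 1, 1, NULL, 3) AS port_group,   -- may match nothing because of /?
       regexp_substr(s, pat, 1, 1, NULL, 4) AS path_group    -- whatever is left over
FROM strings;
```

If the domain group comes back containing the port and most of the path, that confirms the explanation above. Swapping in the tweaked pattern should move the port into port_group.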
