Oracle SQL regexp_replace stops at OR group

I am trying to use Oracle SQL's regexp_replace to replace the domain name in a list of URLs. The problem seems to be that some of the URLs have ports and some do not.

In the following example, the-super.hosting.com should be replaced with HOSTNAME (but the domain must not be hard-coded in the regular expression, as it could be anything).

WITH strings AS (   
  SELECT 'http://wwww11.the-super.hosting.com:9999/aPath/servlet?config=abcLoginNr=%1' s FROM dual union all   
  SELECT 'http://wwww22.the-super.hosting.com:6666/aPath/servlet?config=abcLoginNr=%2' s FROM dual union all   
  SELECT 'http://wwww22.the-super.hosting.com:6666/aPath/servlet?config=abcLoginNr=%2' s FROM dual union all   
  SELECT 'http://wwww04.the-super.hosting.com/aPath/servlet?config#here' s FROM dual   
)  
  SELECT regexp_replace(s,'([[:alpha:]]+://[[:alpha:]]{4}[[:digit:]]{2}\.)(.+)(:9999/|:6666/|/?)(.+)', '\1HOSTNAME\3\4') "MODIFIED_STRING", s "STRING"
  FROM strings;

It seems it can't treat the port as optional when the path follows the host name directly, as in the last URL.
Is it possible to match the domain part differently, so that what is left over is always the path with the optional port?
Is there a way to do the replacement in one statement?

I think you're making it more complicated than it needs to be. You only really need three parts: the initial protocol (anything followed by ://) and the www??. prefix (assuming that is actually always present); the rest of the domain name, which is the part to replace; and everything that's left, which may or may not include a port, but you don't really care. So:

([^.]+\.)([^/:]+)(.*)

where

  • ([^.]+\.) is the protocol and everything up to and including the first dot in the domain name;
  • ([^/:]+) is anything up to the first slash or colon;
  • (.*) is the rest.
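
To see what each of the three groups captures, the pattern can be probed with REGEXP_SUBSTR and its sixth (subexpression) argument. This is only an illustrative sketch: the column aliases are names made up here, and the subexpression argument assumes Oracle 11g or later.

```sql
-- Illustrative sketch: show what each capture group of the pattern matches.
-- The aliases prefix, host_part and rest are hypothetical names chosen here.
WITH strings AS (
  SELECT 'http://wwww11.the-super.hosting.com:9999/aPath' s FROM dual UNION ALL
  SELECT 'http://wwww04.the-super.hosting.com/aPath' s FROM dual
)
SELECT regexp_substr(s, '([^.]+\.)([^/:]+)(.*)', 1, 1, NULL, 1) prefix,     -- group 1
       regexp_substr(s, '([^.]+\.)([^/:]+)(.*)', 1, 1, NULL, 2) host_part,  -- group 2
       regexp_substr(s, '([^.]+\.)([^/:]+)(.*)', 1, 1, NULL, 3) rest        -- group 3
FROM strings;
```

For the first row the groups should come out as http://wwww11. then the-super.hosting.com then :9999/aPath; for the second row the third group simply starts at the slash, since there is no port.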

For the replacement, you want to keep the first and third parts as they are, and replace the second part with your fixed HOSTNAME.

So you get:

WITH strings AS (
  SELECT 'http://wwww11.the-super.hosting.com:9999/aPath/servlet?config=abcLoginNr=%1' s FROM dual union all
  SELECT 'http://wwww22.the-super.hosting.com:6666/aPath/servlet?config=abcLoginNr=%2' s FROM dual union all
  SELECT 'http://wwww22.the-super.hosting.com:6666/aPath/servlet?config=abcLoginNr=%2' s FROM dual union all
  SELECT 'http://wwww04.the-super.hosting.com/aPath/servlet?config#here' s FROM dual union all
  SELECT 'http://wwww04.the-super.hosting.com' s FROM dual union all
  SELECT 'http://wwww04.the-super.hosting.com/' s FROM dual union all
  SELECT 'http://wwww04.the-super.hosting.com/aPath' s FROM dual union all
  SELECT 'http://wwww04.the-super.hosting.com:1234' s FROM dual union all
  SELECT 'http://wwww04.the-super.hosting.com:1234/' s FROM dual union all
  SELECT 'http://wwww04.the-super.hosting.com:1234/aPath' s FROM dual
)  
SELECT regexp_replace(s, '([^.]+\.)([^/:]+)(.*)', '\1HOSTNAME\3') "MODIFIED_STRING", s "STRING"
FROM strings;

MODIFIED_STRING                                                STRING                                                                     
-------------------------------------------------------------- ---------------------------------------------------------------------------
http://wwww11.HOSTNAME:9999/aPath/servlet?config=abcLoginNr=%1 http://wwww11.the-super.hosting.com:9999/aPath/servlet?config=abcLoginNr=%1
http://wwww22.HOSTNAME:6666/aPath/servlet?config=abcLoginNr=%2 http://wwww22.the-super.hosting.com:6666/aPath/servlet?config=abcLoginNr=%2
http://wwww22.HOSTNAME:6666/aPath/servlet?config=abcLoginNr=%2 http://wwww22.the-super.hosting.com:6666/aPath/servlet?config=abcLoginNr=%2
http://wwww04.HOSTNAME/aPath/servlet?config#here               http://wwww04.the-super.hosting.com/aPath/servlet?config#here              
http://wwww04.HOSTNAME                                         http://wwww04.the-super.hosting.com                                        
http://wwww04.HOSTNAME/                                        http://wwww04.the-super.hosting.com/                                       
http://wwww04.HOSTNAME/aPath                                   http://wwww04.the-super.hosting.com/aPath                                  
http://wwww04.HOSTNAME:1234                                    http://wwww04.the-super.hosting.com:1234                                   
http://wwww04.HOSTNAME:1234/                                   http://wwww04.the-super.hosting.com:1234/                                  
http://wwww04.HOSTNAME:1234/aPath                              http://wwww04.the-super.hosting.com:1234/aPath                             

You can be more explicit about the protocol format etc., but I'm not sure there's much point.
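
If a stricter match is wanted anyway, one possible sketch anchors the pattern at both ends and requires an alphabetic scheme followed by ://. This is an assumption about what "more explicit" might look like, not something the sample data requires. Strings that do not look like URLs are then left untouched, whereas the looser pattern rewrites any string that contains a dot.

```sql
-- Illustrative sketch: a stricter variant of the three-group pattern.
-- The second input row is a made-up non-URL string used only for comparison.
WITH strings AS (
  SELECT 'http://wwww04.the-super.hosting.com:1234/aPath' s FROM dual UNION ALL
  SELECT 'see file.txt for details' s FROM dual
)
SELECT regexp_replace(s, '^([[:alpha:]]+://[^./:]+\.)([^/:]+)(.*)$', '\1HOSTNAME\3') strict_result,
       regexp_replace(s, '([^.]+\.)([^/:]+)(.*)', '\1HOSTNAME\3') loose_result
FROM strings;
```

The first row should come out as http://wwww04.HOSTNAME:1234/aPath with either pattern. The second row should stay unchanged with the strict pattern and become see file.HOSTNAME with the loose one.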


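The problem with your original pattern is a mix of greediness and the optional slash as the final 'or' alternative alongside the port numbers: because /? is allowed to match nothing, the greedy (.+) before it is free to consume almost the whole string. You can tweak it to make it work, at least for your sample data, by making the second group non-greedy, requiring the slash, and anchoring the pattern at the end, e.g.: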

WITH strings AS (   
  SELECT 'http://wwww11.the-super.hosting.com:9999/aPath/servlet?config=abcLoginNr=%1' s FROM dual union all   
  SELECT 'http://wwww22.the-super.hosting.com:6666/aPath/servlet?config=abcLoginNr=%2' s FROM dual union all   
  SELECT 'http://wwww22.the-super.hosting.com:6666/aPath/servlet?config=abcLoginNr=%2' s FROM dual union all   
  SELECT 'http://wwww04.the-super.hosting.com/aPath/servlet?config#here' s FROM dual   
)  
SELECT regexp_replace(s,'([[:alpha:]]+://[[:alpha:]]{4}[[:digit:]]{2}\.)(.+?)(:9999/|:6666/|/)(.+)$', '\1HOSTNAME\3\4') "MODIFIED_STRING", s "STRING"
--                                                                         ^               ^^^    ^
FROM strings;

MODIFIED_STRING                                                STRING                                                                     
-------------------------------------------------------------- ---------------------------------------------------------------------------
http://wwww11.HOSTNAME:9999/aPath/servlet?config=abcLoginNr=%1 http://wwww11.the-super.hosting.com:9999/aPath/servlet?config=abcLoginNr=%1
http://wwww22.HOSTNAME:6666/aPath/servlet?config=abcLoginNr=%2 http://wwww22.the-super.hosting.com:6666/aPath/servlet?config=abcLoginNr=%2
http://wwww22.HOSTNAME:6666/aPath/servlet?config=abcLoginNr=%2 http://wwww22.the-super.hosting.com:6666/aPath/servlet?config=abcLoginNr=%2
http://wwww04.HOSTNAME/aPath/servlet?config#here               http://wwww04.the-super.hosting.com/aPath/servlet?config#here              

but it seems like overkill.
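
The greediness issue in the original pattern can be inspected directly by pulling out its second, third and fourth groups with REGEXP_SUBSTR. This is a diagnostic sketch only: the aliases are invented here, and the subexpression argument assumes Oracle 11g or later.

```sql
-- Illustrative sketch: inspect groups 2, 3 and 4 of the ORIGINAL pattern.
-- The aliases greedy_host, port_or_slash and rest are hypothetical names.
WITH strings AS (
  SELECT 'http://wwww04.the-super.hosting.com/aPath/servlet?config#here' s FROM dual
)
SELECT regexp_substr(s, '([[:alpha:]]+://[[:alpha:]]{4}[[:digit:]]{2}\.)(.+)(:9999/|:6666/|/?)(.+)',
                     1, 1, NULL, 2) greedy_host,
       regexp_substr(s, '([[:alpha:]]+://[[:alpha:]]{4}[[:digit:]]{2}\.)(.+)(:9999/|:6666/|/?)(.+)',
                     1, 1, NULL, 3) port_or_slash,
       regexp_substr(s, '([[:alpha:]]+://[[:alpha:]]{4}[[:digit:]]{2}\.)(.+)(:9999/|:6666/|/?)(.+)',
                     1, 1, NULL, 4) rest
FROM strings;
```

The expectation is that the second group swallows nearly everything after the prefix, the third group comes back empty (NULL), and the fourth group is left with only the minimum it needs, which is why the replacement drops most of the path.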
