
Regular expression to match a URL with 6 or more levels

I am trying to match URLs with six or more levels (sub-paths), such as:

http://www.domain.com/level1/level2/level3/level4/level5/level6/level7/level8/level9/level10/level11/level12.html

I came up with this expression:

^http:\/\/([a-zA-Z\.-]*)\W(\b\w+\b) 

...which matches level1 (demo).

However, when I try to match a URL with six or more levels, it doesn't seem to work:

^http:\/\/([a-zA-Z\.-]*)\W(\b\w+\b){6,}

(demo)
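A minimal Java sketch (class, field and method names are illustrative, not from the question) shows why the second attempt fails: `(\b\w+\b){6,}` asks for six whole words in a row with nothing between them, but the path segments are separated by `/`, which neither `\w+` nor `\b` consumes. The first attempt matches up to `level1`; the second finds no match at all.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuestionAttempts {
    // Example URL from the question.
    static final String URL =
            "http://www.domain.com/level1/level2/level3/level4/level5/level6/"
            + "level7/level8/level9/level10/level11/level12.html";

    // The question's two patterns as Java string literals (each regex backslash doubled).
    static final String ATTEMPT_ONE = "^http:\\/\\/([a-zA-Z\\.-]*)\\W(\\b\\w+\\b)";
    static final String ATTEMPT_SIX = "^http:\\/\\/([a-zA-Z\\.-]*)\\W(\\b\\w+\\b){6,}";

    // Returns the text matched at the start of the input, or null if there is no match.
    static String matchedPrefix(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.lookingAt() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Matches up to the first path segment.
        System.out.println(matchedPrefix(ATTEMPT_ONE, URL)); // http://www.domain.com/level1
        // (\b\w+\b){6,} needs six words with nothing between them, but each
        // segment is followed by "/", which \w+ cannot consume, so this fails.
        System.out.println(matchedPrefix(ATTEMPT_SIX, URL)); // null
    }
}
```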

Try the following:

^http:\/\/([a-zA-Z\.-]*)(\/[\w\.]+){6,}

http://rubular.com/r/QZlidUqheq
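The key change is that the slash now sits inside the repeated group, so each repetition consumes one `/segment`. A small Java sketch of the same pattern, assuming it is applied with `Matcher.find()` (the class and method names are hypothetical). Because the pattern has no trailing `$`, anything after the sixth segment is simply ignored.

```java
import java.util.regex.Pattern;

public class SlashInsideGroup {
    // The answer's regex as a Java string literal: the "/" is inside the
    // repeated group, so each repetition consumes one "/segment".
    static final Pattern SIX_OR_MORE =
            Pattern.compile("^http:\\/\\/([a-zA-Z\\.-]*)(\\/[\\w\\.]+){6,}");

    // find() with the ^ anchor: the match must start at the beginning, but since
    // the pattern has no $, anything after the sixth segment is not checked.
    static boolean hasSixOrMoreLevels(String url) {
        return SIX_OR_MORE.matcher(url).find();
    }

    public static void main(String[] args) {
        System.out.println(hasSixOrMoreLevels("http://www.domain.com/a/b/c/d/e/f")); // true
        System.out.println(hasSixOrMoreLevels("http://www.domain.com/a/b/c/d/e"));   // false
        // A segment containing a character outside [\w.] (here "-") ends the run early.
        System.out.println(hasSixOrMoreLevels("http://www.domain.com/a/b/my-c/d/e/f")); // false
    }
}
```

Since `[\w.]` excludes characters such as `-`, `%` and `~`, a broader segment class like `[^/]+` may be closer to what real URLs need.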

I think this is what you were going for:

^http://([a-zA-Z.-]+)/(?:[^/]+/){6,}.*$

This matches six or more levels, which is what you said you wanted in the question. However, in the question's title you phrased it "more than six". If that's what you really want, change the quantifier from {6,} to {7,}.
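In Java this regex works with `Matcher.matches()`, because the trailing `.*$` lets it consume the whole URL. The sketch below (hypothetical names `MinDepthPattern`, `depthPattern`, `matchesDepth`) turns the quantifier's lower bound into a parameter so the `{6,}` and `{7,}` variants can be compared. Note that `(?:[^/]+/)` counts only segments followed by a slash, so a final segment without a trailing slash, such as a file name, is absorbed by `.*` rather than counted.

```java
import java.util.regex.Pattern;

public class MinDepthPattern {
    // Hypothetical helper: the answer's regex with the minimum number of
    // "segment/" groups as a parameter (6 gives {6,}, 7 gives {7,}).
    static Pattern depthPattern(int min) {
        return Pattern.compile("^http://([a-zA-Z.-]+)/(?:[^/]+/){" + min + ",}.*$");
    }

    // matches() must consume the whole URL; the trailing .*$ takes whatever
    // follows the last counted "segment/".
    static boolean matchesDepth(int min, String url) {
        return depthPattern(min).matcher(url).matches();
    }

    public static void main(String[] args) {
        String sixDirsAndFile = "http://www.domain.com/a/b/c/d/e/f/page.html";
        System.out.println(matchesDepth(6, sixDirsAndFile)); // true
        System.out.println(matchesDepth(7, sixDirsAndFile)); // false
        // Only segments followed by "/" are counted, so "f" here does not count.
        System.out.println(matchesDepth(6, "http://www.domain.com/a/b/c/d/e/f")); // false
    }
}
```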

On a side note, the forward slash (/) has no special meaning in regexes and doesn't need to be escaped. Rubular forces you to escape it because it uses the slash as its regex delimiter. Nutch uses Java's built-in regexes, so you should use a tester that supports the same flavor, like this one.
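To illustrate the side note in Java: the two patterns below differ only in whether the slashes are escaped, and `java.util.regex` treats them the same. What does need doubling in Java source is every regex backslash, because the pattern is written inside a string literal. The class and method names are illustrative.

```java
import java.util.regex.Pattern;

public class SlashEscaping {
    // Rubular-style pattern with escaped slashes ("\\/" in Java source is \/ in the regex).
    static final Pattern ESCAPED =
            Pattern.compile("^http:\\/\\/([a-zA-Z.-]+)\\/(?:[^\\/]+\\/){6,}.*$");
    // The same pattern with plain slashes; "/" is an ordinary character in java.util.regex.
    static final Pattern UNESCAPED =
            Pattern.compile("^http://([a-zA-Z.-]+)/(?:[^/]+/){6,}.*$");

    static boolean matchesEscaped(String url) {
        return ESCAPED.matcher(url).matches();
    }

    static boolean matchesUnescaped(String url) {
        return UNESCAPED.matcher(url).matches();
    }

    public static void main(String[] args) {
        String url = "http://www.domain.com/level1/level2/level3/level4/level5/level6/"
                + "level7/level8/level9/level10/level11/level12.html";
        System.out.println(matchesEscaped(url));   // true
        System.out.println(matchesUnescaped(url)); // true
    }
}
```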
