
Capture Regex repeating string between slashes in URL

I have the following partial URL, which can look like this:

/it/xyz/test/param+1/param-2/1234/gfd4

Basically, it has two letters at the beginning, a slash, another unknown string, and then a series of repeatable strings between slashes. I need to capture every string (I know a split on the / delimiter would do, but I am interested to know how I can extract them with a regex). I first came up with this:

^\/([a-zA-Z]{2})\/([a-zA-Z]{1,10})(\/[a-zA-Z1-9\+\-]+)

but it only captures

group1: it group2: xyz group3: /test

and of course it ignores the rest of the string.

If I add a * sign at the end, it only captures the last segment:

^\/([a-zA-Z]{2})\/([a-zA-Z]{1,10})(\/[a-zA-Z1-9\+\-]+)*

group1: it group2: xyz group3: /gfd4

So I am obviously missing some fundamentals; in addition to the proper regex, I would like an explanation.

I tagged this as Java because the engine that parses the regex is the one in JDK 7. As far as I know, each engine may behave differently.

As mentioned here, this is expected:

With one group in the pattern, you can only get one exact result in that group.
If your capture group gets repeated by the pattern (you used the + quantifier on the surrounding non-capturing group), only the last value that matches it gets stored.
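
The behaviour described in the quote can be checked with a small sketch. The class name RepeatedGroupDemo and the helper lastCapture are made up for illustration; the pattern is the one from the question with * appended, with the backslashes doubled so that it is a valid Java string literal.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RepeatedGroupDemo {
    // Pattern from the question with '*' appended (backslashes doubled for Java).
    static final Pattern P = Pattern.compile(
            "^\\/([a-zA-Z]{2})\\/([a-zA-Z]{1,10})(\\/[a-zA-Z1-9\\+\\-]+)*");

    // Returns what group 3 holds after the match, or null if the input
    // does not match or group 3 never participated in the match.
    static String lastCapture(String url) {
        Matcher m = P.matcher(url);
        return m.find() ? m.group(3) : null;
    }

    public static void main(String[] args) {
        // Group 3 is overwritten on every repetition, so only the
        // final segment is still there once the match completes.
        System.out.println(lastCapture("/it/xyz/test/param+1/param-2/1234/gfd4")); // prints /gfd4
    }
}
```

A capture group is a single slot, not a list: each pass through the repetition stores a new value in the same slot, which is why the earlier segments are gone by the time the match finishes.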

I would rather capture the rest of the string in group3 ((\\/.*$), as in this demo), then split around '/'. Or apply that pattern to the rest of the string:

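import java.util.regex.Matcher;
import java.util.regex.Pattern;

// str is the rest of the URL, i.e. what group3 captured
// Backslashes must be doubled inside a Java string literal
Pattern p = Pattern.compile("(\\/[a-zA-Z1-9\\+\\-]+)");
Matcher m = p.matcher(str);
while (m.find()) {
    String place = m.group(1); // one "/segment" per iteration
    ...
}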
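
For reference, both suggestions can be put together in a self-contained sketch. The class name UrlSegments and the methods viaSplit and viaFind are hypothetical, and the sample URL is the one from the question. One deliberate change from the snippet above: the slash sits outside the capture group, so both methods return the segments without the leading '/'.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UrlSegments {
    // Two letters, one word of up to ten letters, then everything else in group 3.
    static final Pattern PREFIX =
            Pattern.compile("^\\/([a-zA-Z]{2})\\/([a-zA-Z]{1,10})(\\/.*$)");
    // A single "/segment"; group 1 is the segment without the slash.
    static final Pattern SEGMENT = Pattern.compile("\\/([a-zA-Z1-9\\+\\-]+)");

    // Option 1: capture the tail once, then split it on '/'.
    static List<String> viaSplit(String url) {
        Matcher m = PREFIX.matcher(url);
        if (!m.find()) {
            return new ArrayList<>();
        }
        // Group 3 starts with '/', so drop it to avoid an empty first element.
        return Arrays.asList(m.group(3).substring(1).split("/"));
    }

    // Option 2: capture the tail once, then call find() repeatedly on it.
    static List<String> viaFind(String url) {
        List<String> segments = new ArrayList<>();
        Matcher prefix = PREFIX.matcher(url);
        if (!prefix.find()) {
            return segments;
        }
        Matcher m = SEGMENT.matcher(prefix.group(3));
        while (m.find()) {
            segments.add(m.group(1)); // each find() yields the next segment
        }
        return segments;
    }

    public static void main(String[] args) {
        String url = "/it/xyz/test/param+1/param-2/1234/gfd4";
        System.out.println(viaSplit(url)); // prints [test, param+1, param-2, 1234, gfd4]
        System.out.println(viaFind(url));  // prints [test, param+1, param-2, 1234, gfd4]
    }
}
```

The two options differ in strictness: split accepts any characters between slashes, while find() only returns the parts that fit the character class. Note that the class 1-9 taken from the question does not include the digit 0.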
