Extracting part of a URL using a Java regular expression
I'm trying to extract part of a URL from text files.
For example:
/p/gnomecatalog/bugs/search/?q=status%3Aclosed-accepted+or+status%3Awont-fix+or+status%3Aclosed" class="search_bin"><span>Closed Tickets</span></a>
I would like to extract only
/p/gnomecatalog/bugs/search/?q=status%3Aclosed-accepted+or+status%3Awont-fix+or+status%3Aclosed
How could I do that using a regular expression? I tried the regex
"/p/*./bugs/*."
but it didn't work.
Try this:
"\/p.*\/bugs[^"]*"
It means: "/p",
then: any characters,
then: "/bugs",
then: any characters except a double quote (").
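The pattern above can be tried out with a small Java sketch. The class and method names here (BugsUrlExtractor, extract) are made up for illustration, and the pattern is used without its surrounding quotes on the assumption that they are only string delimiters. A forward slash needs no escaping in a Java regex, so the backslashes are dropped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
class BugsUrlExtractor {
    // "/p", any characters, "/bugs", then everything up to the next double quote.
    private static final Pattern BUGS_URL = Pattern.compile("/p.*/bugs[^\"]*");

    // Returns every match found in one line of text.
    static List<String> extract(String line) {
        List<String> urls = new ArrayList<>();
        Matcher m = BUGS_URL.matcher(line);
        while (m.find()) {
            urls.add(m.group());
        }
        return urls;
    }

    public static void main(String[] args) {
        // Sample line taken from the question.
        String line = "/p/gnomecatalog/bugs/search/?q=status%3Aclosed-accepted+or+status%3Awont-fix+or+status%3Aclosed\" class=\"search_bin\"><span>Closed Tickets</span></a>";
        // Prints the list of matches; for this sample it holds only the URL part.
        System.out.println(extract(line));
    }
}
```

One thing to keep in mind: the .* between "/p" and "/bugs" is greedy, so if a single line contains several such links, one match may run from the first "/p" to the last "/bugs" on that line.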
You can use:
(\/p\/.*\/bugs\/.*?(?="))
Java code:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// 'line' holds one line read from the text file
String REGEX = "(\\/p\\/.*\\/bugs\\/.*?(?=\"))";
Pattern p = Pattern.compile(REGEX);
Matcher m = p.matcher(line);
while (m.find()) {
    String matched = m.group();
    System.out.println("Matched : " + matched);
}
Output:
Matched : /p/gnomecatalog/bugs/search/?q=status%3Aclosed-accepted+or+status%3Awont-fix+or+status%3Aclosed
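Explanation: \/p\/ and \/bugs\/ match the literal path segments, .* matches whatever lies between them, .*? lazily matches what follows, and the lookahead (?=") stops the match just before the next double quote without including it.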
Here's another way:
(?i)/p/[a-z/]+bugs/[^ "]+
The (?i) at the beginning makes the regex case-insensitive, so you don't have to worry about that. Then, after bugs/, it continues until it reaches either a space or a double quote (").
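As a rough sketch of how this pattern behaves in Java, the snippet below runs it against a mixed-case line. The class and method names (CaseInsensitiveBugsUrl, firstMatch) and the href wrapper in the sample are assumptions made for illustration, not part of the original answer.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class used only for this example.
class CaseInsensitiveBugsUrl {
    // (?i) turns on case-insensitive matching for the whole pattern.
    private static final Pattern BUGS_URL =
            Pattern.compile("(?i)/p/[a-z/]+bugs/[^ \"]+");

    // Returns the first match in the line, if there is one.
    static Optional<String> firstMatch(String line) {
        Matcher m = BUGS_URL.matcher(line);
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        // Made-up sample with mixed case, to exercise the (?i) flag.
        String line = "<a href=\"/P/GnomeCatalog/Bugs/search/?q=status%3Aclosed\" class=\"search_bin\">";
        // Prints /P/GnomeCatalog/Bugs/search/?q=status%3Aclosed
        System.out.println(firstMatch(line).orElse("no match"));
    }
}
```

Note that [a-z/]+ only accepts letters and slashes, so a project name containing digits or hyphens would not match unless the character class is widened.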