
Extracting part of a URL using a Java regular expression

I'm trying to extract part of a URL from text files.

For example:

/p/gnomecatalog/bugs/search/?q=status%3Aclosed-accepted+or+status%3Awont-fix+or+status%3Aclosed" class="search_bin"><span>Closed Tickets</span></a> 

I would like to extract only:

 /p/gnomecatalog/bugs/search/?q=status%3Aclosed-accepted+or+status%3Awont-fix+or+status%3Aclosed 

How could I do that using a regular expression? I tried the regex

  "/p/*./bugs/*." 

but it didn't work.

Try this:

   "\/p.*\/bugs[^"]*"

It means: "/p",

then: any characters,

then: "/bugs",

then: any characters except ".
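As a rough illustration of how this pattern behaves in Java, here is a minimal sketch. It assumes the outer quotes in the pattern above are just delimiters, and it drops the backslashes before the slashes, since Java does not need them. The class name `BugsUrlExtractor` and the method `extract` are made up for this example.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BugsUrlExtractor {
    // "/p", any characters, "/bugs", then everything up to (not including) the next double quote.
    private static final Pattern BUGS_URL = Pattern.compile("/p.*/bugs[^\"]*");

    // Hypothetical helper: returns the first match in the line, or empty if there is none.
    static Optional<String> extract(String line) {
        Matcher m = BUGS_URL.matcher(line);
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        // The sample line from the question, with the double quotes escaped for Java.
        String line = "/p/gnomecatalog/bugs/search/?q=status%3Aclosed-accepted+or+status%3Awont-fix"
                + "+or+status%3Aclosed\" class=\"search_bin\"><span>Closed Tickets</span></a>";
        System.out.println(extract(line).orElse("no match"));
    }
}
```

Because `.*` is greedy, a line that contains "/bugs" more than once would be matched up to the last occurrence, so this sketch is probably best suited to lines that hold a single link.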

You can use:

(\/p\/.*\/bugs\/.*?(?="))

Java code:

        // Needs java.util.regex.Pattern and java.util.regex.Matcher; `line` holds the input text.
        String REGEX = "(\\/p\\/.*\\/bugs\\/.*?(?=\"))";
        Pattern p = Pattern.compile(REGEX);
        Matcher m = p.matcher(line);
        while (m.find()) {
            String matched = m.group();
            System.out.println("Matched :  " + matched);
        }

Output:

Matched :  /p/gnomecatalog/bugs/search/?q=status%3Aclosed-accepted+or+status%3Awont-fix+or+status%3Aclosed


Here's another way:

(?i)/p/[a-z/]+bugs/[^ "]+

The (?i) at the beginning makes the regex case-insensitive, so you don't have to worry about letter case. Then, after bugs/, it continues until it reaches either a space or a ".
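A minimal sketch of this pattern in Java, collecting every match in a piece of text, might look like the following. The class name `BugsUrlFinder`, the method `findAll`, and the sample HTML in `main` are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BugsUrlFinder {
    // (?i) turns on case-insensitive matching; [^ "]+ stops at the first space or double quote.
    private static final Pattern BUGS_URL = Pattern.compile("(?i)/p/[a-z/]+bugs/[^ \"]+");

    // Hypothetical helper: returns all matches found in the text, in order of appearance.
    static List<String> findAll(String text) {
        List<String> matches = new ArrayList<>();
        Matcher m = BUGS_URL.matcher(text);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // Made-up input with two links that differ in letter case.
        String html = "<a href=\"/p/foo/bugs/1\">x</a> <a href=\"/P/Bar/Bugs/2\">y</a>";
        for (String url : findAll(html)) {
            System.out.println(url);
        }
    }
}
```

Note that `[a-z/]+` accepts only letters and slashes between /p/ and bugs/, so a project name containing digits or hyphens would not match unless the character class is widened.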
