Enhancing a regular expression to match more URLs

Question

Consider this regular expression:

  static String AdrPattern="(?:http://www\\.([^/&]+)\\.com/|(?!^)\\G)/?([^/]+)";

I have two small problems:

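How can I make it match URLs that have only a domain name, with no further path segments? (e.g. https://stackoverflow.com)
How can I make this regex match URLs with other domain extensions?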

PS: The regex was taken from here and works correctly, but these two shortcomings need to be addressed.
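To make the two shortcomings concrete, here is a minimal Java sketch that runs the original pattern with `Matcher.find()` over a few sample URLs. The class name, the `matches` helper and the `example.com`/`example.org` URLs are illustrative assumptions, not part of the original code. On the first `find()` the `http://www...com/` branch matches the domain and the first segment; each later `find()` resumes at `\G` and picks up one more segment. Because the first branch requires the literal `.com/`, a domain-only URL and a non-`.com` URL produce no match at all.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OriginalPatternDemo {
    // The pattern from the question, unchanged
    static final Pattern ADR_PATTERN = Pattern.compile(
            "(?:http://www\\.([^/&]+)\\.com/|(?!^)\\G)/?([^/]+)");

    // Returns one "domain|segment" entry per successful find()
    static List<String> matches(String url) {
        List<String> out = new ArrayList<>();
        Matcher m = ADR_PATTERN.matcher(url);
        while (m.find()) {
            // group(1) is only set by the http://www... branch (first match);
            // continuation matches start at \G and only set group(2)
            out.add(m.group(1) + "|" + m.group(2));
        }
        return out;
    }

    public static void main(String[] args) {
        String[] urls = {
            "http://www.example.com/first/second", // domain + first segment, then "second"
            "http://www.example.com",              // no match: ".com/" is required
            "http://www.example.org/first"         // no match: only ".com" is accepted
        };
        for (String u : urls) {
            System.out.println(u + " -> " + matches(u));
        }
    }
}
```

For the first URL the sketch collects `example|first` and then `null|second` (group 1 is unset on the continuation match); the lists for the other two URLs are empty.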

Edit

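With the code below, which uses the regex from the answer to this post, the remaining parts are skipped and only the domain name is printed: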

    static String AdrPattern = "(?:(?!\\A)\\G(?:/([^\\s/]+))|http://www\\.([^\\s/&]+)\\.(?:com|net|gov|org)(?:/([^\\s/]+))?)";
    static Pattern WebUrlPattern = Pattern.compile(AdrPattern);

    // Inside the processing method; WebUrlMatcher, line, fop and prefix are declared elsewhere (not shown)
    WebUrlMatcher = WebUrlPattern.matcher(line);

    int cnt = 0;
    while (WebUrlMatcher.find()) {
        // cnt becomes 1 after the first domain match, so every later match
        // (including the \G matches that carry further path segments) is skipped
        if (cnt == 0) {
            String extractedPath = WebUrlMatcher.group(1);
            if (extractedPath != null) {
                fop.write(prefix.toLowerCase().getBytes());
                fop.write(System.getProperty("line.separator").getBytes());
                fop.write(extractedPath.toLowerCase().getBytes());
                fop.write(System.getProperty("line.separator").getBytes());
            }

            String extractedPart = WebUrlMatcher.group(2);
            String extracted2 = WebUrlMatcher.group(3);
            if (extractedPart != null) {
                fop.write(extractedPart.toLowerCase().getBytes());
                fop.write(System.getProperty("line.separator").getBytes());
                if (extracted2 != null) {
                    fop.write(extracted2.toLowerCase().getBytes());
                    fop.write(System.getProperty("line.separator").getBytes());
                }
                cnt = cnt + 1;
            }
        }
    }
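The effect described in the edit can be reproduced in isolation. The sketch below is a trimmed, hypothetical harness around the loop above: it assumes `fop` can be replaced by an in-memory `ByteArrayOutputStream`, writes `\n` instead of `line.separator` so the result is predictable, and leaves out the `prefix` line. The `\G` match that carries `second` is still found by `find()`, but the `cnt == 0` guard discards it.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FirstMatchOnlyRepro {
    // Regex from the edit, unchanged
    static final Pattern WEB_URL_PATTERN = Pattern.compile(
            "(?:(?!\\A)\\G(?:/([^\\s/]+))|http://www\\.([^\\s/&]+)\\.(?:com|net|gov|org)(?:/([^\\s/]+))?)");

    // Lower-cases value and appends it plus "\n"; null values are skipped,
    // mirroring the null checks in the original loop
    static void write(ByteArrayOutputStream out, String value) {
        if (value != null) {
            byte[] bytes = (value.toLowerCase() + "\n").getBytes(StandardCharsets.UTF_8);
            out.write(bytes, 0, bytes.length);
        }
    }

    // Same control flow as the question's loop, minus the prefix line
    static String run(String line) {
        ByteArrayOutputStream fop = new ByteArrayOutputStream(); // stands in for the original fop
        Matcher m = WEB_URL_PATTERN.matcher(line);
        int cnt = 0;
        while (m.find()) {
            if (cnt == 0) {                      // only true until the first domain match
                write(fop, m.group(1));          // later segment (always null on the first match)
                String extractedPart = m.group(2);
                if (extractedPart != null) {
                    write(fop, extractedPart);   // domain
                    write(fop, m.group(3));      // optional first segment
                    cnt = cnt + 1;
                }
            }
        }
        return new String(fop.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.print(run("http://www.asfdasdf.net/first/second"));
    }
}
```

For `http://www.asfdasdf.net/first/second` the collected text is `asfdasdf` and `first`; dropping the `if (cnt == 0)` guard would also emit `second`, because the continuation match sets group 1.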

Answer 1

Here is one way: a slight modification of your current regex.
Then just test which capture groups are set.

 "(?:(?!\\A)\\G(?:/([^\\s/]+))|http://www\\.([^\\s/&]+)\\.(?:com|net)(?:/([^\\s/]+))?)"

 (?:
      (?! \A )                      # Not BOS
      \G                            # Start from last match
      (?:
           /  
           ( [^\s/]+ )                   # (1), Required Next Segment path (or fail)
      )
   |                              # or,
      http://www\.                  # New match
      ( [^\s/&]+ )                  # (2), Domain
      \.
      (?: com | net )               # Extension
      (?:
           /  
           ( [^\s/]+ )                   # (3), Optional First Segment path
      )?
 )
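The advice to test the capture groups can be sketched in Java as follows. Each `find()` sets either group 2 (a new domain, plus an optional first segment in group 3) or group 1 (one more segment reached through `\G`), so checking which group is non-null is enough to tell them apart. The class and method names and the `domain:`/`segment:` labels are illustrative assumptions; the pattern string is the answer's, escaped for a Java literal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CaptureGroupTest {
    // Answer's regex: group 1 = next segment via \G, group 2 = domain, group 3 = optional first segment
    static final Pattern URL_PATTERN = Pattern.compile(
            "(?:(?!\\A)\\G(?:/([^\\s/]+))|http://www\\.([^\\s/&]+)\\.(?:com|net)(?:/([^\\s/]+))?)");

    // Collects each domain followed by its path segments, testing every group for null
    static List<String> parts(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = URL_PATTERN.matcher(input);
        while (m.find()) {
            if (m.group(2) != null) {            // new URL: domain branch matched
                out.add("domain:" + m.group(2));
                if (m.group(3) != null) {        // optional first segment
                    out.add("segment:" + m.group(3));
                }
            } else if (m.group(1) != null) {     // \G continuation: one more segment
                out.add("segment:" + m.group(1));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "http://www.asfdasdf.net/\n"
                + "http://www.asfdasdf.net/first\n"
                + "http://www.asfdasdf.net/first/second";
        System.out.println(parts(input));
        System.out.println(parts("http://www.example.com")); // domain-only URL
    }
}
```

Fed the three input lines from the test below, the list comes out as `domain:asfdasdf`, `domain:asfdasdf`, `segment:first`, `domain:asfdasdf`, `segment:first`, `segment:second`, which lines up with the group dump; a bare `http://www.example.com` yields just `domain:example`. Further extensions can be added to the `(?:com|net)` alternation, as the edit does with `gov` and `org`.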

Tested captures:

Input:

http://www.asfdasdf.net/  
http://www.asfdasdf.net/first  
http://www.asfdasdf.net/first/second

Output:

 **  Grp 0 -  ( pos 0 , len 23 ) 
http://www.asfdasdf.net  
 **  Grp 1 -  NULL 
 **  Grp 2 -  ( pos 11 , len 8 ) 
asfdasdf  
 **  Grp 3 -  NULL 

-------------

 **  Grp 0 -  ( pos 28 , len 29 ) 
http://www.asfdasdf.net/first  
 **  Grp 1 -  NULL 
 **  Grp 2 -  ( pos 39 , len 8 ) 
asfdasdf  
 **  Grp 3 -  ( pos 52 , len 5 ) 
first  

-------------

 **  Grp 0 -  ( pos 61 , len 29 ) 
http://www.asfdasdf.net/first  
 **  Grp 1 -  NULL 
 **  Grp 2 -  ( pos 72 , len 8 ) 
asfdasdf  
 **  Grp 3 -  ( pos 85 , len 5 ) 
first  

-------------

 **  Grp 0 -  ( pos 90 , len 7 ) 
/second  
 **  Grp 1 -  ( pos 91 , len 6 ) 
second  
 **  Grp 2 -  NULL 
 **  Grp 3 -  NULL
