Java regex to extract host name and domain name from a URL

I have already looked at and tried the solutions in several other threads, but none of them work for me. I need a regex solution, not Java code that does this without regex.

Some of the threads I have already checked: "Get domain name from given url", "Extract host name/domain name from URL string", and "Java regex to extract domain name?". None work for me: either the regex doesn't work, or the solution is Java code without regex.

What am I trying to do?

Case 1:
Input: https://api.twitter.com/blog/category/2?user=42&status=enabled
Output: api.twitter.com

Input: abc.xyz.com/blog/category/2?user=42&status=enabled
Output: abc.xyz.com

Case 2:
Input: https://abc.xyz.com/blog/category/2?user=42&status=enabled
Output: xyz.com

Input: abc.xyz.com/blog/category/2?user=42&status=enabled
Output: xyz.com

I need two regexes, one for each case above. If it can be done with a single regex, that works too.

I tried the regex below from the first post:

^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?

This one works when the URL has https:// or any other scheme, but fails when there is no scheme in the URL.
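To make the failure concrete, here is a minimal Java sketch that runs the regex above against both kinds of input. The class and method names (`UriRegexCheck`, `authority`, `path`) are made up for illustration. Without a scheme there is no `//`, so group 4 (the host) does not participate in the match and the host ends up inside group 5 (the path) instead.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name, used only for this illustration.
class UriRegexCheck {
    // The regex quoted in the question, escaped for a Java string literal.
    static final Pattern URI = Pattern.compile(
        "^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\\?([^#]*))?(#(.*))?");

    // Group 4 is the part after "//"; it is null when the input has no "//".
    static String authority(String url) {
        Matcher m = URI.matcher(url);
        return m.find() ? m.group(4) : null;
    }

    // Group 5 is the path; without a scheme the host is swallowed here.
    static String path(String url) {
        Matcher m = URI.matcher(url);
        return m.find() ? m.group(5) : null;
    }

    public static void main(String[] args) {
        String withScheme = "https://api.twitter.com/blog/category/2?user=42&status=enabled";
        String noScheme = "abc.xyz.com/blog/category/2?user=42&status=enabled";
        System.out.println(authority(withScheme)); // api.twitter.com
        System.out.println(authority(noScheme));   // null
        System.out.println(path(noScheme));        // abc.xyz.com/blog/category/2
    }
}
```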

So far I am solving the first case with a two-step solution.

Step 1: Replace scheme
(.*://)(.*) -> $2
remove anything before and including string "://"

Step 2: Extract host name
([^/]*)(.*) -> $1
The first group captures everything before the first "/", i.e. every non-slash character up to the first slash.
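For reference, the two steps above could be written in Java roughly as follows. This is only a sketch of the approach as described: the class name `TwoStepHost` and the method `host` are invented here, and `replaceFirst` is an assumption about how the replacements are applied.

```java
// Hypothetical class name, used only for this illustration.
class TwoStepHost {
    static String host(String url) {
        // Step 1: drop everything up to and including "://".
        // If there is no "://", the pattern does not match and the input is unchanged.
        String withoutScheme = url.replaceFirst("(.*://)(.*)", "$2");
        // Step 2: keep only the run of non-slash characters at the start.
        return withoutScheme.replaceFirst("([^/]*)(.*)", "$1");
    }

    public static void main(String[] args) {
        System.out.println(host("https://api.twitter.com/blog/category/2?user=42&status=enabled")); // api.twitter.com
        System.out.println(host("abc.xyz.com/blog/category/2?user=42&status=enabled"));             // abc.xyz.com
    }
}
```

Note that `.*://` in step 1 is greedy, so a URL that contains a second "://" later on (for example inside the query string) would be cut at that later occurrence.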

You may use this regex with optional matches and capture groups:

^(?:\w+://)?((?:[^./?#]+\.)?([^/?#]+))

RegEx Details:

  • ^ : Start
  • (?:\w+://)? : Optionally match scheme names followed by ://
  • ( : Start capture group #1
    • (?:[^./?#]+\.)? : Optionally match first part of domain name using a non-capture group
    • ([^/?#]+) : Match 1+ of any character that is not / , ? , # in capture group #2
  • ) : End capture group #1
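In Java, the regex above could be applied along these lines. This is a minimal sketch: the class name `HostAndDomain` and the methods `host` and `domain` are made up for illustration. Capture group #1 gives the host name (case 1) and capture group #2 gives the domain name (case 2).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name, used only for this illustration.
class HostAndDomain {
    // The regex from the answer, escaped for a Java string literal.
    static final Pattern URL_PATTERN =
        Pattern.compile("^(?:\\w+://)?((?:[^./?#]+\\.)?([^/?#]+))");

    // Case 1: capture group #1 is the full host name.
    static String host(String url) {
        Matcher m = URL_PATTERN.matcher(url);
        return m.find() ? m.group(1) : null;
    }

    // Case 2: capture group #2 is the host without its first label.
    static String domain(String url) {
        Matcher m = URL_PATTERN.matcher(url);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        String a = "https://api.twitter.com/blog/category/2?user=42&status=enabled";
        String b = "abc.xyz.com/blog/category/2?user=42&status=enabled";
        System.out.println(host(a));   // api.twitter.com
        System.out.println(host(b));   // abc.xyz.com
        System.out.println(domain(a)); // twitter.com
        System.out.println(domain(b)); // xyz.com
    }
}
```

One thing to watch: group #2 is simply whatever follows the first dot-terminated label, so it matches the expected output only for three-label hosts like the ones in the question. For a two-label host such as `xyz.com/blog`, group #2 is just `com`.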
