Matching subdomain and top domain using regex in Java
Follow-up to this question: Regex to match pattern with subdomain in java
I use the pattern below to match the domain and subdomain:
Pattern pattern = Pattern.compile("http://([a-z0-9]*.)example.com");
This pattern matches the following:
http://asd.example.com
http://example.example.com
http://www.example.com
but it does not match
http://example.com
Can anyone tell me how to match http://example.com too?
Just make the first part optional by adding a ? after the group:
Pattern pattern = Pattern.compile("http://([a-z0-9]*\\.)?example\\.com");
Note that . matches any character; you should use \\. (as written in a Java string literal) to match a literal dot.
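As a quick check of the answer above, the following sketch runs the corrected pattern against URLs from the question plus one deliberately wrong host. The class name SubdomainMatchDemo and the helper matchesExample are made up for illustration; matches() is used here on the assumption that the whole string should fit the pattern.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative, not from the original post.
final class SubdomainMatchDemo {
    // Optional subdomain group, with both dots escaped so they match a literal '.'.
    static final Pattern PATTERN =
            Pattern.compile("http://([a-z0-9]*\\.)?example\\.com");

    // Returns true only if the entire URL fits the pattern.
    static boolean matchesExample(String url) {
        return PATTERN.matcher(url).matches();
    }

    public static void main(String[] args) {
        String[] urls = {
            "http://example.com",
            "http://www.example.com",
            "http://asd.example.com",
            "http://wwwXexample.com" // rejected: 'X' is not a literal dot
        };
        for (String url : urls) {
            System.out.println(url + " -> " + matchesExample(url));
        }
    }
}
```

The last URL is the kind of input the unescaped pattern from the question would accept, because there the . after the character class can stand for any character.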
You can use this regex pattern to get the domains of all URLs:
\\p{L}{0,10}(?:://)?[\\p{L}\\.]{1,50}
For example:
Input = http://www.google.com/search?q=a
Output = http://www.google.com
Input = ftp://www.google.com/search?q=a
Output = ftp://www.google.com
Input = www.google.com/search?q=a
Output = www.google.com
Here, \\p{L}{0,10} stands for the http, https and ftp parts (there could be more schemes that I don't know of), (?:://)? stands for the :// part if it appears, and [\\p{L}\\.]{1,50} stands for the foo.bar.foo.com part. The rest of the URL is cut off.
And here is the Java code that accomplishes the job:
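import java.util.regex.Matcher;
import java.util.regex.Pattern;

// The class name is only illustrative; the original snippet did not name one.
public class DomainExtractor {
    public static final String DOMAIN_PATTERN = "\\p{L}{0,10}(?:://)?[\\p{L}\\.]{1,50}";

    // Returns the scheme and host part of the URL, or "" if nothing matches.
    public static String getDomain(String url) {
        if (url == null || url.isEmpty()) {
            return "";
        }
        Pattern p = Pattern.compile(DOMAIN_PATTERN);
        Matcher m = p.matcher(url);
        if (m.find()) {
            return m.group();
        }
        return "";
    }

    public static void main(String[] args) {
        System.out.println(getDomain("www.google.com/search?q=a"));
    }
}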
Output = www.google.com
Finally, if you want to match just "example.com", you can simply add it to the end of the pattern:
\\p{L}{0,10}(?:://)?[\\p{L}\\.]{0,50}example\\.com
And this will get all of the domains ending in "example.com":
Input = http://www.foo.bar.example.com/search?q=a
Output = http://www.foo.bar.example.com
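The example.com variant can be tried out with a sketch like the one below. The class ExampleDomainDemo and the method getExampleDomain are hypothetical names chosen for this illustration; the pattern itself is the one shown above, compiled once and reused.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative only.
final class ExampleDomainDemo {
    // The domain pattern with example\.com required at the end of the host.
    static final Pattern EXAMPLE_DOMAIN =
            Pattern.compile("\\p{L}{0,10}(?:://)?[\\p{L}\\.]{0,50}example\\.com");

    // Returns the first match, or "" when the input contains no example.com host.
    static String getExampleDomain(String url) {
        if (url == null || url.isEmpty()) {
            return "";
        }
        Matcher m = EXAMPLE_DOMAIN.matcher(url);
        return m.find() ? m.group() : "";
    }

    public static void main(String[] args) {
        System.out.println(getExampleDomain("http://www.foo.bar.example.com/search?q=a"));
        System.out.println(getExampleDomain("http://example.com/index.html"));
        System.out.println(getExampleDomain("http://www.google.com/search?q=a").isEmpty());
    }
}
```

One thing to be aware of: because the part before example\\.com is just a run of letters and dots, a host such as notexample.com is returned as well. If that matters for your input, the pattern would need to require a dot or the :// directly before example.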
Note: \\p{Ll} can be used instead of \\p{L}, because \\p{Ll} matches only lowercase Unicode letters (\\p{L} matches all kinds of Unicode letters) and URLs are usually written in lowercase letters.
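To see what the choice between the two letter classes means in practice, here is a small comparison sketch. The class LetterClassDemo, its constants and the firstMatch helper are hypothetical names used only for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical comparison of \p{L} and \p{Ll} in the domain pattern.
final class LetterClassDemo {
    // Any Unicode letter, upper or lower case.
    static final Pattern ANY_LETTER =
            Pattern.compile("\\p{L}{0,10}(?:://)?[\\p{L}\\.]{1,50}");
    // Lowercase Unicode letters only.
    static final Pattern LOWERCASE_ONLY =
            Pattern.compile("\\p{Ll}{0,10}(?:://)?[\\p{Ll}\\.]{1,50}");

    // Returns the first match of the pattern in the input, or "" if there is none.
    static String firstMatch(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        return m.find() ? m.group() : "";
    }

    public static void main(String[] args) {
        String lower = "http://www.google.com/search?q=a";
        String mixed = "http://www.Google.com/search?q=a";
        System.out.println(firstMatch(LOWERCASE_ONLY, lower));
        System.out.println(firstMatch(LOWERCASE_ONLY, mixed));
        System.out.println(firstMatch(ANY_LETTER, mixed));
    }
}
```

With an all-lowercase URL both patterns give the same result. With a mixed-case host the \\p{Ll} version stops at the first uppercase letter, so it is only a safe swap if the input is known to be lowercase.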