How to find a URL pattern in Java using regex

I want to find out whether a given string (representing a URL) belongs to the same domain, including its subdomains. For example, http://www.myDomain.com/someThing combined with myDomain.com should return true. So should the following:

http://myDomain.com
http://www.domain.myDomain.com

But the following (illegal) URL should not: 'http://.myDomain.com' (note the dot before myDomain).

Basically, I need a regex that represents whatever comes before myDomain.com. In general it needs to be (http|https)://[a-z.]*myDomain, meaning that just before myDomain.com there may be letters followed by a dot (0 or more times), but if there are no letters, there shouldn't be a dot either.

Does anyone know how to assemble that regex?

http(s)?://([a-z]+\.)*myDomain\.com
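A minimal sketch of how the regex above could be used from Java. The names `RegexDomainCheck` and `isSameDomain` are hypothetical. Two additions are assumptions, not part of the answer: `Pattern.CASE_INSENSITIVE`, since hostnames are case-insensitive, and the optional tail `(?:[:/?#].*)?`, so that `matches()` accepts a port, path, query, or fragment after the host while still rejecting something like `http://myDomain.com.example.org`.

```java
import java.util.regex.Pattern;

public class RegexDomainCheck {
    // The answer's regex ("https?" is equivalent to "http(s)?").
    // Assumptions: case-insensitive matching, and an optional tail that starts
    // with ':', '/', '?' or '#' so matches() can accept port/path/query/fragment.
    private static final Pattern SAME_DOMAIN = Pattern.compile(
            "https?://([a-z]+\\.)*myDomain\\.com(?:[:/?#].*)?",
            Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: true if url is http(s) on myDomain.com or a subdomain.
    public static boolean isSameDomain(String url) {
        return SAME_DOMAIN.matcher(url).matches();
    }

    public static void main(String[] args) {
        String[] urls = {
            "http://www.myDomain.com/someThing",
            "http://myDomain.com",
            "http://www.domain.myDomain.com",
            "http://.myDomain.com",
            "http://myDomain.com.example.org"
        };
        for (String url : urls) {
            System.out.println(url + "\t" + isSameDomain(url));
        }
    }
}
```

Note that `[a-z]+` rejects subdomain labels containing digits or hyphens (for example `api-2.myDomain.com`); widening it to `[a-z0-9-]+` is one option if such hosts must be accepted.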

It can be done with a combination of the URL class and a regular expression:

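    import java.net.MalformedURLException;
    import java.net.URL;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class DomainCheck {
        public static void main(String[] args) {
            String url = "myDomain.com";
            String[] urlTest = {
                "http://www.myDomain.com/someThing",
                "http://myDomain.com",
                "http://www.domain.myDomain.com",
                "http://.myDomain.com",
                "http://example.com"
            };
            // Optional prefix ending in a dot, then the domain itself.
            // Pattern.quote() makes the dot in url match only a literal '.'.
            Pattern domainPattern = Pattern.compile("(.+\\.)?" + Pattern.quote(url));
            for (String urlx : urlTest) {
                System.out.print(urlx + "\t");
                try {
                    // Let java.net.URL extract the host, so the path is ignored.
                    String host = new URL(urlx).getHost();
                    System.out.print("HOST=" + host + "\t");
                    Matcher m = domainPattern.matcher(host);
                    System.out.println(m.matches());
                } catch (MalformedURLException ex) {
                    System.out.println("false (no valid url)");
                }
            }
        }
    }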
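Once the host has been extracted with `URL.getHost()` as above, the check itself does not strictly need a regex. The sketch below swaps in a plain string comparison instead (a different technique from the answer; `HostCheck` and `isSameDomain` are hypothetical names). It accepts the domain itself or any subdomain, compares case-insensitively, and rejects empty labels such as the leading dot in `.myDomain.com`.

```java
import java.util.Locale;

public class HostCheck {
    // Hypothetical helper: true if host equals domain or is a subdomain of it.
    // Comparison is case-insensitive; empty labels ("", as in ".x" or "a..x") fail.
    public static boolean isSameDomain(String host, String domain) {
        String h = host.toLowerCase(Locale.ROOT);
        String d = domain.toLowerCase(Locale.ROOT);
        if (h.equals(d)) {
            return true;
        }
        if (!h.endsWith("." + d)) {
            return false;
        }
        // Everything before ".domain"; split with limit -1 keeps empty labels.
        String prefix = h.substring(0, h.length() - d.length() - 1);
        for (String label : prefix.split("\\.", -1)) {
            if (label.isEmpty()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[] hosts = {
            "www.myDomain.com", "myDomain.com", "www.domain.myDomain.com",
            ".myDomain.com", "notmyDomain.com", "example.com"
        };
        for (String host : hosts) {
            System.out.println(host + "\t" + isSameDomain(host, "myDomain.com"));
        }
    }
}
```

Unlike `(.+\.)?myDomain\.com`, this also rejects a host like `..myDomain.com`, where `.+` can consume the stray dot.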

Here is an example that finds URLs starting with https://example.com inside a larger string:

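    // jshell-style fragment; bigString (the text to search) must be defined first.
    import java.util.logging.Logger;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;
    Logger logger = Logger.getLogger("UrlFinder");
    // "https://example.com" plus everything up to the first ", <, $, newline, space,
    // [, ] or ). The dot is escaped so it matches only a literal '.'.
    Pattern aPattern = Pattern.compile("https://example\\.com[^\"<$\n \\[\\])]+");
    Matcher aMatcher = aPattern.matcher(bigString);
    while (aMatcher.find()) {
        logger.info(aMatcher.group());
    }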
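To make it visible which characters end a match, here is a self-contained variant of the same pattern that collects matches into a list instead of logging them. `UrlFinder`, `findLinks`, and the sample text are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UrlFinder {
    // Same character class as the answer: a match ends at the first '"', '<',
    // '$', newline, space, '[', ']' or ')'.
    private static final Pattern LINK =
            Pattern.compile("https://example\\.com[^\"<$\n \\[\\])]+");

    // Hypothetical helper: returns every match in order of appearance.
    public static List<String> findLinks(String text) {
        List<String> links = new ArrayList<>();
        Matcher m = LINK.matcher(text);
        while (m.find()) {
            links.add(m.group());
        }
        return links;
    }

    public static void main(String[] args) {
        // Made-up sample text for illustration.
        String text = "See https://example.com/a and (https://example.com/b) or "
                + "<a href=\"https://example.com/c\">, but not https://example.com alone.";
        System.out.println(findLinks(text));
    }
}
```

Because of the trailing `+`, a bare `https://example.com` followed directly by a space or the end of the text is not matched; changing `+` to `*` would include it.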

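Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.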

 
Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM