Java: How to get the text between "http://" and the first following "/", and the text after that first "/"?

Question

I am still a novice with regular expressions ("regex") in Java.

If I have a URL like this: "http://somedomain.someextention/somefolder/.../someotherfolder/somepage"

What is the simplest way to get:

"somedomain.someextention"?
"somefolder/.../someotherfolder/somepage"?
"somepage"?

Thanks!

Answer 1

You don't have to (and probably shouldn't) use regex here. Instead, use the classes designed to handle things like this. For example, you can use the URL, URI, and File classes like this:

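// Requires: import java.io.File; import java.net.URL;
String address = "http://somedomain.someextention/somefolder/.../someotherfolder/somepage";

URL url = new URL(address); // throws MalformedURLException if the address is malformed
File file = new File(url.getPath()); // used only to extract the last path segment

System.out.println(url.getHost());
System.out.println(url.getPath());
System.out.println(file.getName());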

Output:

somedomain.someextention
/somefolder/.../someotherfolder/somepage
somepage

Now you may need to get rid of the / at the start of the path to your resource. You can use substring(1) here if the path starts with /.
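As a minimal sketch of that trimming step: the class name PathTrimExample and the helper stripLeadingSlash are hypothetical names introduced only for this illustration, and the sketch uses URI.create instead of new URL so that no checked exception has to be handled.

```java
import java.net.URI;

final class PathTrimExample {
    /** Returns the path without its leading '/', or the path unchanged if it has none. */
    static String stripLeadingSlash(String path) {
        return path.startsWith("/") ? path.substring(1) : path;
    }

    public static void main(String[] args) {
        // Same sample address as in the question.
        URI uri = URI.create("http://somedomain.someextention/somefolder/.../someotherfolder/somepage");
        System.out.println(stripLeadingSlash(uri.getPath()));
    }
}
```

Checking startsWith first avoids cutting off a real character when the path has no leading slash, and avoids an exception on an empty path.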

But if you really must use regex, you can try

^https?://([^/]+)/(.*/([^/]+))$

Now

group 1 will contain the host name,
group 2 will contain the path to the resource,
group 3 will contain the name of the resource.
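A possible way to apply that pattern with java.util.regex is sketched below. The class name UrlRegexExample and the method split are hypothetical, and returning null for a non-matching input is a choice made for this sketch, not part of the answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class UrlRegexExample {
    // The pattern from the answer, compiled once and reused.
    private static final Pattern URL_PARTS =
            Pattern.compile("^https?://([^/]+)/(.*/([^/]+))$");

    /** Returns {host, path, name}, or null when the address does not match. */
    static String[] split(String address) {
        Matcher m = URL_PARTS.matcher(address);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String[] parts = split("http://somedomain.someextention/somefolder/.../someotherfolder/somepage");
        for (String part : parts) {
            System.out.println(part);
        }
    }
}
```

Note that this pattern requires at least one folder before the resource name, so an address such as "http://host/page" does not match.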

Answer 2

The best way to get those components is to use the URI class, e.g.

    URI uri = new URI(str);            // java.net.URI; throws URISyntaxException
    String domain = uri.getHost();
    String path = uri.getPath();
    int pos = path.lastIndexOf("/");   // index of the last path separator
    ...
    // or use File to parse the path string.

You could do it using regexes on the raw URL string, but there is a risk that you won't correctly cope with all of the variability that is possible in a URL. (Hint: the regex supplied by @Pchenko doesn't :-)) And you would definitely need to use a decoder to deal with possible percent encoding.
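To illustrate the percent-encoding point: java.net.URI decodes escaped octets in getPath(), while getRawPath() returns the path as written. The sketch below relies on that; the class name UriDecodeExample, the helper lastSegment, and the example.com address are assumptions made for this illustration.

```java
import java.net.URI;

final class UriDecodeExample {
    /** Returns the decoded last path segment of a hierarchical URI such as an http address. */
    static String lastSegment(String address) {
        // getPath() decodes percent-escapes such as %20; getRawPath() would not.
        String path = URI.create(address).getPath();
        return path.substring(path.lastIndexOf('/') + 1);
    }

    public static void main(String[] args) {
        URI uri = URI.create("http://example.com/some%20folder/my%20page");
        System.out.println(uri.getRawPath());  // path exactly as written in the address
        System.out.println(uri.getPath());     // path with escapes decoded
        System.out.println(lastSegment("http://example.com/some%20folder/my%20page"));
    }
}
```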

Answer 3

This does not use regexp or URI; it is simple substring code, offered as exercise material. It is missing format validation for a few corner cases.

int lastDelim = str.lastIndexOf('/');
if (lastDelim<0) throw new IllegalArgumentException("Invalid url");
int startIdx = str.indexOf("//");
startIdx = startIdx<0 ? 0 : startIdx+2;
int pathDelim = str.indexOf('/', startIdx);
String domain = str.substring(startIdx, pathDelim);
String path = str.substring(pathDelim+1, lastDelim);
String page = str.substring(lastDelim+1);
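Two of the corner cases can be handled with small checks, as in the sketch below: an address with no path at all ("http://host") and an address with a single path segment ("http://host/page"). The class name SubstringUrlSplitter, the method split, and the choice to return empty strings for missing parts are assumptions made for this sketch; it still does no real URL validation.

```java
final class SubstringUrlSplitter {
    /** Returns {domain, folders, page}; folders and page are empty when absent. */
    static String[] split(String str) {
        int schemeEnd = str.indexOf("//");
        int startIdx = schemeEnd < 0 ? 0 : schemeEnd + 2;
        int pathDelim = str.indexOf('/', startIdx);
        if (pathDelim < 0) {
            // No path at all, e.g. "http://host".
            return new String[] { str.substring(startIdx), "", "" };
        }
        int lastDelim = str.lastIndexOf('/');
        String domain = str.substring(startIdx, pathDelim);
        // With a single path segment, pathDelim == lastDelim and there are no folders.
        String folders = lastDelim > pathDelim ? str.substring(pathDelim + 1, lastDelim) : "";
        String page = str.substring(lastDelim + 1);
        return new String[] { domain, folders, page };
    }

    public static void main(String[] args) {
        for (String part : split("http://somedomain.someextention/somefolder/.../someotherfolder/somepage")) {
            System.out.println(part);
        }
    }
}
```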

Answer 4

If you would like to use regex to decode the URL instead of using the URI class described in the previous answers, the link below gives a nice regex tutorial, and it also explains how to decode a sample URL. You could learn it there and try it out.

http://www.beedub.com/book/2nd/regexp.doc.html

Answer 5

It's not regex, and it's not scalable either, but it works:

public class SomeClass
{
    public static void main(String[] args)
    {

        SomeClass sclass = new SomeClass();
        String[] string = 
            sclass.parseURL("http://somedomain.someextention/somefolder/.../someotherfolder/somepage");

        System.out.println(string[0]);
        System.out.println(string[1]);
        System.out.println(string[2]);
    }

    private String[] parseURL(String url)
    {
        String part1 = url.substring("http://".length(), url.indexOf("/", "http://".length()));

        String part2 = url.substring("http://".length() + part1.length() + 1, url.lastIndexOf("/"));

        String part3 = url.substring(url.lastIndexOf("/") + 1);

        return new String[] { part1, part2, part3 };
    }
}

Output:

somedomain.someextention
somefolder/.../someotherfolder
somepage
