
Java: how to find out if a URL is http or https?

I am writing a web crawler tool in Java. When I type a website name, how can I make it connect to that site over http or https without specifying the protocol myself?

try {
   Jsoup.connect("google.com").get();
} catch (IOException ex) {
   Logger.getLogger(LinkGUI.class.getName()).log(Level.SEVERE, null, ex);
}

But I get the error:

java.lang.IllegalArgumentException: Malformed URL: google.com

What can I do? Are there any classes or libraries that do this?

For context, I have a list of 165 courses, each with 65 to 71 HTML pages that contain links throughout. I am writing a Java program to test whether each link is broken.

You can write your own simple method that tries both protocols, for example:

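import java.io.IOException;
import org.jsoup.Jsoup;

// Returns false if an http request succeeds, true if only the https request succeeds.
// Throws IOException if neither protocol works.
static boolean usesHttps(final String urlWithoutProtocol) throws IOException {
    try {
        // Try plain http first.
        Jsoup.connect("http://" + urlWithoutProtocol).get();
        return false;
    } catch (final IOException e) {
        // http failed, so fall back to https; a second failure propagates to the caller.
        Jsoup.connect("https://" + urlWithoutProtocol).get();
        return true;
    }
}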

Your original code then becomes:

try {
    boolean shouldUseHttps = usesHttps("google.com");
} catch (final IOException ex) {
    Logger.getLogger(LinkGUI.class.getName()).log(Level.SEVERE, null, ex);
}
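The boolean still has to be turned into a full URL before connecting. Below is a minimal sketch of one way to do that using only the JDK, assuming a hypothetical helper class named ProtocolResolver (neither the class nor its method names are part of Jsoup). Unlike usesHttps() above, it tries https first and falls back to http, and it leaves addresses that already carry a protocol untouched. The connectivity check is passed in as a parameter, so the Jsoup-based check or the HEAD request shown here can be used interchangeably. The 5-second timeouts are arbitrary.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.Locale;
import java.util.Optional;
import java.util.function.Predicate;

// Hypothetical helper for illustration; not part of Jsoup.
final class ProtocolResolver {

    /** True if the string already starts with http:// or https:// (case-insensitive). */
    static boolean hasHttpScheme(String url) {
        String lower = url.toLowerCase(Locale.ROOT);
        return lower.startsWith("http://") || lower.startsWith("https://");
    }

    /**
     * Returns the first candidate URL accepted by the probe, trying https before http.
     * Addresses that already carry a protocol are returned unchanged, without probing.
     */
    static Optional<String> resolve(String address, Predicate<String> probe) {
        String trimmed = address.trim();
        if (hasHttpScheme(trimmed)) {
            return Optional.of(trimmed);
        }
        for (String scheme : new String[] {"https://", "http://"}) {
            String candidate = scheme + trimmed;
            if (probe.test(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    /** Network probe: sends a HEAD request and reports whether any HTTP response came back. */
    static boolean respondsToHead(String url) {
        HttpURLConnection conn = null;
        try {
            conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setRequestMethod("HEAD");
            conn.setConnectTimeout(5000); // assumed timeout, adjust as needed
            conn.setReadTimeout(5000);
            return conn.getResponseCode() > 0;
        } catch (IOException | IllegalArgumentException e) {
            return false;
        } finally {
            if (conn != null) {
                conn.disconnect();
            }
        }
    }

    public static void main(String[] args) {
        // Requires network access; prints the resolved URL or "unreachable".
        System.out.println(resolve("google.com", ProtocolResolver::respondsToHead).orElse("unreachable"));
    }
}
```

The probe here treats any HTTP status as "reachable", including 404. For a broken-link checker you would probably also want to inspect the status code.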

Note: call usesHttps() only once per URL, to find out which protocol to use. Once you know it, connect with Jsoup.connect() directly, which is more efficient.
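One way to get this "detect once, then reuse" behaviour is to remember the result per host. The sketch below assumes a hypothetical ProtocolCache class. The detector is supplied by the caller, for example a lambda that wraps usesHttps() and handles its IOException. The stub detector in main stands in for a real network check.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Predicate;

// Hypothetical helper for illustration: remembers the protocol decision per host.
final class ProtocolCache {
    private final Map<String, Boolean> httpsByHost = new ConcurrentHashMap<>();
    private final Predicate<String> httpsDetector;

    /** The detector returns true if the given host should be reached over https. */
    ProtocolCache(Predicate<String> httpsDetector) {
        this.httpsDetector = httpsDetector;
    }

    /** Returns the base URL for the host, running the detector only on the first lookup. */
    String baseUrl(String host) {
        boolean https = httpsByHost.computeIfAbsent(host, httpsDetector::test);
        return (https ? "https://" : "http://") + host;
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        // Stub detector that counts invocations instead of touching the network.
        ProtocolCache cache = new ProtocolCache(host -> {
            calls.incrementAndGet();
            return true;
        });
        System.out.println(cache.baseUrl("example.com"));
        System.out.println(cache.baseUrl("example.com"));
        System.out.println("detector calls: " + calls.get());
    }
}
```

With 165 courses of 65 to 71 pages each, many links are likely to share a host, so caching per host rather than per full URL should save most of the duplicate probes.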


 