
How do I log in and download a file from an HTTPS web page from Java?

I have to log in to an HTTPS web page and download a file using Java. I know all the URLs beforehand:

baseURL = ...; // an HTTPS URL
urlMap = new HashMap<String, URL>();
urlMap.put("login", new URL(baseURL, "exec.asp?login=username&pass=XPTO"));
urlMap.put("logout", new URL(baseURL, "exec.asp?exec.asp?page=999"));
urlMap.put("file", new URL(baseURL, "exec.asp?file=111"));

If I try all these links in a web browser such as Firefox, they work.

Now when I do:

urlConnection = urlMap.get("login").openConnection();
urlConnection.connect();
BufferedReader in = new BufferedReader(
    new InputStreamReader(urlConnection.getInputStream()));
String inputLine;
while ((inputLine = in.readLine()) != null)
    System.out.println(inputLine);
in.close();

I just get the login page HTML back again, and I cannot proceed to the file download.

Thanks!

I agree with Alnitak that the problem is likely to be storing and returning cookies.

Another good option I have used is HttpClient from Jakarta Commons.

As an aside, if this is a server you control, be aware that sending the username and password in the query string is not secure, even over HTTPS: URLs tend to end up in server logs and browser history. HttpClient supports sending parameters using POST, which you should consider.
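To illustrate the POST suggestion without adding a dependency, here is a minimal sketch that uses plain HttpURLConnection rather than HttpClient. The class name FormPost and its methods are made up for this example, the field names "login" and "pass" are simply borrowed from the question's URL, and whether the server accepts a POST to exec.asp at all is an assumption.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.util.Map;

// Hypothetical helper, not part of any library.
final class FormPost {

    // Encodes name/value pairs as application/x-www-form-urlencoded.
    static String formEncode(Map<String, String> fields) {
        StringBuilder body = new StringBuilder();
        try {
            for (Map.Entry<String, String> field : fields.entrySet()) {
                if (body.length() > 0) {
                    body.append('&');
                }
                body.append(URLEncoder.encode(field.getKey(), "UTF-8"))
                    .append('=')
                    .append(URLEncoder.encode(field.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is guaranteed to be available on every JVM.
            throw new IllegalStateException(e);
        }
        return body.toString();
    }

    // Sends the fields in the request body instead of the query string and
    // returns the HTTP status code. Assumes the server accepts POST here.
    static int post(URL url, Map<String, String> fields) throws IOException {
        byte[] body = formEncode(fields).getBytes("UTF-8");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type",
                "application/x-www-form-urlencoded");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body);
        }
        return conn.getResponseCode();
    }
}
```

With a map containing login=username and pass=XPTO, the credentials travel in the request body, so they no longer appear in the URL.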

As has been noted, you must maintain the session cookie between requests (see CookieHandler).

Here is a sample implementation:

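import java.io.IOException;
import java.net.CookieHandler;
import java.net.URI;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MyCookieHandler extends CookieHandler {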

    private Map<String, List<String>> cookies = new HashMap<String, List<String>>();

    @Override
    public Map<String, List<String>> get(URI uri,
            Map<String, List<String>> requestHeaders) throws IOException {
        String host = uri.getHost();
        Map<String, List<String>> ret = new HashMap<String, List<String>>();
        synchronized (cookies) {
            List<String> store = cookies.get(host);
            if (store != null) {
                store = Collections.unmodifiableList(store);
                ret.put("Cookie", store);
            }
        }

        return Collections.unmodifiableMap(ret);
    }

    @Override
    public void put(URI uri, Map<String, List<String>> responseHeaders)
            throws IOException {
        List<String> newCookies = responseHeaders.get("Set-Cookie");
        if (newCookies != null) {
            String host = uri.getHost();
            synchronized (cookies) {
                List<String> store = cookies.get(host);
                if (store == null) {
                    store = new ArrayList<String>();
                    cookies.put(host, store);
                }
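                for (String cookie : newCookies) {
                    // Keep only the name=value pair; attributes such as
                    // path or expires should not be sent back to the server.
                    int end = cookie.indexOf(';');
                    store.add(end < 0 ? cookie : cookie.substring(0, end));
                }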
            }
        }
    }

}
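A handler only takes effect once it is installed with CookieHandler.setDefault. The sketch below shows one way the whole login-then-download sequence might look. To stay self-contained it swaps in the JDK's built-in java.net.CookieManager (Java 6 and later) instead of the hand-written handler above; the class SessionDownload and its method names are invented for this example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of any library.
final class SessionDownload {

    // Installs a JVM-wide cookie store. Every HTTP(S) URLConnection opened
    // afterwards saves cookies from responses and sends them back on requests.
    static void enableCookies() {
        CookieHandler.setDefault(new CookieManager());
    }

    // Requests a URL (for example the login URL) and discards the body.
    // The point of the call is the cookies the response sets.
    static void fetchAndDiscard(URL url) throws IOException {
        try (InputStream in = url.openConnection().getInputStream()) {
            byte[] buffer = new byte[8192];
            while (in.read(buffer) != -1) {
                // nothing to keep
            }
        }
    }

    // Streams the response body to a file and returns the number of bytes
    // written, replacing the target if it already exists.
    static long download(URL url, Path target) throws IOException {
        try (InputStream in = url.openConnection().getInputStream()) {
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

With the question's urlMap, the sequence would be roughly: call enableCookies() once, then fetchAndDiscard(urlMap.get("login")), then download(urlMap.get("file"), target). If the login itself fails for some other reason, this will still save the login page rather than the file.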

Even if some other problem is also preventing the login request from logging you in, you are unlikely to reach the download page unless you store and return any cookies that the login page generates.

That's because HTTP itself is stateless, so in your current code the remote server has no way to tell that the second request (the download) comes from the same user who just logged in.
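To make "store and return" concrete, here is a rough sketch of doing it by hand: take the raw Set-Cookie values from the login response and turn them into a single Cookie header for the next request. The class and method names are hypothetical, and real cookie handling has more rules (domain, path, expiry) than this sketch covers.

```java
import java.util.List;

// Hypothetical helper, not part of any library.
final class CookieHeaders {

    // Builds the value of a "Cookie" request header from the raw values of
    // "Set-Cookie" response headers. Only the leading name=value pair of
    // each value is kept; attributes such as Path or HttpOnly are dropped.
    static String toCookieHeader(List<String> setCookieValues) {
        StringBuilder header = new StringBuilder();
        for (String raw : setCookieValues) {
            int end = raw.indexOf(';');
            String pair = (end < 0 ? raw : raw.substring(0, end)).trim();
            if (pair.isEmpty()) {
                continue;
            }
            if (header.length() > 0) {
                header.append("; ");
            }
            header.append(pair);
        }
        return header.toString();
    }

    // Intended use, with two URLConnection objects (names are placeholders):
    //   List<String> setCookies =
    //       loginConnection.getHeaderFields().get("Set-Cookie"); // may be null
    //   fileConnection.setRequestProperty(
    //       "Cookie", CookieHeaders.toCookieHeader(setCookies));
}
```

The second request then carries the session cookie, which is what lets the server associate it with the earlier login.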

I'd suggest having a look at Java CURL (http://sourceforge.net/projects/javacurl). I have used it before to log in to an HTTPS website and download files. It has features such as spoofing the browser ID, which might solve your issue of getting redirected back to the login page.
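For comparison, the browser ID can also be set with the standard library by sending a User-Agent header. This is only a sketch using HttpURLConnection, not Java CURL's API; the class name and the User-Agent string are placeholders, and it is an assumption that the server inspects this header at all.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical helper, not part of any library.
final class BrowserLikeRequest {

    // Placeholder value; substitute whatever your own browser sends.
    static final String USER_AGENT =
            "Mozilla/5.0 (X11; Linux x86_64; rv:115.0) Gecko/20100101 Firefox/115.0";

    // Prepares a connection that identifies itself like a browser.
    // Nothing is sent until the caller connects or reads the response.
    static HttpURLConnection open(URL url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestProperty("User-Agent", USER_AGENT);
        conn.setInstanceFollowRedirects(true);
        return conn;
    }
}
```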

Although they provide an Eclipse plugin for it, I have used it without the plugin and it works fine.

Alternatively, you could use wget and call it from Java.
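A rough sketch of the wget route, assuming GNU wget is installed and on the PATH: one invocation logs in and saves the session cookies to a file, and a second one loads them for the download. The class WgetRunner and the file names passed to it are made up for illustration.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of any library.
final class WgetRunner {

    // Command for the login request: writes the cookies, including
    // session cookies, to cookieFile and the page itself to outputFile.
    static List<String> loginCommand(String loginUrl, String cookieFile,
            String outputFile) {
        return Arrays.asList("wget", "--quiet",
                "--save-cookies", cookieFile,
                "--keep-session-cookies",
                "--output-document", outputFile,
                loginUrl);
    }

    // Command for the download: sends the cookies saved by the login step.
    static List<String> downloadCommand(String fileUrl, String cookieFile,
            String outputFile) {
        return Arrays.asList("wget", "--quiet",
                "--load-cookies", cookieFile,
                "--output-document", outputFile,
                fileUrl);
    }

    // Runs a command and returns its exit code (wget returns 0 on success).
    static int run(List<String> command)
            throws IOException, InterruptedException {
        return new ProcessBuilder(command).inheritIO().start().waitFor();
    }
}
```

Passing each argument as a separate list element avoids shell quoting problems with the "?" and "&" characters in the URLs.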

Perhaps you want to try HttpUnit. Although it was written with website testing in mind, it may be usable for your problem.

From their website:

"... Written in Java, HttpUnit emulates the relevant portions of browser behavior, including form submission, JavaScript, basic http authentication, cookies and automatic page redirection, and allows Java test code to examine returned pages either as text, an XML DOM, or containers of forms, tables, and links."


 