InputStream from a URL

How do I get an InputStream from a URL?

For example, I want to take the file at the URL wwww.somewebsite.com/a.txt and read it as an InputStream in Java, through a servlet.

I've tried

InputStream is = new FileInputStream("wwww.somewebsite.com/a.txt");

but what I got was an error:

java.io.FileNotFoundException

Use java.net.URL#openStream() with a proper URL (including the protocol!). E.g.

InputStream input = new URL("http://www.somewebsite.com/a.txt").openStream();
// ...
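Why the protocol matters can be checked without any network access. The following is a minimal sketch; the class name ProtocolCheck and the helper isParsableUrl are made up for illustration. It only tests whether java.net.URL accepts a string, not whether the resource exists.

```java
import java.net.MalformedURLException;
import java.net.URL;

class ProtocolCheck {

    /** Returns true if java.net.URL can parse the string, false otherwise. */
    static boolean isParsableUrl(String spec) {
        try {
            new URL(spec); // parsing only; no connection is opened here
            return true;
        } catch (MalformedURLException e) {
            // Thrown, for example, when the string has no protocol part
            return false;
        }
    }

    public static void main(String[] args) {
        // The string from the question has no protocol
        System.out.println(isParsableUrl("wwww.somewebsite.com/a.txt"));
        // With http:// in front it parses as an HTTP URL
        System.out.println(isParsableUrl("http://www.somewebsite.com/a.txt"));
    }
}
```

The first call reports false and the second true, which is the difference between the question's string and the URL used in the answer above.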


Try:

final InputStream is = new URL("http://wwww.somewebsite.com/a.txt").openStream();

(a) wwww.somewebsite.com/a.txt isn't a 'file URL'. It isn't a URL at all. If you put http:// on the front of it, it would be an HTTP URL, which is clearly what you intend here.

(b) FileInputStream is for files, not URLs.

(c) The way to get an input stream from any URL is via URL.openStream(), or URL.openConnection().getInputStream(), which is equivalent, but you might have other reasons to get the URLConnection and configure it first.
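One reason to go through the URLConnection, as point (c) suggests, is to set timeouts before the stream is opened. A rough sketch follows, assuming Java 9+ for readAllBytes; the class name ConnectionDemo and the method fetch are illustrative, not part of any library.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;

class ConnectionDemo {

    /** Opens the URL with explicit timeouts and returns the whole body as bytes. */
    static byte[] fetch(URL url, int timeoutMillis) throws IOException {
        URLConnection con = url.openConnection(); // creates the connection object only
        con.setConnectTimeout(timeoutMillis);     // must be configured before connecting
        con.setReadTimeout(timeoutMillis);
        // getInputStream() connects implicitly; try-with-resources closes the stream
        try (InputStream in = con.getInputStream()) {
            return in.readAllBytes();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.out.println("usage: java ConnectionDemo.java <url>");
            return;
        }
        byte[] body = fetch(new URL(args[0]), 15000);
        System.out.println(body.length + " bytes read");
    }
}
```

If no such configuration is needed, url.openStream() is the shorter way to get the same stream.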

Your original code uses FileInputStream, which is for accessing files hosted on the file system.

The constructor you used will attempt to locate a file named a.txt in the www.somewebsite.com subfolder of the current working directory (the value of the system property user.dir). The name you provide is resolved to a file using the File class.
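To see where such a name ends up, the resolution can be printed directly. This is a small sketch; RelativePathDemo and resolve are hypothetical names chosen for the example.

```java
import java.io.File;

class RelativePathDemo {

    /** Returns the absolute file system path that a relative name resolves to. */
    static String resolve(String name) {
        return new File(name).getAbsolutePath();
    }

    public static void main(String[] args) {
        System.out.println("user.dir = " + System.getProperty("user.dir"));
        // The string from the question is treated as a relative path, not as a web address
        System.out.println("resolved = " + resolve("wwww.somewebsite.com/a.txt"));
    }
}
```

The resolved path starts with the working directory, so unless a matching folder and file exist there, FileInputStream fails with FileNotFoundException.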

URL objects are the generic way to solve this. You can use URLs to access local files as well as network-hosted resources. The URL class supports the file:// protocol in addition to http:// and https://, so you're good to go.
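The same openStream() call works for a local file once it is addressed through a file: URL. The sketch below assumes Java 9+ (for readAllBytes) and uses a temporary file so that it runs offline; FileUrlDemo, readAll and roundTrip are illustrative names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class FileUrlDemo {

    /** Reads everything behind the URL as UTF-8 text, whatever the protocol. */
    static String readAll(URL url) throws IOException {
        try (InputStream in = url.openStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    /** Writes text to a temporary file and reads it back through a file: URL. */
    static String roundTrip(String text) {
        try {
            Path tmp = Files.createTempFile("a", ".txt");
            try {
                Files.write(tmp, text.getBytes(StandardCharsets.UTF_8));
                URL url = tmp.toUri().toURL(); // a file: URL pointing at the temp file
                return readAll(url);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("hello from a local file"));
    }
}
```

Passing an http:// or https:// URL to readAll would use the same code path, only with a network connection behind it.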

Pure Java:

 urlToInputStream(url,httpHeaders);

I have used this method with some success. It handles redirects, and a variable number of HTTP headers can be passed as a Map<String,String>. It also allows redirects from HTTP to HTTPS.

private InputStream urlToInputStream(URL url, Map<String, String> args) {
    HttpURLConnection con = null;
    InputStream inputStream = null;
    try {
        con = (HttpURLConnection) url.openConnection();
        con.setConnectTimeout(15000);
        con.setReadTimeout(15000);
        if (args != null) {
            for (Entry<String, String> e : args.entrySet()) {
                con.setRequestProperty(e.getKey(), e.getValue());
            }
        }
        con.connect();
        int responseCode = con.getResponseCode();
        /* By default the connection will follow redirects. The following
         * block is only entered if the implementation of HttpURLConnection
         * does not perform the redirect. The exact behavior depends on
         * the actual implementation (e.g. sun.net).
         * !!! Attention: This block allows the connection to
         * switch protocols (e.g. HTTP to HTTPS), which is not
         * default behavior. See: https://stackoverflow.com/questions/1884230
         * for more info!!!
         */
        if (responseCode < 400 && responseCode > 299) {
            String redirectUrl = con.getHeaderField("Location");
            try {
                URL newUrl = new URL(redirectUrl);
                return urlToInputStream(newUrl, args);
            } catch (MalformedURLException e) {
                URL newUrl = new URL(url.getProtocol() + "://" + url.getHost() + redirectUrl);
                return urlToInputStream(newUrl, args);
            }
        }
        /*!!!!!*/

        inputStream = con.getInputStream();
        return inputStream;
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
}
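The fallback in the catch block above rebuilds a relative Location value by string concatenation, which drops the port and only suits paths that start with a slash. As a possible alternative (not what the method above does), the two-argument URL constructor resolves a value against the URL that was requested. A minimal sketch, with RedirectTarget and resolve as made-up names:

```java
import java.net.MalformedURLException;
import java.net.URL;

class RedirectTarget {

    /** Resolves a Location header value against the URL that produced the redirect. */
    static URL resolve(URL requested, String location) throws MalformedURLException {
        // Absolute values replace the context; relative ones are resolved against it
        return new URL(requested, location);
    }

    public static void main(String[] args) throws MalformedURLException {
        URL requested = new URL("http://www.somewebsite.com:8080/dir/a.txt");
        System.out.println(resolve(requested, "/login"));
        System.out.println(resolve(requested, "b.txt"));
        System.out.println(resolve(requested, "https://www.somewebsite.com/a.txt"));
    }
}
```

A production version would probably also cap the number of redirects followed, since the recursive method has no such limit.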

Full example call

private InputStream getInputStreamFromUrl(URL url, String user, String passwd) throws IOException {
    // Credentials for HTTP Basic authentication, Base64-encoded as "user:password"
    String encoded = Base64.getEncoder().encodeToString((user + ":" + passwd).getBytes(StandardCharsets.UTF_8));
    // Map is an interface and cannot be instantiated; use java.util.HashMap
    Map<String, String> httpHeaders = new HashMap<>();
    httpHeaders.put("Accept", "application/json");
    httpHeaders.put("User-Agent", "myApplication");
    httpHeaders.put("Authorization", "Basic " + encoded);
    return urlToInputStream(url, httpHeaders);
}

Here is a full example that reads the contents of a given web page. The address of the page is submitted through an HTML form. We use the standard InputStream classes, but it could be done more easily with the JSoup library.

<dependency>
    <groupId>javax.servlet</groupId>
    <artifactId>javax.servlet-api</artifactId>
    <version>3.1.0</version>
    <scope>provided</scope>
</dependency>

<dependency>
    <groupId>commons-validator</groupId>
    <artifactId>commons-validator</artifactId>
    <version>1.6</version>
</dependency>  

These are the Maven dependencies. We use the Apache Commons Validator library to validate URL strings.

package com.zetcode.web;

import com.zetcode.service.WebPageReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.servlet.ServletException;
import javax.servlet.ServletOutputStream;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

@WebServlet(name = "ReadWebPage", urlPatterns = {"/ReadWebPage"})
public class ReadWebpage extends HttpServlet {

    @Override
    protected void doGet(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {

        response.setContentType("text/plain;charset=UTF-8");

        String page = request.getParameter("webpage");

        String content = new WebPageReader().setWebPageName(page).getWebPageContent();

        ServletOutputStream os = response.getOutputStream();
        os.write(content.getBytes(StandardCharsets.UTF_8));
    }
}

The ReadWebPage servlet reads the contents of the given web page and sends it back to the client in plain text format. The task of reading the page is delegated to WebPageReader.
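The servlet above builds the whole page as a String before writing it. If the content only needs to be passed through, the stream can be copied straight to an output stream instead. The following is a sketch of that idea using only the standard library (InputStream.transferTo, Java 9+); StreamCopy and copy are hypothetical names, and in a servlet the target would be response.getOutputStream() rather than the in-memory stream used here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

class StreamCopy {

    /** Copies everything from in to out, closes in, and returns the number of bytes copied. */
    static long copy(InputStream in, OutputStream out) throws IOException {
        try (InputStream source = in) {
            return source.transferTo(out); // the output stream is left open for the caller
        }
    }

    public static void main(String[] args) throws IOException {
        // In-memory streams stand in for url.openStream() and the servlet response
        InputStream in = new ByteArrayInputStream("hello".getBytes(StandardCharsets.UTF_8));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long copied = copy(in, out);
        System.out.println(copied + " bytes: " + new String(out.toByteArray(), StandardCharsets.UTF_8));
    }
}
```

This avoids holding the full page in memory and does not assume any particular character encoding.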

package com.zetcode.service;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.logging.Level;
import java.util.logging.Logger;
import java.util.stream.Collectors;
import org.apache.commons.validator.routines.UrlValidator;

public class WebPageReader {

    private String webpage;
    private String content;

    public WebPageReader setWebPageName(String name) {

        webpage = name;
        return this;
    }

    public String getWebPageContent() {

        try {

            boolean valid = validateUrl(webpage);

            if (!valid) {

                content = "Invalid URL; use http(s)://www.example.com format";
                return content;
            }

            URL url = new URL(webpage);

            try (InputStream is = url.openStream();
                    BufferedReader br = new BufferedReader(
                            new InputStreamReader(is, StandardCharsets.UTF_8))) {

                content = br.lines().collect(
                      Collectors.joining(System.lineSeparator()));
            }

        } catch (IOException ex) {

            content = String.format("Cannot read webpage %s", ex);
            Logger.getLogger(WebPageReader.class.getName()).log(Level.SEVERE, null, ex);
        }

        return content;
    }

    private boolean validateUrl(String webpage) {

        UrlValidator urlValidator = new UrlValidator();

        return urlValidator.isValid(webpage);
    }
}

WebPageReader validates the URL and reads the contents of the web page. It returns a string containing the HTML code of the page.

<!DOCTYPE html>
<html>
    <head>
        <title>Home page</title>
        <meta charset="UTF-8">
    </head>
    <body>
        <form action="ReadWebPage">

            <label for="page">Enter a web page name:</label>
            <input  type="text" id="page" name="webpage">

            <button type="submit">Submit</button>

        </form>
    </body>
</html>

Finally, this is the home page containing the HTML form. This is taken from my tutorial about this topic.
