HttpClient: Get images from response

I'm using Apache HttpClient to perform GET/POST requests.

I was wondering whether I could save the images loaded by a response (that is, the images on the page) without having to download them again from their URLs.

This question was asked about a year ago, but no one answered it: Can I get cached images using HttpClient?

I tried:

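import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import org.apache.http.HttpEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;

public class SaveResponse {
    // Illustrative wrapper: saves the body of one GET response to img.png (HttpClient 4.x).
    static void saveResponseBody(String url) throws IOException {
        try (CloseableHttpClient httpclient = HttpClients.createDefault();
             CloseableHttpResponse response = httpclient.execute(new HttpGet(url))) {
            HttpEntity entity = response.getEntity();
            if (entity != null) {
                // This writes whatever this single URL returned: if url is an
                // HTML page, img.png will contain that page's HTML, not its images.
                try (OutputStream fos = new FileOutputStream("img.png")) {
                    entity.writeTo(fos); // copies the whole body to the file
                }
            }
        }
    }
}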

But apparently it only downloads text. Can I make HttpClient download the images of that particular URL? Is this doable?

A web page, as returned for its URL, is just the HTML code of the page.

When a browser accesses a web page, it downloads the HTML code and then parses it. If there are things like IMG tags, embedded objects (such as Flash or applets), frames and so on, the browser takes each one's URL, opens a new HTTP connection, and downloads that resource. It does this for every image. Then, once all the parts of the page are ready (in its cache), it renders the page.

This is a simplified description, of course; browsers optimize this process by keeping connections open and keeping caches around. So, to reiterate, to get the images in a page:

  1. Download the HTML from the given URL.
  2. Parse the HTML and find the IMG tags.
  3. For every relevant IMG, download the image data from its SRC URL and save it to a file.
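The three steps above can be sketched in Java roughly as follows. This is a minimal sketch under stated assumptions, not the answer's own code: it swaps in the JDK's built-in java.net.http.HttpClient (Java 11+) instead of Apache HttpClient so it needs no extra dependency, and it finds IMG tags with a crude regular expression where a real HTML parser would be more robust. The class name ImageScraper, the output directory "images" and the img0, img1, ... file names are hypothetical.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
final class ImageScraper {

    // Crude match for <img ... src="..."> or src='...', case-insensitive.
    // Assumption for illustration only: a real HTML parser handles far more cases.
    private static final Pattern IMG_SRC = Pattern.compile(
            "<img\\b[^>]*?\\ssrc\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Step 2: find IMG tags and resolve each SRC against the page URL.
    // Duplicates are dropped and only http/https URLs with a host are kept.
    static List<URI> extractImageUrls(String html, URI pageUrl) {
        Set<URI> urls = new LinkedHashSet<>();
        Matcher m = IMG_SRC.matcher(html);
        while (m.find()) {
            String src = m.group(2).trim();
            if (src.isEmpty()) {
                continue;
            }
            try {
                URI img = pageUrl.resolve(src); // handles "a.png", "/b.jpg", absolute URLs
                String scheme = img.getScheme();
                boolean web = "http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme);
                if (web && img.getHost() != null) {
                    urls.add(img);
                }
            } catch (IllegalArgumentException e) {
                // SRC is not a valid URI (e.g. it contains spaces); skip it.
            }
        }
        return new ArrayList<>(urls);
    }

    // Keeps a short extension from the URL path (e.g. ".png"), otherwise "".
    static String fileSuffix(URI img) {
        String path = img.getPath() == null ? "" : img.getPath();
        int dot = path.lastIndexOf('.');
        String ext = dot > path.lastIndexOf('/') ? path.substring(dot) : "";
        return ext.matches("\\.[A-Za-z0-9]{1,5}") ? ext : "";
    }

    // Steps 1 and 3: fetch the page HTML, then send one extra GET per image.
    static void downloadImages(URI pageUrl, Path outDir)
            throws IOException, InterruptedException {
        // A single client is reused, so it can keep connections open between requests.
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        HttpResponse<String> page = client.send(
                HttpRequest.newBuilder(pageUrl).GET().build(),
                HttpResponse.BodyHandlers.ofString());
        Files.createDirectories(outDir);
        int i = 0;
        for (URI img : extractImageUrls(page.body(), pageUrl)) {
            Path target = outDir.resolve("img" + (i++) + fileSuffix(img));
            HttpResponse<Path> r = client.send(
                    HttpRequest.newBuilder(img).GET().build(),
                    HttpResponse.BodyHandlers.ofFile(target, StandardOpenOption.CREATE,
                            StandardOpenOption.TRUNCATE_EXISTING, StandardOpenOption.WRITE));
            // The body is saved whatever the status; check r.statusCode() if that matters.
            System.out.println(r.statusCode() + " " + img + " -> " + target);
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java ImageScraper.java <page-url>");
            return;
        }
        downloadImages(URI.create(args[0]), Path.of("images"));
    }
}
```

Each image is fetched with its own GET request on the same client, which is the per-object work the answer says HttpClient will not do for you. Relative SRC values such as "a.png" are resolved against the page URL, so they point at the same server the HTML came from.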

It is important to understand that an HttpClient response represents only one object: the HTML page or a single image, depending on which URL you requested. If you want to download an entire page and all its images, you have to make a separate HttpClient request for each of those objects yourself; it doesn't do so automatically.
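To see which kind of object a given response is, one option is to look at its Content-Type header: an HTML page typically comes back as text/html, an image as image/png, image/jpeg and so on, which is why a file saved from a page URL contains text. This is a minimal sketch, not part of the original answer: it again assumes the JDK's java.net.http.HttpClient (Java 11+), sends a HEAD request (some servers reject HEAD, in which case a GET works too), and the class and method names are hypothetical.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Locale;

// Hypothetical helper class, for illustration only.
final class ContentTypeCheck {

    // Media type without parameters, lower-cased:
    // "text/html; charset=UTF-8" -> "text/html". A missing header gives "".
    static String mediaType(String contentType) {
        if (contentType == null) {
            return "";
        }
        int semi = contentType.indexOf(';');
        String type = semi >= 0 ? contentType.substring(0, semi) : contentType;
        return type.trim().toLowerCase(Locale.ROOT);
    }

    static boolean isImage(String contentType) {
        return mediaType(contentType).startsWith("image/");
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java ContentTypeCheck.java <url>");
            return;
        }
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        // HEAD asks for the headers only; use GET if the server rejects HEAD.
        HttpRequest req = HttpRequest.newBuilder(URI.create(args[0]))
                .method("HEAD", HttpRequest.BodyPublishers.noBody())
                .build();
        HttpResponse<Void> resp = client.send(req, HttpResponse.BodyHandlers.discarding());
        String contentType = resp.headers().firstValue("Content-Type").orElse(null);
        System.out.println(args[0] + " -> " + contentType
                + (isImage(contentType) ? " (one image)" : " (not an image)"));
    }
}
```

With Apache HttpClient 4.x, as in the question, the same header can be read with response.getFirstHeader("Content-Type"), which returns null when the header is absent.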

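Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.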

© 2020-2024 STACKOOM.COM