Scraping images from a website in Delphi with TWebBrowser

I'm trying to make a small tool that downloads all images from the site being visited. It has to be built with the TWebBrowser component. The test site from my customer is Click. At the moment I'm selecting the pictures with getElementById, but some of the pictures don't have an id. How can I address the missing ones? Thanks a lot.

After the page is loaded, query the TWebBrowser.Document property for the IHTMLDocument2 interface; you can then enumerate the elements of the IHTMLDocument2.images collection:

// IHTMLDocument2 and the related interfaces are declared in the MSHTML unit.
var
  Document: IHTMLDocument2;
  Images: IHTMLElementCollection;
  Image: IHTMLImgElement;
  I: Integer;
begin
  Document := WebBrowser1.Document as IHTMLDocument2;
  Images := Document.images;
  for I := 0 to Images.length - 1 do
  begin
    Image := Images.item(I, '') as IHTMLImgElement;
    // use Image as needed, e.g. Image.src holds the image URL
  end;
end;

Note that this will only find images in HTML <img> tags. If you need to find images in <input type="image"> tags as well, you will have to enumerate the elements of the IHTMLDocument2.all collection, looking for elements that implement the IHTMLInputElement interface and whose type property is "image", e.g.:

// Requires MSHTML and SysUtils in the uses clause.
var
  Document: IHTMLDocument2;
  Elements: IHTMLElementCollection;
  Element: IHTMLElement;
  Image: IHTMLImgElement;
  Input: IHTMLInputElement;
  I: Integer;
begin
  Document := WebBrowser1.Document as IHTMLDocument2;
  Elements := Document.all;
  for I := 0 to Elements.length - 1 do
  begin
    Element := Elements.item(I, '') as IHTMLElement;
    // The "is" operator cannot test for an interface type;
    // Supports() queries for the interface and returns it on success.
    if Supports(Element, IHTMLImgElement, Image) then
    begin
      // use Image as needed...
    end
    else if Supports(Element, IHTMLInputElement, Input) then
    begin
      // "type" is a reserved word, so MSHTML declares the property as type_.
      if SameText(Input.type_, 'image') then
      begin
        // use Input as needed...
      end;
    end;
  end;
end;
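
To go from enumerating the images to actually saving them, something along the following lines might work. This is only a sketch under a few assumptions: SaveAllImages and the unit name ImageSaver are hypothetical names, the target directory already exists, and the files are given numbered names with a generic extension rather than names derived from the URL. It uses URLDownloadToFile from the UrlMon unit for the download, which is one option among several, not something the answer above prescribes.

```pascal
unit ImageSaver;

interface

uses
  Windows, SysUtils, SHDocVw, MSHTML, UrlMon;

// Hypothetical helper, not part of the VCL: saves every <img> of the page
// currently loaded in Browser into TargetDir and returns how many were saved.
function SaveAllImages(Browser: TWebBrowser; const TargetDir: string): Integer;

implementation

function SaveAllImages(Browser: TWebBrowser; const TargetDir: string): Integer;
var
  Document: IHTMLDocument2;
  Images: IHTMLElementCollection;
  Image: IHTMLImgElement;
  Url, FileName: string;
  I: Integer;
begin
  Result := 0;
  // Document is nil before a page has loaded, so query it defensively.
  if not Supports(Browser.Document, IHTMLDocument2, Document) then
    Exit;
  Images := Document.images;
  for I := 0 to Images.length - 1 do
  begin
    if not Supports(Images.item(I, ''), IHTMLImgElement, Image) then
      Continue;
    Url := Image.src;
    if Url = '' then
      Continue;
    // Simplification: numbered names with a generic extension, because
    // deriving a safe file name from an arbitrary URL needs more care.
    FileName := IncludeTrailingPathDelimiter(TargetDir) +
      Format('image%.3d.img', [I]);
    // URLDownloadToFile blocks until the transfer finishes or fails.
    if URLDownloadToFile(nil, PChar(Url), PChar(FileName), 0, nil) = S_OK then
      Inc(Result);
  end;
end;

end.
```

A call such as SaveAllImages(WebBrowser1, 'C:\Temp\images') would be made once the page has finished loading. Because the download is synchronous, a page with many images will block the UI while it runs.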

Instead of requesting a specific element by id, you can "walk" the document and look at each element using WebDocument.all.item(itemnum, '').

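var
  cAllElements: IHTMLElementCollection;
  eThisElement: IHTMLElement;
  WebDocument: IHTMLDocument2;
  iThisElement: Integer;
begin
  // Assumes a TWebBrowser named WebBrowser1 whose page has finished loading.
  WebDocument := WebBrowser1.Document as IHTMLDocument2;
  cAllElements := WebDocument.all;
  // The collection exposes its size as "length".
  for iThisElement := 0 to cAllElements.length - 1 do
  begin
    eThisElement := cAllElements.item(iThisElement, '') as IHTMLElement;
    // check out eThisElement and do what you want
  end;
end;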

You would then check the element's .tagName for IMG, or do whatever assessment you need to determine whether it is a picture, and handle it as you did before.
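
The tagName check described here could be wrapped in a small predicate, sketched below. IsPictureElement and the unit name ElementChecks are hypothetical names, and treating <input type="image"> as a picture is an added assumption about what should count; adjust the test to whatever the tool needs.

```pascal
unit ElementChecks;

interface

uses
  SysUtils, Variants, MSHTML;

// Hypothetical helper: True for <img> elements and <input type="image">.
function IsPictureElement(const Element: IHTMLElement): Boolean;

implementation

function IsPictureElement(const Element: IHTMLElement): Boolean;
begin
  Result := False;
  if Element = nil then
    Exit;
  // Compare case-insensitively rather than relying on the casing of tagName.
  if SameText(Element.tagName, 'IMG') then
    Result := True
  else if SameText(Element.tagName, 'INPUT') then
    // getAttribute returns an OleVariant; VarToStr maps Null to ''.
    Result := SameText(VarToStr(Element.getAttribute('type', 0)), 'image');
end;

end.
```

Inside the loop it would be used as: if IsPictureElement(eThisElement) then handle the element.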

Dan
