Scraping images from a website in Delphi with TWebBrowser
I'm trying to make a small tool that downloads all images from the visited site. It has to be made with the TWebBrowser component. The test site from my customer is Click. At the moment I'm selecting the pictures with getElementById, but some of the pictures don't have an id. How can I address the missing ones? Thanks a lot.
After the page is loaded, query the TWebBrowser.Document property for the IHTMLDocument2 interface, and then you can enumerate the elements of the IHTMLDocument2.images collection:
var
  Document: IHTMLDocument2;
  Images: IHTMLElementCollection;
  Image: IHTMLImgElement;
  I: Integer;
begin
  Document := WebBrowser1.Document as IHTMLDocument2;
  Images := Document.images;
  for I := 0 to Images.length - 1 do
  begin
    Image := Images.item(I, '') as IHTMLImgElement;
    // use Image as needed...
  end;
end;
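Since the question is about downloading the images, here is a minimal sketch of one way "use Image as needed" could be filled in. It is an assumption rather than part of the original answer: the helper name SaveAllImages, the TargetDir parameter, and the numbered .bin file names are all hypothetical, and the download uses the Windows URLDownloadToFile function from the UrlMon unit. Unit names are written as in classic Delphi versions.

```delphi
uses
  Windows, SysUtils, MSHTML, SHDocVw, UrlMon;

// Hypothetical helper: saves every <img> of the loaded page into TargetDir.
// Files are simply numbered (image0.bin, image1.bin, ...); deriving a proper
// name and extension from the URL is deliberately left out of this sketch.
procedure SaveAllImages(Browser: TWebBrowser; const TargetDir: string);
var
  Document: IHTMLDocument2;
  Images: IHTMLElementCollection;
  Image: IHTMLImgElement;
  Url, FileName: string;
  I: Integer;
begin
  // Document is nil until a page has been loaded
  if not Supports(Browser.Document, IHTMLDocument2, Document) then
    Exit;
  Images := Document.images;
  for I := 0 to Images.length - 1 do
  begin
    if not Supports(Images.item(I, ''), IHTMLImgElement, Image) then
      Continue;
    Url := Image.src;
    if Url = '' then
      Continue; // <img> without a src attribute
    FileName := IncludeTrailingPathDelimiter(TargetDir) +
      Format('image%d.bin', [I]);
    // Blocking download; a failed download is skipped silently here
    if not Succeeded(URLDownloadToFile(nil, PChar(Url), PChar(FileName), 0, nil)) then
      Continue;
  end;
end;
```

It would typically be called once the page has finished loading, for example from the browser's OnDocumentComplete event: SaveAllImages(WebBrowser1, 'C:\Temp\Images'). The target directory is assumed to exist already.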
Note that this will only find images in HTML <img> tags. If you need to find images in <input type="image"> tags as well, you will have to enumerate the elements of the IHTMLDocument2.all collection looking for instances of the IHTMLInputElement interface whose type property is "image", e.g.:
// requires SysUtils (for Supports) and MSHTML in the uses clause
var
  Document: IHTMLDocument2;
  Elements: IHTMLElementCollection;
  Element: IHTMLElement;
  Image: IHTMLImgElement;
  Input: IHTMLInputElement;
  I: Integer;
begin
  Document := WebBrowser1.Document as IHTMLDocument2;
  Elements := Document.all;
  for I := 0 to Elements.length - 1 do
  begin
    Element := Elements.item(I, '') as IHTMLElement;
    // the "is" operator does not work on interface references,
    // so query for the interface with Supports() instead
    if Supports(Element, IHTMLImgElement, Image) then
    begin
      // use Image as needed...
    end
    else if Supports(Element, IHTMLInputElement, Input) then
    begin
      // "type" is a reserved word, so Delphi's MSHTML unit names it type_
      if Input.type_ = 'image' then
      begin
        // use Input as needed...
      end;
    end;
  end;
end;
Instead of requesting a specific element by id, you can "walk" the document and look at each element using WebDocument.all.item(itemnum,'').
var
  cAllElements: IHTMLElementCollection;
  eThisElement: IHTMLElement;
  WebDocument: IHTMLDocument2;
  iThisElement: Integer;
begin
  // WebDocument must already reference the loaded document
  cAllElements := WebDocument.all;
  for iThisElement := 0 to cAllElements.length - 1 do
  begin
    eThisElement := cAllElements.item(iThisElement, '') as IHTMLElement;
    // check out eThisElement and do what you want
  end;
end;
You would then look at the element's .tagName for IMG, or do whatever assessment you need in order to determine if it is a picture, and handle it as you did before.
Dan
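As an illustration of the tagName check described in the answer above, the following sketch collects the src URL of every IMG element into a string list. The helper name CollectImageUrls and its parameters are hypothetical and not part of either answer; it assumes the page has already finished loading and uses unit names as in classic Delphi versions.

```delphi
uses
  Classes, SysUtils, MSHTML, SHDocVw;

// Hypothetical helper: walks Document.all and adds the src of each
// <img> element to Urls. Elements with any other tag name are ignored.
procedure CollectImageUrls(Browser: TWebBrowser; Urls: TStrings);
var
  Document: IHTMLDocument2;
  Elements: IHTMLElementCollection;
  Element: IHTMLElement;
  Image: IHTMLImgElement;
  I: Integer;
begin
  // Document is nil until a page has been loaded
  if not Supports(Browser.Document, IHTMLDocument2, Document) then
    Exit;
  Elements := Document.all;
  for I := 0 to Elements.length - 1 do
  begin
    if not Supports(Elements.item(I, ''), IHTMLElement, Element) then
      Continue;
    // SameText compares case-insensitively, so 'IMG' and 'img' both match
    if SameText(Element.tagName, 'IMG') and
       Supports(Element, IHTMLImgElement, Image) then
      Urls.Add(Image.src);
  end;
end;
```

Because the check is by tag name only, <input type="image"> elements are not picked up here; they would need a separate branch.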