Obtaining visible text on a page from an IHTMLDocument2*
I am trying to obtain the text content of an Internet Explorer web browser window.
I am following these steps:
Edit:
My problem is:
I have tried a recursive approach, but I am clueless as to how to deal with scenarios like this:
<div>
Hello World 1
<div style="display: none">Hello world 2</div>
</div>
In this scenario I won't be able to get "Hello World 1".
Can anyone please help me out with the best way to obtain the text from an IHTMLDocument2*? I am using C++ Win32, no MFC or ATL.
Thanks, Ashish.
If you iterate backwards over the document.body.all elements, you will always walk the elements from the inside out: the collection lists elements in document order, so every element comes before its descendants, and walking it backwards visits children before their parents. Removing an element therefore never shifts the indices you have yet to visit, so you don't need to recurse yourself; the DOM does that for you. E.g. (code is in Delphi):
// Needs ComObj, MSHTML (or an imported MSHTML_TLB) and Dialogs in the uses clause.
procedure Test();
var
document, el: OleVariant;
i: Integer;
begin
document := CreateComObject(CLASS_HTMLDocument) as IDispatch;
document.open;
document.write('<div>Hello World 1<div style="display: none">Hello world 2<div>This DIV is also invisible</div></div></div>');
document.close;
for i := document.body.all.length - 1 downto 0 do // iterate backwards
begin
el := document.body.all.item(i);
// remove elements hidden via their inline style attribute (el.style, not el.currentStyle)
if (el.style.display = 'none') then
begin
el.removeNode(true);
end;
end;
ShowMessage(document.body.innerText);
end;
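Since the question is about C++ Win32, here is a sketch of the same backward pass in C++. To keep it self-contained and testable it does not touch MSHTML at all: Node, appendElement, appendText, collectAll (standing in for document.body.all), removeNode and innerText are hypothetical helpers that only mimic the members with those names, and displayNone models the inline style.display = 'none' check. What it is meant to show is why the backwards order is safe when elements are removed during the loop.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a DOM node; this is NOT the MSHTML API.
struct Node {
    bool isText = false;       // text node vs. element
    std::string text;          // only used for text nodes
    bool displayNone = false;  // models el.style.display = 'none'
    Node* parent = nullptr;
    std::vector<std::unique_ptr<Node>> children;
};

Node* appendElement(Node* parent, bool displayNone) {
    auto n = std::make_unique<Node>();
    n->displayNone = displayNone;
    n->parent = parent;
    parent->children.push_back(std::move(n));
    return parent->children.back().get();
}

void appendText(Node* parent, const std::string& s) {
    auto n = std::make_unique<Node>();
    n->isText = true;
    n->text = s;
    n->parent = parent;
    parent->children.push_back(std::move(n));
}

// Models document.body.all: every element below `root`, in document order
// (an element is listed before all of its descendants).
void collectAll(Node* root, std::vector<Node*>& out) {
    for (auto& child : root->children) {
        if (child->isText) continue;
        out.push_back(child.get());
        collectAll(child.get(), out);
    }
}

// Models el.removeNode(true): detaches `el` and destroys its whole subtree.
void removeNode(Node* el) {
    assert(el->parent != nullptr);
    auto& siblings = el->parent->children;
    for (auto it = siblings.begin(); it != siblings.end(); ++it) {
        if (it->get() == el) {
            siblings.erase(it);
            return;
        }
    }
}

// Models innerText, simplified: plain concatenation without line breaks.
std::string innerText(const Node* n) {
    if (n->isText) return n->text;
    std::string s;
    for (const auto& child : n->children) s += innerText(child.get());
    return s;
}

std::string visibleTextByBackwardRemoval(Node* body) {
    std::vector<Node*> all;
    collectAll(body, all);
    // Backwards: when we reach all[i], its descendants (all[i+1..]) are already
    // done, and removing all[i] destroys only all[i] and those descendants,
    // never anything in all[0..i-1].
    for (std::size_t i = all.size(); i-- > 0;) {
        if (all[i]->displayNone) removeNode(all[i]);
    }
    return innerText(body);
}

// The document from the Delphi example above.
std::string sampleFromAnswer() {
    Node body;
    Node* outer = appendElement(&body, false);
    appendText(outer, "Hello World 1");
    Node* hidden = appendElement(outer, true);
    appendText(hidden, "Hello world 2");
    Node* inner = appendElement(hidden, false);
    appendText(inner, "This DIV is also invisible");
    return visibleTextByBackwardRemoval(&body);  // "Hello World 1"
}
```

In a real MSHTML program the loop would presumably go through IHTMLElementCollection::item, IHTMLElement::get_style plus IHTMLStyle::get_display, and IHTMLDOMNode::removeNode; that mapping is an assumption and is not exercised here. Note also that the real innerText inserts line breaks between block elements, which this simplified innerText does not.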
A side comment: as for your scenario with the recursive approach:
<div>Hello World 1<div style="display: none">Hello world 2</div></div>
If, e.g., our element is the first DIV, el.getAdjacentText('afterBegin') will return "Hello World 1". So we could probably iterate forward over the elements and collect getAdjacentText('afterBegin'), but this is a bit more difficult because we need to test each element's parents for el.currentStyle.display.
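For the recursive route, one way to avoid the parent checks is to prune instead: walk child nodes (text nodes included, not just elements) depth-first and stop descending as soon as an element is hidden. Because display: none hides an element's whole subtree, any node reached this way has only displayed ancestors. The sketch below is a model, not MSHTML code: Node, makeText, makeElement and visibleText are hypothetical. In MSHTML the walk would presumably use IHTMLDOMNode::get_childNodes, with nodeType 3 marking text nodes and IHTMLElement2::get_currentStyle supplying the computed display; that mapping is an assumption.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified DOM node; NOT the MSHTML interfaces.
struct Node {
    bool isText;                 // text node (nodeType 3) vs. element
    std::string text;            // only used for text nodes
    bool displayNone;            // models currentStyle.display == "none"
    std::vector<Node> children;  // child nodes, text nodes included
};

Node makeText(std::string s) { return Node{true, std::move(s), false, {}}; }

Node makeElement(bool displayNone, std::vector<Node> children) {
    return Node{false, "", displayNone, std::move(children)};
}

// Depth-first walk over child nodes in document order, so loose text such as
// "Hello World 1" is visited directly. A hidden element is skipped together
// with its whole subtree; hence any node we reach has only displayed ancestors.
void collectVisibleText(const Node& n, std::string& out) {
    assert(!n.isText || n.children.empty());  // text nodes are leaves
    if (n.isText) {
        out += n.text;
        return;
    }
    if (n.displayNone) return;
    for (const Node& child : n.children) collectVisibleText(child, out);
}

std::string visibleText(const Node& root) {
    std::string out;
    collectVisibleText(root, out);
    return out;
}

// The scenario from the question.
std::string sampleFromQuestion() {
    Node doc = makeElement(false, {makeText("Hello World 1"),
                                   makeElement(true, {makeText("Hello world 2")})});
    return visibleText(doc);  // "Hello World 1"
}
```

Unlike collecting getAdjacentText('afterBegin') per element, this walk also keeps text that follows a hidden child, e.g. the trailing "C" in <div>A<div style="display: none">B</div>C</div>, which 'afterBegin' alone would not pick up.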