
PHP DOM - accessing newly added nodes

I use the following to load an HTML document into a DOM:

$dom = new DOMDocument('1.0', 'utf-8');
$dom->loadHTML($html);

and then I add some new content to an element in the HTML:

$element = $dom->getElementById('mybox');
$f = $dom->createDocumentFragment();
$f->appendXML('<div id="newbox">foo</div>');
$element->appendChild($f);

But if I now want to manipulate #newbox, I can't, because getElementById() does not find it. To make it accessible I have to do the following (reloading the document from the new HTML):

$html = $dom->saveHTML();
$dom->loadHTML($html);

This works fine, but having to do it between every DOM manipulation becomes expensive performance-wise.

Is there a better way to "refresh" the DOM so that it works with the newly added elements?

Thanks in advance! :)

As an alternative to the save-and-load approach, you could also try Document.normalizeDocument. This should fix up the document as if it had been through a save/load cycle, without actually serialising it. One thing it should do is re-calculate the isId-ness of attributes from the document type, which you'd hope loadHTML would have set to one of the HTML doctypes (these define id as an attribute of value type ID).

(There is also Element.setIdAttribute, which can be used to declare one instance of an Attr to contain an ID, but that's no use to you since you'd have to get hold of the element first.)

I haven't tested this though, and it wouldn't surprise me if PHP didn't implement this DOM Level 3 Core stuff properly. By my interpretation of the spec for isId, I reckon it should already have picked up the id type definition automatically. (My own DOM implementation certainly does.) But in that case your code would have worked. And I suppose appendXML is a non-standard method after all, so there's nothing to say it has to resolve type definitions like loadXML or loadHTML would.

So, maybe a workaround is a better plan. You might use a DOMXPath to select the element by its @id attribute rather than by real ID-ness as such. Of course this will be much slower than getElementById, but hopefully faster than normalizeDocument.
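A minimal sketch of the XPath workaround, assuming the same setup as in the question. The sample HTML string and the "highlight" class are made up for illustration:

```php
<?php
// Sample input; the real $html comes from wherever the question's code gets it.
$html = '<html><body><div id="mybox"></div></body></html>';

$dom = new DOMDocument('1.0', 'utf-8');
$dom->loadHTML($html);

// Add the new content exactly as in the question.
$element = $dom->getElementById('mybox');
$f = $dom->createDocumentFragment();
$f->appendXML('<div id="newbox">foo</div>');
$element->appendChild($f);

// Select by the literal id attribute instead of relying on ID-typed attributes.
$xpath = new DOMXPath($dom);
$newbox = $xpath->query('//*[@id="newbox"]')->item(0);

if ($newbox !== null) {
    // The node is a normal DOMElement and can be manipulated directly.
    $newbox->setAttribute('class', 'highlight');
    echo $dom->saveHTML($newbox), "\n";
}
```

The query scans the whole tree, so if many lookups are needed it may be worth narrowing the expression (for example, searching only below $element by passing it as the context node to query()).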

Or just lose the XML string-slinging and stick to the DOM methods, if you can; then it's trivial to keep a reference to a created element. (You can use helper functions to create the elements a bit more quickly if you find the DOM methods too wordy for the amount of content you're creating.)
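A rough sketch of the DOM-methods approach. The helper make_element() is hypothetical (it is not part of PHP's DOM extension), and the sample HTML and class name are invented for the example:

```php
<?php
// Hypothetical helper: builds an element with attributes and optional text.
function make_element(DOMDocument $doc, string $tag, array $attrs = [], string $text = ''): DOMElement
{
    $el = $doc->createElement($tag);
    foreach ($attrs as $name => $value) {
        $el->setAttribute($name, $value);
    }
    if ($text !== '') {
        // A text node avoids having to escape the content by hand.
        $el->appendChild($doc->createTextNode($text));
    }
    return $el;
}

$dom = new DOMDocument('1.0', 'utf-8');
$dom->loadHTML('<html><body><div id="mybox"></div></body></html>');

$element = $dom->getElementById('mybox');
$newbox = make_element($dom, 'div', ['id' => 'newbox'], 'foo');
$element->appendChild($newbox);

// $newbox still refers to the inserted node, so no lookup is needed.
$newbox->setAttribute('class', 'highlight');

// Optional: once the element is in hand, marking its id attribute as an ID
// should make later getElementById('newbox') calls find it as well.
$newbox->setIdAttribute('id', true);
```

Holding on to the reference avoids both the reload and the XPath scan; the setIdAttribute call is only needed if other code has to find the element by ID later.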

The only thing I know of that handles this really well is Python's Beautiful Soup. The document is split up into a parse tree which you can add to or take away from at will. Maybe you could write a Python script to handle the HTML and then coordinate the two scripts via a database or a system call. Alternatively, server-side JavaScript might be worth investigating.
