PHP's DomElement->nodeValue has gobbly-gook

I'm parsing a third-party web page using PHP's DOM classes. When I open the page in my browser and view the source, it's clean, but when I access some of the nodes through the DOMElement->nodeValue property, the HTML tags aren't there, and there are several newlines and this character: Â. According to this answer, this is the character that shows up when there's an encoding issue.

I also get that gobbly-gook using:

  • simplexml_import_dom($node)->asXML();
  • $doc->saveXML($node);

My question is: how can I simply get the clean HTML code inside the DOMElement?

Here is the clean HTML code:

<b>Author:</b> AUTHOR<br>
            <b>ISBN:</b> 9780684857220 <br>
            <b>Edition/Copyright:</b> 7<br>
            <b>Publisher:</b> J+M<br>
            <b>Published Date:</b>  1989<br>

Here is what nodeValue gives:

                    Â 
                    Author:Â AUTHOR      ISBN:Â 9780684857220 Edition/Copyright:Â 7     Publisher:Â J+M       Published Date:Â 
                    1989

Have you tried specifying the encoding when you create the DOM document? For example:

$doc = new DOMDocument('1.0', 'utf-8');
$doc->loadXML($third_party_web_page_string);

or

$doc = new DOMDocument('1.0', 'iso-8859-1');
$doc->loadXML($third_party_web_page_string);

If neither of those works, you could try running the iconv function over the data before you load it into the DOM object.
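As a rough illustration of the iconv suggestion, here is a minimal sketch. It assumes the stray Â comes from a non-breaking space (U+00A0) whose UTF-8 bytes are being read as ISO-8859-1; the thread does not confirm this, and the sample strings are made up for the demonstration.

```php
<?php
// U+00A0 (non-breaking space) encoded as UTF-8 is the two bytes 0xC2 0xA0.
$nbsp_utf8 = "\xC2\xA0";

// Misreading those bytes as ISO-8859-1 and re-encoding them as UTF-8
// turns 0xC2 into "Â" and 0xA0 into a second non-breaking space.
$mojibake = iconv('ISO-8859-1', 'UTF-8', $nbsp_utf8);
echo bin2hex($mojibake), "\n"; // c382c2a0

// The intended use: if the page really is ISO-8859-1 (an assumption here),
// convert it to UTF-8 once, before handing it to the DOM parser.
$page_latin1 = "Author:\xA0AUTHOR"; // hypothetical fragment of the page
$page_utf8 = iconv('ISO-8859-1', 'UTF-8', $page_latin1);
echo bin2hex($page_utf8), "\n";
```

If the page is already UTF-8, converting it again would produce exactly the Â artifacts shown in the question, so it is worth checking the page's declared charset first.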

Turns out it wasn't an encoding issue but rather I was using the wrong methods. This works:

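// Copy the node into a fresh document, then serialize that document as HTML.
$doc = new DOMDocument();
$doc->appendChild($doc->importNode($second_td, true)); // true = deep copy, including children
echo $doc->saveHTML();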
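For anyone who wants to try this end to end, here is a self-contained sketch. The HTML string is a made-up stand-in for the third-party page, and picking the second td with getElementsByTagName is an assumption about how $second_td was obtained. It also shows why nodeValue loses the tags: it returns only the concatenated text of the node.

```php
<?php
// Hypothetical stand-in for the third-party page.
$html = '<html><body><table><tr><td>first</td>'
      . '<td><b>Author:</b> AUTHOR<br><b>ISBN:</b> 9780684857220 <br></td>'
      . '</tr></table></body></html>';

libxml_use_internal_errors(true); // keep warnings about imperfect markup quiet
$source = new DOMDocument();
$source->loadHTML($html);

// Assumption: the node of interest is the second <td> on the page.
$second_td = $source->getElementsByTagName('td')->item(1);

// nodeValue is the text content only, so no markup survives.
$text = $second_td->nodeValue;

// Approach from the post: import the node into an empty document and serialize it.
$doc = new DOMDocument();
$doc->appendChild($doc->importNode($second_td, true));
$markup = $doc->saveHTML();

// Alternative: saveHTML() accepts a node argument (PHP 5.3.6 and later).
$markup_direct = $source->saveHTML($second_td);

echo $text, "\n", $markup, "\n", $markup_direct, "\n";
```

Passing the node straight to saveHTML() avoids the extra document, but the import approach from the post also works on older PHP versions.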
