
Special character encoding in PHP when using loadHTMLFile

I am using a PHP script to parse various web pages for their title, description, and other tags.

Here is our code:

// Only logged-in users may use the parser
if (isset($_SESSION['user_id']) && !empty($_SESSION['user_id'])) {

    $images = [];
    $url = $_GET['req'];                           // page to parse, passed as ?req=...
    $ext = ['.jpeg', 'jpg', 'png', 'bmp', 'ico'];  // image extensions of interest

    $doc = new DOMDocument('1.0', 'UTF-8');

    $doc->loadHTMLFile($url);                      // fetch and parse the page
    $doc->encoding = 'UTF-8';                      // set after parsing: affects serialization, not how the page was decoded

    var_dump($doc);                                // debug output

    $uri = $doc->documentURI;
    $parse = parse_url($uri);
    $host = $parse['host'];                        // hostname
    $title = $doc->getElementsByTagName('title')->item(0);  // <title> element
    $metas = $doc->getElementsByTagName('meta');
    $details["title"] = $title->textContent;
    $details["host"] = $host;
    $details['uri'] = $uri;
    foreach ($metas as $meta) {
        // ...continues...
    }
}

If the document at our URL contains any special characters, PHP does not recognise them and we get garbled characters instead. I have gone through various questions on SO, and this seems to be a UTF-8 encoding problem, but I am already specifying UTF-8 in my code. Please help.
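A likely cause, offered as a hedged note: `loadHTMLFile()` and `loadHTML()` hand the raw bytes to libxml's HTML parser, and when the page does not declare its charset in a `<meta>` tag, that parser (in the libxml2 versions commonly bundled with PHP) assumes ISO-8859-1. The `'UTF-8'` passed to the constructor and the `$doc->encoding` assignment made after loading do not change how the input was decoded, so UTF-8 bytes come out as mojibake such as `Ã©`. Below is a minimal sketch of one common workaround, assuming the fetched page really is UTF-8 and the mbstring extension is available: fetch the HTML yourself, turn every non-ASCII character into a numeric character reference, then parse the now pure-ASCII markup. The helper name `loadUtf8Html` is hypothetical.

```php
<?php
// Hypothetical helper: parse HTML that is assumed to be UTF-8, whether or not
// the page declares a charset. Requires the mbstring extension.
function loadUtf8Html(string $html): DOMDocument
{
    // Encode every code point from U+0080 upward as &#NNN; so the parser only
    // ever sees ASCII and its default-charset guess no longer matters.
    // Caveat: this also rewrites non-ASCII text inside <script> and <style>.
    $ascii = mb_encode_numericentity($html, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8');

    $doc = new DOMDocument();
    // Real-world pages are rarely valid HTML 4; collect libxml warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($ascii);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $doc;
}

// With a remote page (network access and allow_url_fopen assumed):
// $doc = loadUtf8Html(file_get_contents($url));
$doc = loadUtf8Html('<html><head><title>Café – Ñandú</title></head><body></body></html>');
echo $doc->getElementsByTagName('title')->item(0)->textContent, "\n";
```

Because `loadHTML()` works on a string, it does not know the page URL; in the question's code the host would then come from `$url` directly, for example `parse_url($url, PHP_URL_HOST)`, rather than from `$doc->documentURI`.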

Be careful when using the encoding parameter in the constructor. It does not mean that all data is automatically encoded for you in the supplied encoding; once you choose an encoding other than the default UTF-8, you need to do that yourself. See the note on DOM Functions on how to properly work with other encodings.

The constructor example clearly shows that the version and encoding end up only in the XML header.

Reference: http://php.net/manual/en/domdocument.construct.php
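The manual's point can be checked directly. In this small sketch, the constructor's version and encoding show up only in the XML declaration that `saveXML()` writes for an otherwise empty document; nothing is parsed or re-encoded by the constructor itself.

```php
<?php
// Constructor arguments only populate the XML declaration of the empty document.
$withEncoding    = (new DOMDocument('1.0', 'UTF-8'))->saveXML();
$withoutEncoding = (new DOMDocument('1.0'))->saveXML();

// Each string is just the XML declaration followed by a newline.
echo $withEncoding;
echo $withoutEncoding;
```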

It looks like the constructor doesn't require you to pass the second argument. Have you tried running your code without it? I admit my understanding of DOMDocument is a little poor, but if it represents an entire HTML document, most web browsers won't throw much of a fit about missing encoding information and will do their best.
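Dropping the argument is harmless, but it is unlikely to fix the garbling on its own: `loadHTML()` replaces the document built by the constructor, so the decoded text is the same with or without the second argument. A minimal sketch comparing the two (the helper `titleOf` is hypothetical):

```php
<?php
// Hypothetical helper: parse an HTML string into the given document and
// return the text of its <title> element ('' if there is none).
function titleOf(DOMDocument $doc, string $html): string
{
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $title = $doc->getElementsByTagName('title')->item(0);
    return $title === null ? '' : $title->textContent;
}

// UTF-8 bytes with no charset declaration, as in the question.
$html = '<html><head><title>Café</title></head><body></body></html>';

$withEncoding    = titleOf(new DOMDocument('1.0', 'UTF-8'), $html);
$withoutEncoding = titleOf(new DOMDocument(), $html);

// Whatever charset the parser assumed, both documents decoded the bytes identically.
var_dump($withEncoding === $withoutEncoding);
```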

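Notice: technical posts on this site are published under the CC BY-SA 4.0 license. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.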

 
Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM