Special character encoding in PHP when using loadHTMLFile

I am using a PHP file to parse different webpages for the title, description and other tags.

Here is our code:

if (isset($_SESSION['user_id']) && !empty($_SESSION['user_id'])) {

    $images = [];
    $url = $_GET['req'];
    $ext = ['.jpeg', 'jpg', 'png', 'bmp', 'ico'];

    $doc = new DOMDocument('1.0','UTF-8');

    $doc->loadHTMLFile($url);
    $doc->encoding = 'UTF-8';

    var_dump($doc);

    $uri = $doc->documentURI;
    $parse = parse_url($uri);
    $host = $parse['host']; //hostname
    $title = $doc->getElementsByTagName('title')->item(0);  // title
    $metas = $doc->getElementsByTagName('meta');
    $details["title"] = $title->textContent;
    $details["host"] = $host;
    $details['uri'] = $uri;
    foreach ($metas as $meta) {

...continues....

If the document at the URL contains any special characters, PHP does not recognise them and gives us garbled characters. I have gone through several questions on SO, and this seems to be a UTF-8 encoding problem, but I am already specifying UTF-8 in my code. Please help me.

Be aware using the encoding parameter in the constructor. It does not mean that all data is automatically encoded for you in the supplied encoding. You need to do that yourself once you choose an encoding other than the default UTF-8. See the note on DOM Functions on how to properly work with other encodings...

The constructor example clearly shows that version and encoding only end up in the XML header.

Reference: http://php.net/manual/en/domdocument.construct.php
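A minimal sketch of what that means in practice: the encoding passed to the constructor shows up in the declaration written by saveXML(), and nothing more. The 'ISO-8859-1' value is chosen here only to make the effect visible; it is not taken from the question.

```php
<?php
// The second constructor argument only sets the encoding that is
// written into the XML declaration when the document is serialised.
$doc = new DOMDocument('1.0', 'ISO-8859-1');

$xml = $doc->saveXML();
echo $xml; // the declaration carries encoding="ISO-8859-1"
```

It does not tell the parser how to decode the bytes that loadHTMLFile() or loadHTML() reads afterwards.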

It looks like the constructor doesn't require you to pass the second argument. Have you tried running your code without it? I admit my understanding of DOMDocument is a little poor, but if it represents an entire HTML document, then most web browsers won't complain much about missing encoding information and will do their best.
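Whatever is passed to the constructor, the HTML parser still has to decode the page correctly. One possible workaround, sketched below on the assumption that the page bytes are UTF-8 and that the mbstring extension is available, is to convert every non-ASCII character to a numeric entity before calling loadHTML(). The helper name loadUtf8Html and the sample markup are hypothetical and only serve the illustration.

```php
<?php
// Hypothetical helper (not part of the original code): parse UTF-8 HTML
// without relying on the parser's charset detection. Needs ext-mbstring.
function loadUtf8Html(string $html): DOMDocument
{
    // Turn every non-ASCII character into a numeric entity (e.g. é -> &#233;),
    // so the bytes handed to the parser are plain ASCII.
    $ascii = mb_encode_numericentity($html, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8');

    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence warnings from sloppy markup
    $doc->loadHTML($ascii);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $doc;
}

// Stand-in for the downloaded page; in the question it would come from $url.
$html = '<html><head><title>Café 日本語</title></head><body></body></html>';

$doc = loadUtf8Html($html);
$title = $doc->getElementsByTagName('title')->item(0)->textContent;
echo $title, PHP_EOL;
```

With the code from the question, the markup could be fetched with file_get_contents($url) and passed to the helper instead of calling loadHTMLFile($url). If the page is served in another charset, it would need converting to UTF-8 first, for example with mb_convert_encoding().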
