
curl file_get_contents/get_meta_tags encoding

I'm using cURL to replace PHP's file_get_contents and get_meta_tags functions:

<?php

class CURL{


    public static function file_get_contents($url){

        $ch = curl_init();

        curl_setopt($ch, CURLOPT_HEADER, 0);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);

        $data = curl_exec($ch);
        curl_close($ch);

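        // The original call here, iconv("Windows-1252","UTF-8",$text), read an
        // undefined variable and discarded its result, so it had no effect.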

        return $data;


    }


    public static function get_meta_tags($url){

        $html = self::file_get_contents($url);
        // Return the parsed tags; without "return" this method yields null.
        return self::get_meta_tags_html($html);



    }

    public static function get_meta_tags_html($html){

        //parsing begins here:
        $doc = new DOMDocument();
        @$doc->loadHTML($html);
        //$nodes = $doc->getElementsByTagName('title');

        //get and display what you need:
        //$title = $nodes->item(0)->nodeValue;

        $metas = $doc->getElementsByTagName('meta');

        $return = array();

        for ($i = 0; $i < $metas->length; $i++)
        {
            $meta = $metas->item($i);
            if($meta->getAttribute('name') == 'title')
               $return["title"] = $meta->getAttribute('content');
            if($meta->getAttribute('name') == 'description')
                $return['description'] = $meta->getAttribute('content');
            if($meta->getAttribute('name') == 'keywords')
                $return['keywords'] = $meta->getAttribute('content');
        }

        return $return;

    }


}


?>

But when I call CURL::get_meta_tags on a site that contains non-Latin text such as Japanese, it returns garbled characters instead of the Japanese text, whereas PHP's built-in get_meta_tags returns the correct characters.

How should I modify this code so that CURL::get_meta_tags also returns foreign characters correctly, just like the built-in get_meta_tags?

More likely, you are simply displaying the text with the wrong encoding.

If you declare the character set with the header function before sending any output, the text should display correctly.

header('Content-Type: text/html; charset=utf-8');
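As a rough sketch of where that call would sit, the snippet below sends the header first and then prints the extracted tags. The helper name render_meta_tags and the placeholder array are assumptions made for illustration; in practice the array would come from CURL::get_meta_tags($url).

```php
<?php
// Hypothetical helper (not part of the original class): format the extracted
// tags as HTML, escaping every value as UTF-8.
function render_meta_tags(array $tags): string
{
    $out = '';
    foreach ($tags as $name => $content) {
        $out .= '<p>' . htmlspecialchars($name, ENT_QUOTES, 'UTF-8') . ': '
              . htmlspecialchars($content, ENT_QUOTES, 'UTF-8') . "</p>\n";
    }
    return $out;
}

// header() only works before any output has been sent.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}

// Placeholder data standing in for the result of CURL::get_meta_tags($url).
echo render_meta_tags(['description' => '日本語の説明']);
```

This only helps if the strings are already UTF-8, which is what DOMDocument hands back from getAttribute.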

You could also check which character set the fetched page declares in its meta tag, if one is set, and use that.
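A minimal sketch of that check, assuming PHP 8 with the mbstring extension. The function names detect_meta_charset and html_to_utf8 are made up for this example and the regular expression is a simple heuristic, not a full HTML parser.

```php
<?php
// Hypothetical helpers, not part of the original CURL class.

// Return the charset named in a <meta> tag, or null if none is declared.
// Matches both <meta charset="..."> and
// <meta http-equiv="Content-Type" content="text/html; charset=...">.
function detect_meta_charset(string $html): ?string
{
    if (preg_match('/<meta[^>]+charset\s*=\s*["\']?([\w-]+)/i', $html, $m)) {
        return $m[1];
    }
    return null;
}

// Convert fetched HTML to UTF-8 using the declared charset. The markup is
// returned unchanged when no charset is declared, when it is already UTF-8,
// or when mbstring does not recognise the name.
function html_to_utf8(string $html): string
{
    $charset = detect_meta_charset($html);
    if ($charset === null || strcasecmp($charset, 'UTF-8') === 0) {
        return $html;
    }
    try {
        // PHP 8 throws ValueError for an unknown encoding name.
        $converted = mb_convert_encoding($html, 'UTF-8', $charset);
    } catch (ValueError $e) {
        return $html;
    }
    return $converted === false ? $html : $converted;
}
```

Note that the meta declaration inside the converted markup is not rewritten, so this is meant for normalising text before output rather than as a complete fix.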

