简体   繁体   English

卷曲file_get_contents / get_meta_tags编码

[英]curl file_get_contents/get_meta_tags encoding

so I'm using CURL to replace the file_get_contents and get_meta_tags functionality in PHP: 所以我正在使用CURL替换PHP中的file_get_contents和get_meta_tags功能:

<?php

class CURL{


    public static function file_get_contents($url){

        $ch = curl_init();

        curl_setopt($ch, CURLOPT_HEADER, 0);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);

        $data = curl_exec($ch);
        curl_close($ch);

        iconv("Windows-1252","UTF-8",$text);

        return $data;


    }


    public static function get_meta_tags($url){

        $html = self::file_get_contents($url);
        self::get_meta_tags_html($html);



    }

    public static function get_meta_tags_html($html){

        //parsing begins here:
        $doc = new DOMDocument();
        @$doc->loadHTML($html);
        //$nodes = $doc->getElementsByTagName('title');

        //get and display what you need:
        //$title = $nodes->item(0)->nodeValue;

        $metas = $doc->getElementsByTagName('meta');

        $return = array();

        for ($i = 0; $i < $metas->length; $i++)
        {
            $meta = $metas->item($i);
            if($meta->getAttribute('name') == 'title')
               $return["title"] = $meta->getAttribute('content');
            if($meta->getAttribute('name') == 'description')
                $return['description'] = $meta->getAttribute('content');
            if($meta->getAttribute('name') == 'keywords')
                $return['keywords'] = $meta->getAttribute('content');
        }

        return $return;

    }


}


?>

but then when I call CURL::get_meta_tags, on a site that has foreign letters in it such as Japanese, it will return weird characters instead of the Japanese letters whereas if I use the built in php get_meta_tags, it will return the correct character... 但是,当我在其中包含外国字母(例如日语)的网站上调用CURL :: get_meta_tags时,它将返回奇怪的字符而不是日语字母,而如果我使用内置的php get_meta_tags,它将返回正确的字符。 ..

how should I modify this code such that CURL::get_meta_tags also return foreign characters properly just like the built in php get_meta_tags 我应该如何修改此代码,以使CURL :: get_meta_tags也正确返回外来字符,就像内置的php get_meta_tags

It is more likely that you are just trying to display the text with the wrong encoding. 您更有可能只是尝试以错误的编码显示文本。

If you set the character set using the header function it should look correct. 如果使用标题功能设置字符集,它应该看起来正确。

header('Content-Type: text/html; charset=utf-8');

You could check what the character-set is in the meta tag you receive if it was set, and use that. 您可以检查所接收的meta标记中的字符集是否已设置,然后使用它。

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM