PHP DomDocument, DomXPath encoding issue

I'm having an encoding problem with a WordPress feed that I just can't seem to figure out.

I was loading my feed with DOMDocument->load, but then switched to file_get_contents followed by ->loadXML, with the same results. I went the loadXML route so I could manipulate the feed if needed.

The correct output that I'm looking for is: ' £. If I just echo the result of an XPath query, I get: ‘ £. If I echo with utf8_decode, I get: ? £. That is a lot better, but the question mark should be an apostrophe.

If I loop through each node of the DOMDocument once it is loaded, I get the correct output. So it seems that the text is being handled incorrectly in XPath.

Any thoughts?

The feed is http://shredeasy.com/blog/category/news/feed

Here is the function that is being called:

function getPostsInCategory($feed=NULL){
    if(is_null($feed)){ echo "Wrong Usage. Need a valid Category Feed.  Most likely from getCategories()."; return false; }
    $feedx = file_get_contents($feed);
    $xml = new DOMDocument();
    $xml->loadXML($feedx);
    //$this->showDOMNode($xml);


    //$xml->load($feed);
    $xpath = new DomXPath($xml);
    $xpath->registerNamespace("content", "http://web.resource.org/rss/1.0/modules/content/");

    $cat = array();
    foreach($xml->getElementsByTagName('item') as $c){
        $elements = array();
        $elements["title"] = $xpath->query("title", $c)->item(0)->nodeValue;
        echo utf8_decode($elements["title"]);

I have been trying to figure this out for hours and I keep circling back to the wrong thing.

Thanks for the help!

You're right, it seems that apostrophes are turning into question marks. Gosh! I don't know whether that's the only issue or not.

The string being echoed is encoded in UTF-8.

  • If your page is encoded in UTF-8, you can just echo it, possibly calling htmlspecialchars with the third argument set to "UTF-8".
  • Otherwise, you have to convert it first to whatever encoding your webpage is using. See iconv and mb_convert_encoding.
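
The two options above can be sketched as follows. This is only an illustration under stated assumptions: the inline `$feedXml` string is a hypothetical stand-in for the real feed, the variable names are made up, and Windows-1252 is an assumed example of a non-UTF-8 page encoding (substitute whatever your page actually declares). It also shows why the question mark appears: ISO-8859-1, the target of utf8_decode, has no slot for the curly quote U+2018, so that character is replaced, while £ survives.

```php
<?php
// Hypothetical stand-in for the feed: a title containing a curly quote
// (U+2018) and a pound sign (U+00A3), both stored as UTF-8.
$feedXml = '<?xml version="1.0" encoding="UTF-8"?>'
    . "<rss><channel><item><title>\u{2018} \u{00A3}</title></item></channel></rss>";

$doc = new DOMDocument();
$doc->loadXML($feedXml);
$xpath = new DOMXPath($doc);

// DOM returns node values as UTF-8 strings.
$title = $xpath->query('//item/title')->item(0)->nodeValue;

// Option 1: the page itself is UTF-8. Escape for HTML and echo as-is.
$forUtf8Page = htmlspecialchars($title, ENT_QUOTES, 'UTF-8');

// Option 2: the page uses another encoding (Windows-1252 assumed here).
// Windows-1252 has a code point for the curly quote, so nothing is lost.
$forCp1252Page = mb_convert_encoding($title, 'Windows-1252', 'UTF-8');
$viaIconv      = iconv('UTF-8', 'Windows-1252', $title);

// For comparison: ISO-8859-1 cannot represent U+2018, so the quote
// cannot survive this conversion.
$forLatin1Page = mb_convert_encoding($title, 'ISO-8859-1', 'UTF-8');

echo bin2hex($forCp1252Page), "\n";
echo bin2hex($forLatin1Page), "\n";
```

Whichever option is used, the encoding the browser is told about (the Content-Type header or the meta charset) has to match the bytes that are actually echoed; otherwise the same garbling reappears.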
