UTF-8 and htmlentities in an RSS feed

Question

I'm writing some RSS feeds in PHP and struggling with character-encoding issues. Should I call utf8_encode() before or after htmlentities()? For example, I have both ampersands and Chinese characters in a description element, and I'm not sure which of these is correct:

$output = utf8_encode(htmlentities($source));
or
$output = htmlentities(utf8_encode($source));

And why?

Answer 1

It's important to pass the character set to the htmlentities function, because the default is ISO-8859-1 (in PHP versions before 5.4; newer versions default to UTF-8):

utf8_encode(htmlentities($source, ENT_COMPAT, 'utf-8'));

You should apply htmlentities first so that utf8_encode can encode the entities properly.

(EDIT: Based on the comments, I have changed my earlier opinion that the order didn't matter. This code is tested and works well.)
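To see why the charset argument matters, here is a small sketch that escapes UTF-8 text once declared as 'UTF-8' and once declared as 'ISO-8859-1'. It assumes the PHP source file is saved as UTF-8, so 中 is the three bytes E4 B8 AD. Read as ISO-8859-1, those bytes are treated as three separate Latin-1 characters, and each one is turned into its own named entity.

```php
<?php
// Assumption: this file is saved as UTF-8, so "中" is the bytes E4 B8 AD.
$text = "News & 中文";

// Declared as UTF-8: "&" has a named entity; the CJK characters have none
// and pass through unchanged as UTF-8.
$asUtf8 = htmlentities($text, ENT_COMPAT, 'UTF-8');

// Declared as ISO-8859-1: each byte in the 0xA0-0xFF range is treated as a
// Latin-1 character and replaced by its named entity (E4 -> ä, B8 -> ¸, AD -> soft hyphen).
$asLatin1 = htmlentities("中", ENT_COMPAT, 'ISO-8859-1');

echo $asUtf8, "\n";   // News &amp; 中文
echo $asLatin1, "\n"; // &auml;&cedil;&shy;
```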

Answer 2

First: the utf8_encode function converts from ISO 8859-1 to UTF-8. So you only need this function if your input encoding/charset is ISO 8859-1. But why don't you use UTF-8 in the first place?

Second: you don't need htmlentities. You just need htmlspecialchars to replace the special characters with character references. htmlentities would also replace many characters that can be encoded directly in UTF-8. It is important to use the ENT_QUOTES quote style so that single quotes are replaced as well.

So my proposal:

// if your input encoding is ISO 8859-1
htmlspecialchars(utf8_encode($string), ENT_QUOTES)

// if your input encoding is UTF-8
htmlspecialchars($string, ENT_QUOTES, 'UTF-8')
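A quick comparison of the two functions on the same UTF-8 input (a sketch; the sample string is made up for illustration). htmlentities turns é into &eacute;, while htmlspecialchars leaves it as a raw UTF-8 character and only escapes the characters that are significant in markup; with ENT_QUOTES, both also escape the single quote.

```php
<?php
$text = "Café's <b>news</b> & more";

// htmlentities: every character that has a named HTML entity is replaced, including é.
$entities = htmlentities($text, ENT_QUOTES, 'UTF-8');

// htmlspecialchars: only & " ' < > are replaced; é stays as raw UTF-8.
$special = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

echo $entities, "\n"; // Caf&eacute;&#039;s &lt;b&gt;news&lt;/b&gt; &amp; more
echo $special, "\n";  // Café&#039;s &lt;b&gt;news&lt;/b&gt; &amp; more
```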

Answer 3

Don't use htmlentities()!

Simply use UTF-8 characters. Just make sure you declare the encoding of the feed in the HTTP headers (Content-Type: application/xml; charset=UTF-8) or, failing that, in the feed itself using <?xml version="1.0" encoding="UTF-8"?> on the first line.
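A minimal sketch of this approach, assuming PHP 7.4+ (for the arrow function) and input strings that are already UTF-8; build_feed and the example.com URL are hypothetical. It writes the XML declaration on the first line, sends the Content-Type header when running under a web server, and escapes only the XML-special characters with htmlspecialchars rather than htmlentities. This matters because XML predefines only five entities (&amp; &lt; &gt; &quot; &apos;), so named HTML entities such as &eacute; are not valid in a plain XML feed.

```php
<?php
// Hypothetical helper: builds a minimal RSS 2.0 channel from UTF-8 strings.
// Non-ASCII text is written as raw UTF-8; only XML-special characters are escaped.
function build_feed(string $title, string $link, string $description): string
{
    $esc = fn(string $s): string => htmlspecialchars($s, ENT_QUOTES | ENT_XML1, 'UTF-8');

    return '<?xml version="1.0" encoding="UTF-8"?>' . "\n"
        . '<rss version="2.0"><channel>'
        . '<title>' . $esc($title) . '</title>'
        . '<link>' . $esc($link) . '</link>'
        . '<description>' . $esc($description) . '</description>'
        . '</channel></rss>';
}

// Declare the encoding in the HTTP header as well (skipped on the command line).
if (PHP_SAPI !== 'cli') {
    header('Content-Type: application/xml; charset=UTF-8');
}

$feed = build_feed('News & Updates', 'https://example.com/', '中文 & 更新');
echo $feed, "\n";
```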

Answer 4

It might be easier to forget htmlentities and use a CDATA section. It works for the title element, which doesn't seem to support encoded HTML characters in Firefox's RSS viewer:

<title><![CDATA[News & Updates  " > » ☂ ☺ ☹ ☃  Test!]]></title>
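If you generate CDATA sections from PHP, the one sequence that cannot appear inside them is "]]>", because it ends the section. A common workaround, sketched below with a hypothetical cdata() helper, is to split that sequence across two adjacent CDATA sections. Note that CDATA only removes the need for escaping; the characters inside must still be encoded in the feed's declared encoding (UTF-8 here).

```php
<?php
// Hypothetical helper: wraps text in a CDATA section. Any "]]>" in the text
// would close the section early, so it is split across two sections.
function cdata(string $text): string
{
    return '<![CDATA[' . str_replace(']]>', ']]]]><![CDATA[>', $text) . ']]>';
}

echo '<title>' . cdata('News & Updates " > » ☂ ☺') . '</title>', "\n";
```

An XML parser reads '<![CDATA[a]]]]><![CDATA[>b]]>' back as the single text value 'a]]>b'.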

Answer 5

You want to do $output = htmlentities(utf8_encode($source));. This is because you want to convert your international characters into proper UTF-8 first, and then have ampersands (and possibly some of the UTF-8 characters as well) turned into HTML entities. If you create the entities first, some of the international characters may not be handled properly.

If none of your international characters are going to be changed by utf8_encode, then it doesn't matter in which order you call the two functions.
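The convert-then-escape order can be sketched as follows, assuming the input really is ISO-8859-1 bytes. This sketch swaps in iconv() for utf8_encode() (which is deprecated as of PHP 8.2) and htmlspecialchars() for htmlentities(); latin1_to_escaped_utf8 is a hypothetical name. The last line illustrates the point above: pure ASCII text is byte-for-byte identical in ISO-8859-1 and UTF-8, so for such input the conversion changes nothing.

```php
<?php
// Hypothetical helper: input is assumed to be ISO-8859-1 (Latin-1) bytes.
// Step 1 converts to UTF-8; step 2 escapes, telling PHP the text is now UTF-8.
function latin1_to_escaped_utf8(string $latin1): string
{
    $utf8 = iconv('ISO-8859-1', 'UTF-8', $latin1);
    return htmlspecialchars($utf8, ENT_QUOTES, 'UTF-8');
}

$latin1 = "Caf\xE9 & cr\xE8me";             // "Café & crème" as Latin-1 bytes
echo latin1_to_escaped_utf8($latin1), "\n"; // Café &amp; crème

// Pure ASCII is identical in ISO-8859-1 and UTF-8, so the conversion is a no-op.
var_dump(iconv('ISO-8859-1', 'UTF-8', 'News & Updates') === 'News & Updates'); // bool(true)
```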

Answer 6

After much trial and error, I finally found a way to properly display a string from a UTF-8-encoded database value, through an XML file, to an HTML page:

$output = '<![CDATA['.utf8_encode(htmlentities($string)).']]>';

I hope this helps someone.
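One way to see why the CDATA wrapper makes htmlentities() output usable in an XML feed: XML predefines only five entities, so a named HTML entity such as &eacute; makes the document ill-formed, whereas inside CDATA it is carried as literal text and left for the HTML page to decode. The sketch below checks this with PHP's DOM extension (assumed to be enabled); is_well_formed is a hypothetical helper.

```php
<?php
// Hypothetical helper: returns true if $xml parses as well-formed XML.
function is_well_formed(string $xml): bool
{
    $previous = libxml_use_internal_errors(true); // collect parse errors instead of printing warnings
    $doc = new DOMDocument();
    $ok = $doc->loadXML($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $ok;
}

// Named HTML entities such as &eacute; are not predefined in XML...
var_dump(is_well_formed('<t>caf&eacute;</t>'));             // bool(false)
// ...but inside CDATA they are plain text, so the document parses.
var_dump(is_well_formed('<t><![CDATA[caf&eacute;]]></t>')); // bool(true)
```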

6 answers

Answer 1: 17 votes, accepted, 2008-11-21 02:28:48
Answer 2: 14 votes, 2009-01-26 09:58:09
Answer 3: 7 votes, 2008-11-26 21:39:27
Answer 4: 2 votes
Answer 5: 1 vote, 2008-11-21 02:20:25
Answer 6: 0 votes, 2009-05-23 00:14:24
