How to prevent double encoding of HTML entities when HTML entities are allowed in input

Question

How can I prevent double encoding of HTML entities, or fix them programmatically?

I am using the encode() function from the HTML::Entities Perl module to encode HTML entities in user input. The problem is that we also allow users to enter HTML entities directly, and these entities end up being double encoded.

For example, a user may enter:

Stackoverflow & Perl = Awesome&hellip;

This ends up being encoded to

Stackoverflow &amp; Perl = Awesome&amp;hellip;

This renders in the browser as

Stackoverflow & Perl = Awesome&hellip;

We want this to render as

Stackoverflow & Perl = Awesome...

Is there a way to prevent this double encoding? Or is there a module or snippet of code that can easily correct these double encoding issues?

Any help is greatly appreciated!

Answer 1

You can decode the string first:

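use HTML::Entities qw(decode_entities encode_entities);

my $input = from_user();

# Decode first so entities the user typed are not encoded a second time.
my $encoded = encode_entities( decode_entities($input) );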
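
As a minimal sketch of this approach applied to the example from the question, assuming HTML::Entities is installed. The helper name normalize_entities is hypothetical and only there for illustration:

```perl
use strict;
use warnings;
use HTML::Entities qw(decode_entities encode_entities);

# Hypothetical helper: decode any entities the user typed, then encode once.
sub normalize_entities {
    my ($input) = @_;
    return encode_entities( decode_entities($input) );
}

my $input = 'Stackoverflow & Perl = Awesome&hellip;';

# Encoding directly also escapes the ampersand of the user's own entity.
my $double = encode_entities($input);

# Decoding first turns &hellip; into the ellipsis character before encoding.
my $single = normalize_entities($input);

print "$double\n";   # Stackoverflow &amp; Perl = Awesome&amp;hellip;
print "$single\n";   # Stackoverflow &amp; Perl = Awesome&hellip;
```

Because the helper always decodes before it encodes, running it again on its own output should return the same string, so input that was already encoded once is not made worse.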

Answer 2

There is an extremely simple way to avoid this:

1. Remove all entities on input (turn them into Unicode characters).
2. Encode into entities again at the output stage.
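
The two steps above could be sketched as follows, assuming HTML::Entities is installed and that whatever stores the value can hold Unicode text. The function names clean_for_storage and render_for_html are hypothetical:

```perl
use strict;
use warnings;
use HTML::Entities qw(decode_entities encode_entities);

# Hypothetical input step: keep only plain characters in storage.
sub clean_for_storage {
    my ($raw) = @_;
    return decode_entities($raw);
}

# Hypothetical output step: encode exactly once, when building HTML.
sub render_for_html {
    my ($stored) = @_;
    return encode_entities($stored);
}

# The stored value holds a real ellipsis character (U+2026), not an entity.
my $stored = clean_for_storage('Stackoverflow & Perl = Awesome&hellip;');

# Encoding happens only here, so each character is escaped a single time.
my $html = render_for_html($stored);

print "$html\n";   # Stackoverflow &amp; Perl = Awesome&hellip;
```

Keeping the stored text free of entities means the same value can also be reused for non-HTML output without a separate decoding step.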

Answer 3

Consider deferring the call to encode() until you retrieve the value for display, rather than calling it before you store it. So long as you are consistent in your retrieval mechanism, the extra data in your database probably isn't worth fretting over.

Edit

Re-reading your question, I realize now that my answer doesn't fully address the issue, since calling encode() later will still have the same result. I don't know of an alternative myself, so this may not be much help, but you may want to consider finding a more suitable encoding method that respects existing entities.
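
One possible shape for such an encoder is sketched below. This is not something HTML::Entities provides; it is a hand-written regex approach, and the name encode_preserving_entities is hypothetical. It assumes that any sequence shaped like &name;, &#123; or &#x1F; was intended as an entity:

```perl
use strict;
use warnings;

# Hypothetical helper: escape HTML-special characters but leave anything
# that already looks like an entity (&name;, &#123;, &#x1F;) untouched.
sub encode_preserving_entities {
    my ($text) = @_;

    # Escape only ampersands that do not start an entity-like sequence.
    $text =~ s/&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)/&amp;/g;

    # The remaining special characters can never be part of an entity.
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;

    return $text;
}

my $out = encode_preserving_entities('Stackoverflow & Perl = Awesome&hellip;');
print "$out\n";   # Stackoverflow &amp; Perl = Awesome&hellip;
```

The trade-off is that this sketch does not check entity names against a known list and does not convert non-ASCII characters, and a user can no longer make the literal text "&amp;" appear on the page. Decoding first and then encoding, as in the other answers, avoids the regex at the same cost.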

Answer 1: score 6, accepted, 2010-04-09 01:48:52

Answer 2: score 4

Answer 3: score 1, 2010-04-09 01:38:17
