How to decode HTML entities using jQuery?

How do I use jQuery to decode HTML entities in a string?

Security note: using this answer (preserved in its original form below) may introduce an XSS vulnerability into your application. You should not use this answer. Read lucascaro's answer for an explanation of the vulnerabilities in this answer, and use the approach from either that answer or Mark Amery's answer instead.

Actually, try

var encodedStr = "This is fun &amp; stuff";
var decoded = $("<div/>").html(encodedStr).text();
console.log(decoded);

<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>

Without any jQuery:

function decodeEntities(encodedString) {
  var textArea = document.createElement('textarea');
  textArea.innerHTML = encodedString;
  return textArea.value;
}

console.log(decodeEntities('1 &amp; 2')); // '1 & 2'

This works similarly to the accepted answer, but is safe to use with untrusted user input.


Security issues in similar approaches

As noted by Mike Samuel, doing this with a <div> instead of a <textarea> with untrusted user input is an XSS vulnerability, even if the <div> is never added to the DOM:

function decodeEntities(encodedString) {
  var div = document.createElement('div');
  div.innerHTML = encodedString;
  return div.textContent;
}

// Shows an alert
decodeEntities('<img src="nonexistent_image" onerror="alert(1337)">')

However, this attack is not possible against a <textarea> because there are no HTML elements that are permitted content of a <textarea>. Consequently, any HTML tags still present in the 'encoded' string will be automatically entity-encoded by the browser.

function decodeEntities(encodedString) {
  var textArea = document.createElement('textarea');
  textArea.innerHTML = encodedString;
  return textArea.value;
}

// Safe, and returns the correct answer
console.log(decodeEntities('<img src="nonexistent_image" onerror="alert(1337)">'))

Warning: Doing this using jQuery's .html() and .val() methods instead of using .innerHTML and .value is also insecure* for some versions of jQuery, even when using a textarea. This is because older versions of jQuery would deliberately and explicitly evaluate scripts contained in the string passed to .html(). Hence code like this shows an alert in jQuery 1.8:

//<!-- CDATA
// Shows alert
$("<textarea>")
  .html("<script>alert(1337);</script>")
  .text();
//-->
 <script src="https://ajax.googleapis.com/ajax/libs/jquery/1.2.3/jquery.min.js"></script>

* Thanks to Eru Penkman for catching this vulnerability.

Like Mike Samuel said, don't use jQuery.html().text() to decode HTML entities, as it's unsafe.

Instead, use a template renderer like Mustache.js or decodeEntities from @VyvIT's comment.

The Underscore.js utility-belt library comes with escape and unescape methods, but they are not safe for user input:

_.escape(string)

_.unescape(string)

I think you're confusing the text and HTML methods. Look at this example: if you use an element's inner HTML as text, you'll get the HTML tags shown literally (second button). But if you use it as HTML, you'll get the HTML-formatted view (first button).

<div id="myDiv">
    here is a <b>HTML</b> content.
</div>
<br />
<input value="Write as HTML" type="button" onclick="javascript:$('#resultDiv').html($('#myDiv').html());" />
&nbsp;&nbsp;
<input value="Write as Text" type="button" onclick="javascript:$('#resultDiv').text($('#myDiv').html());" />
<br /><br />
<div id="resultDiv">
    Results here !
</div>

First button writes: here is a HTML content.

Second button writes: here is a <B>HTML</B> content.

By the way, there is a plug-in I found, "jQuery plugin - HTML decode and encode", that encodes and decodes HTML strings.

The question is limited to 'with jQuery', but it might help some to know that the jQuery code given in the best answer here does the following underneath. This works with or without jQuery:

function decodeEntities(input) {
  var y = document.createElement('textarea');
  y.innerHTML = input;
  return y.value;
}

You can use the he library, available from https://github.com/mathiasbynens/he

Example:

console.log(he.decode("J&#246;rg &amp J&#xFC;rgen rocked to &amp; fro "));
// Logs "Jörg & Jürgen rocked to & fro"

I challenged the library's author on the question of whether there was any reason to use this library in clientside code in favour of the <textarea> hack provided in other answers here and elsewhere. He provided a few possible justifications:

  • If you're using node.js serverside, using a library for HTML encoding/decoding gives you a single solution that works both clientside and serverside.

  • Some browsers' entity decoding algorithms have bugs or are missing support for some named character references. For example, Internet Explorer will both decode and render non-breaking spaces (&nbsp;) correctly but report them as ordinary spaces instead of non-breaking ones via a DOM element's innerText property, breaking the <textarea> hack (albeit only in a minor way). Additionally, IE 8 and 9 simply don't support any of the new named character references added in HTML 5. The author of he also hosts a test of named character reference support at http://mathias.html5.org/tests/html/named-character-references/. In IE 8, it reports over one thousand errors.

    If you want to be insulated from browser bugs related to entity decoding and/or be able to handle the full range of named character references, you can't get away with the <textarea> hack; you'll need a library like he.

  • He just darn well feels like doing things this way is less hacky.

encode:

$("<textarea/>").html('<a>').html(); // returns '&lt;a&gt;'

<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>

decode:

$("<textarea/>").html('&lt;a&gt;').val(); // returns '<a>'

<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>

Try this:

var htmlEntities = "&lt;script&gt;alert('hello');&lt;/script&gt;";
var htmlDecode = $.parseHTML(htmlEntities)[0]['wholeText'];
console.log(htmlDecode);

<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>

parseHTML is a function in the jQuery library. It returns an array of DOM nodes parsed from the given string.

In some cases the string is large, so the function may split the content across several array entries.

To get the text of all of them, go to any text node in the array and read its "wholeText" property, which covers that node and the text nodes adjacent to it.

I chose index 0 because it works in all cases (small string or big string).

Use

myString = myString.replace( /\&amp;/g, '&' );

It is easiest to do it on the server side because apparently JavaScript has no native library for handling entities, nor did I find any near the top of search results for the various frameworks that extend JavaScript.

Search for "JavaScript HTML entities", and you might find a few libraries for just that purpose, but they'll probably all be built around the above logic: replace, entity by entity.
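The "replace, entity by entity" idea can be sketched as a single-pass, table-driven replace. This is only an illustration: `decodeBasicEntities` and `NAMED_ENTITIES` are hypothetical names, and the table covers a handful of named entities rather than the full HTML list. Doing the work in one pass, instead of chaining several `.replace()` calls, avoids decoding the same text twice (for example turning `&amp;lt;` into `<`).

```javascript
// Hypothetical helper, not part of jQuery or any library.
// Only a small, illustrative subset of named entities is listed.
var NAMED_ENTITIES = {
  amp: '&',
  lt: '<',
  gt: '>',
  quot: '"',
  apos: "'",
  nbsp: '\u00A0'
};

function decodeBasicEntities(input) {
  // One pass over the string, so text produced by one replacement
  // is never scanned again.
  return String(input).replace(
    /&(#x[0-9a-f]+|#[0-9]+|[a-z][a-z0-9]*);/gi,
    function (match, body) {
      if (body.charAt(0) === '#') {
        var isHex = body.charAt(1) === 'x' || body.charAt(1) === 'X';
        var code = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
        // Leave NUL, out-of-range and surrogate code points untouched.
        if (code === 0 || code > 0x10FFFF || (code >= 0xD800 && code <= 0xDFFF)) {
          return match;
        }
        return String.fromCodePoint(code);
      }
      // Unknown names are returned unchanged rather than guessed at.
      return Object.prototype.hasOwnProperty.call(NAMED_ENTITIES, body)
        ? NAMED_ENTITIES[body]
        : match;
    }
  );
}

console.log(decodeBasicEntities('1 &amp; 2'));          // 1 & 2
console.log(decodeBasicEntities('&amp;lt;'));           // &lt;
console.log(decodeBasicEntities('J&#246;rg &#xFC;ber')); // Jörg über
```

Because the table is deliberately incomplete, anything it does not recognise is passed through as-is; a full solution would need the complete named character reference list, which is what libraries such as he provide.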

You have to write a custom function for HTML entities:

function htmlEntities(str) {
return String(str).replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g,'&gt;').replace(/"/g, '&quot;');
}
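One detail of the function above is easy to miss: `&` is replaced first. The sketch below restates the same replacements under a hypothetical name (`htmlEntitiesCopy`) so it runs on its own, and contrasts it with a deliberately wrong ordering (`htmlEntitiesWrongOrder`, also hypothetical) in which the ampersands introduced by the earlier replacements get encoded a second time.

```javascript
// Same replacements as the function above, restated so this block is self-contained.
function htmlEntitiesCopy(str) {
  return String(str)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Hypothetical counter-example: "&" is handled last, so the "&" characters
// produced by the "<" and ">" replacements are encoded again.
function htmlEntitiesWrongOrder(str) {
  return String(str)
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/&/g, '&amp;');
}

console.log(htmlEntitiesCopy('<a href="x">R&D</a>'));
// &lt;a href=&quot;x&quot;&gt;R&amp;D&lt;/a&gt;

console.log(htmlEntitiesWrongOrder('<b>'));
// &amp;lt;b&amp;gt;
```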

Suppose you have the string below.

Our Deluxe cabins are warm, cozy &amp; comfortable

var str = $("p").text(); // get the text from the <p> tag
$('p').html(str).text(); // now decode the HTML entities in str and assign the result back to the <p> tag

That's it.

For ExtJS users, if you already have the encoded string, for example when the returned value of a library function is the innerHTML content, consider this ExtJS function:

Ext.util.Format.htmlDecode(innerHtmlContent)

I just had to have an HTML entity character (⇓) as a value for an HTML button. The HTML code looks good from the beginning in the browser:

<input type="button" value="Embed & Share  &dArr;" id="share_button" />

Now I was adding a toggle that should also display the character. This is my solution:

$("#share_button").toggle(
    function(){
        $("#share").slideDown();
        $(this).attr("value", "Embed & Share " + $("<div>").html("&uArr;").text());
    }

This displays ⇓ again in the button. I hope this might help someone.

Extend the String class:

String::decode = ->
  $('<textarea />').html(this).text()

and use it as a method:

"&lt;img src='myimage.jpg'&gt;".decode()

You don't need jQuery to solve this problem, as it adds a bit of overhead and a dependency.

I know there are a lot of good answers here, but since I have implemented a slightly different approach, I thought I would share it.

This code is a perfectly safe approach security-wise, as the escaping is handled by the browser rather than by the function itself. So, if some vulnerability is discovered in the future, this solution is covered.

const decodeHTMLEntities = text => {
    // Create a new element or use one from cache, to save some element creation overhead
    const el = decodeHTMLEntities.__cache_data_element 
             = decodeHTMLEntities.__cache_data_element 
               || document.createElement('div');
    
    const enc = text
        // Prevent any mixup of existing pattern in text
        .replace(/⪪/g, '⪪#')
        // Encode entities in special format. This will prevent native element encoder to replace any amp characters
        .replace(/&([a-z1-8]{2,31}|#x[0-9a-f]+|#\d+);/gi, '⪪$1⪫');

    // Encode any HTML tags in the text to prevent script injection
    el.textContent = enc;

    // Decode entities from special format, back to their original HTML entities format
    el.innerHTML = el.innerHTML
        .replace(/⪪([a-z1-8]{2,31}|#x[0-9a-f]+|#\d+)⪫/gi, '&$1;')
        .replace(/⪪#/g, '⪪');
   
    // Get the decoded HTML entities
    const dec = el.textContent;
    
    // Clear the element content, in order to preserve a bit of memory (in case the text is big)
    el.textContent = '';

    return dec;
}

// Example
console.log(decodeHTMLEntities("<script>alert('&awconint;&CounterClockwiseContourIntegral;&#x02233;&#8755;⪪#x02233⪫');</script>"));
// Prints: <script>alert('∳∳∳∳⪪#x02233⪫');</script>

By the way, I have chosen to use the characters ⪪ and ⪫ because they are rarely used, so the chance of impacting performance by matching them is significantly lower.
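The placeholder scheme used by this answer can be exercised without a DOM. The sketch below isolates its two string steps as hypothetical helpers `protect` and `restore` (names that do not appear in the original) and checks that they round-trip ordinary input. It also appears to expose one edge case: literal text such as ⪪x41⪫ is escaped to ⪪#x41⪫, which the restore pattern then reads as a numeric reference.

```javascript
// Patterns copied from the answer above.
var ENTITY = /&([a-z1-8]{2,31}|#x[0-9a-f]+|#\d+);/gi;
var PLACEHOLDER = /⪪([a-z1-8]{2,31}|#x[0-9a-f]+|#\d+)⪫/gi;

// Hypothetical helper: escape literal "⪪", then wrap entities in ⪪...⪫.
function protect(text) {
  return text.replace(/⪪/g, '⪪#').replace(ENTITY, '⪪$1⪫');
}

// Hypothetical helper: undo protect() in the reverse order.
function restore(text) {
  return text.replace(PLACEHOLDER, '&$1;').replace(/⪪#/g, '⪪');
}

var sample = 'a &amp; b ⪪amp⪫ <b>';
console.log(protect('Tom &amp; Jerry'));          // Tom ⪪amp⪫ Jerry
console.log(restore(protect(sample)) === sample); // true

// Edge case: the "#" escape marker collides with numeric-reference syntax.
console.log(restore(protect('⪪x41⪫')));           // &#x41;
```

In the full function the restored text is then parsed by the browser, so under this reading an input of ⪪x41⪫ would come out as "A". That affects correctness for unusual input only; the tags in the text are still encoded via textContent before innerHTML is touched.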

There is still one problem: an escaped string does not look readable when assigned to an input value.

var string = _.escape("<img src=fake onerror=alert('boo!')>");
$('input').val(string);

Example: https://jsfiddle.net/kjpdwmqa/3/

Alternatively, there's also a library for it,

here: https://cdnjs.com/libraries/he

npm install he                 //using node.js

<script src="js/he.js"></script>  //or from your javascript directory

The usage is as follows:

//to encode text 
he.encode('© Ande & Nonso® Company LImited 2018');  

//to decode the 
he.decode('&copy; Ande &amp; Nonso&reg; Company Limited 2018');

Cheers.

To decode HTML entities with jQuery, just use this function:

function html_entity_decode(txt){
    var randomID = Math.floor((Math.random()*100000)+1);
    $('body').append('<div id="random'+randomID+'"></div>');
    $('#random'+randomID).html(txt);
    var entity_decoded = $('#random'+randomID).html();
    $('#random'+randomID).remove();
    return entity_decoded;
}

How to use:

JavaScript:

var txtEncoded = "&aacute; &eacute; &iacute; &oacute; &uacute;";
$('#some-id').val(html_entity_decode(txtEncoded));

HTML:

<input id="some-id" type="text" />

The easiest way is to set a class selector on your elements and then use the following code:

$(function(){
    $('.classSelector').each(function(a, b){
        $(b).html($(b).text());
    });
});

Nothing more is needed!

I had this problem and found this clear solution, and it works fine.

I think that is the exact opposite of the solution chosen.

var decoded = $("<div/>").text(encodedStr).html();

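Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please cite this site's URL or the original address. For any questions, contact yoyou2525@163.com.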