Remove HTML tags and entities from a string coming from the server

Question

In an app I receive some HTML text. Since the app can't display (interpret) HTML, I need to remove every HTML tag and entity from the string I receive from the server.

I tried the following, which removes HTML tags but not entities (e.g. &nbsp;):

stringFromServer.replace(/(<([^>]+)>)/ig,"");

Any help is appreciated.

Disclaimer: I need a pure JavaScript solution (no jQuery, Underscore, etc.).

[UPDATE] I'm reading all your answers now, and I forgot to mention that although I'm using JavaScript, the environment is not a web page, so I have no DOM.

Answer 1

You can try something like this:

var placeholder = document.createElement('div');
placeholder.innerHTML = stringFromServer;

var theText = placeholder.innerText;

.innerText only grabs the text content from the element.

However, since it appears you don't have access to any DOM manipulation at all, you will probably have to use some kind of HTML parser, like these:
https://www.npmjs.org/package/htmlparser
http://ejohn.org/blog/pure-javascript-html-parser/

Answer 2

A solution without using regexes or phantom divs can be found on Mozilla's MDN.

I put the code in a JSFiddle here:

var sMyString = "<a id=\"a\"><b id=\"b\">hey!<\/b><\/a>";
var oParser = new DOMParser();
var oDOM = oParser.parseFromString(sMyString, "text/xml");
// show the text content of the parsed document, or an error message if parsing failed
alert(oDOM.documentElement.nodeName == "parsererror" ?
       "error while parsing" : oDOM.documentElement.textContent);

Answer 3

Alternatively, parse the HTML snippet in a new document and do your DOM manipulations on that (if you'd rather keep it separate from the current document):

var tmpDoc=document.implementation.createHTMLDocument("");
tmpDoc.body.innerHTML="<a href='#'>some text</a><p style=''> more text</p>";
tmpDoc.body.textContent;

tmpDoc.body.textContent evaluates to:

some text more text

Answer 4

stringFromServer.replace(/(<([^>]+)>|&[^;]+;)/ig, "")
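Note that the pattern &[^;]+; in the one-liner above matches from any ampersand to the next semicolon, so a plain "&" in the text followed later by a ";" would also be swallowed, and entities are deleted rather than turned into the characters they stand for. A minimal DOM-free sketch of a stricter variant might look like the following. The names stripHtml, decodeEntity and NAMED_ENTITIES are hypothetical, the entity table is deliberately tiny (a real one would need many more names), mapping &nbsp; to an ordinary space is an assumption, and the tag regex is still naive (it does not handle comments, script blocks, or ">" inside attribute values).

```javascript
// Hypothetical, non-exhaustive table of named entities (assumption: nbsp becomes a plain space).
var NAMED_ENTITIES = { nbsp: " ", amp: "&", lt: "<", gt: ">", quot: "\"", apos: "'" };

// Replacement callback: `body` is the text between "&" and ";".
function decodeEntity(match, body) {
  if (body.charAt(0) === "#") {
    // Numeric reference: &#160; (decimal) or &#xA0; (hexadecimal).
    var isHex = body.charAt(1) === "x" || body.charAt(1) === "X";
    var code = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
    return code > 0 && code <= 0x10FFFF ? String.fromCodePoint(code) : "";
  }
  // Named reference: decode it if it is in the table, otherwise drop it.
  return Object.prototype.hasOwnProperty.call(NAMED_ENTITIES, body)
    ? NAMED_ENTITIES[body]
    : "";
}

function stripHtml(input) {
  return input
    // 1. Remove anything that looks like a tag.
    .replace(/<[^>]+>/g, "")
    // 2. Only then handle well-formed entities, in a single pass.
    .replace(/&(#[0-9]+|#[xX][0-9a-fA-F]+|[a-zA-Z][a-zA-Z0-9]*);/g, decodeEntity);
}
```

Tags are stripped before entities are decoded so that escaped markup such as &lt;b&gt; survives as the literal text <b> instead of being removed as a tag. Usage would be along the lines of stripHtml(stringFromServer).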

4 answers

Answer 1: score 2, accepted, 2014-11-04 08:14:48
Answer 2: score 0, 2014-11-04 08:26:51
Answer 3: score 0, 2014-11-04 08:29:15
Answer 4: score -1, 2014-11-04 08:24:43
