Remove everything outside the body tag

Question

I have a variable that holds the responseText from an external HTML page:

textFromFile = myRequest.responseText;

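How can I remove everything outside the body tag? I can use a regular expression to strip all HTML tags from the string (textFromFile), but before that I would appreciate help removing all characters outside the body tag (in other words, keeping only the strings/words that are inside the body tag of the HTML page).

---- Edit ----

The HTML file I am reading is: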

<html>
<head> title </head>
<body>
<p> Hello World! <br/>
<a href = ”link.html”> Click <b> here </b> </a> <br/>
Goodbye world!
</p>
</body>
</html>

When I apply:

var doc = new DOMParser().parseFromString(myRequest.responseText, "text/html");
alert(doc.body.innerHTML);

The response is:

title 

<p> Hello World! <br>
<a href="”link.html”"> Click <b> here </b> </a> <br>
Goodbye world!
</p>

This should not be the case, since 'title' is outside the body tag.

Answer 1

Use a DOM parser to parse the HTML:

var doc = new DOMParser().parseFromString(myRequest.responseText, "text/html");

Then simply use innerHTML (or outerHTML):

doc.body.innerHTML;

var string = "<!DOCTYPE html><title>Title</title><p>Hello</p>",
    doc = new DOMParser().parseFromString(string, "text/html");
document.getElementById('inner').textContent = doc.body.innerHTML;
document.getElementById('outer').textContent = doc.body.outerHTML;

pre {
  background: #ddd;
  font-family: monospace;
  padding: .5em;
}

The inner HTML of &lt;body&gt; is:
<pre id="inner"></pre>
The outer HTML of &lt;body&gt; is:
<pre id="outer"></pre>
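Since the end goal in the question is the text without any tags, reading textContent from the parsed body may make the later tag-stripping regex unnecessary. The following is a minimal sketch under that assumption, not part of the original answer: bodyText and its optional parser argument are hypothetical names, and the default path assumes a browser where DOMParser exists.

```javascript
// Hypothetical helper: returns only the text found inside <body>.
// `parser` is optional; it defaults to the browser's DOMParser and is a
// parameter only so the function can be exercised outside a browser.
function bodyText(html, parser) {
  parser = parser || new DOMParser();
  var doc = parser.parseFromString(html, "text/html");
  // textContent concatenates the text nodes and drops all markup.
  return doc.body.textContent.trim();
}

// In a browser this would be called as:
//   var text = bodyText(myRequest.responseText);
```

As for the 'title' text in the question's sample file: it sits directly inside <head> without a <title> element. As far as the HTML parsing rules go, the parser ends <head> at the first non-whitespace text and places that text in <body>, which would explain why it shows up in doc.body.innerHTML. Wrapping it in <title>...</title> in the source file should keep it out of the body.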

Answer 2

Why not use the string replace function with a RegExp():

Try this:

var responseText = "<html>\
<head> title </head>\
<body>\
<p> Hello World! <br/>\
<a href = ”link.html”> Click <b> here </b> </a> <br/>\
Goodbye world!\
</p>\
</body>\
</html>";

console.log(responseText.replace(new RegExp(".*(<body>)(.*)(<\/body>).*", 'gm'), "$1$2$3"));

Output:

<body><p> Hello World! <br/><a href = ”link.html”> Click <b> here </b> </a> <br/>Goodbye world!</p></body>

If you don't want to include the <body> and </body> tags, remove $1 and $3 from the replacement string above.
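The pattern above works on the sample because the backslash line continuations leave no newlines in the string and the opening tag is exactly <body>. A real responseText usually contains line breaks, which "." does not match, and the body tag may carry attributes. The sketch below is one way to loosen those assumptions; extractBodyHtml is a hypothetical helper name, and a regex remains a fragile tool for HTML compared with a real parser.

```javascript
// Hypothetical helper (not from the answer above): returns the markup
// between <body ...> and </body>, or null when no body element is found.
function extractBodyHtml(html) {
  // [^>]* allows attributes on the opening tag, [\s\S]* spans newlines
  // (which "." does not match), and the "i" flag also accepts <BODY>.
  var match = /<body\b[^>]*>([\s\S]*)<\/body\s*>/i.exec(html);
  return match ? match[1] : null;
}

var sample = "<html>\n<head><title>t</title></head>\n" +
             "<body class=\"x\">\n<p>Hello</p>\n</body>\n</html>";
console.log(extractBodyHtml(sample));
```

Returning null for input without a body element lets the caller decide whether to fall back to the whole string or treat it as an error.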

Answer 1: score 2, accepted, 2016-02-08 04:13:55

Answer 2: score 1, 2016-02-08 04:52:06
