How to remove the HTML, HEAD and BODY tags from an HTML string using JavaScript?

I have a template file called myWebsite.html. It contains everything an HTML template needs, so it has HTML, HEAD and BODY tags. I want to load it with JavaScript and put it into one of the divs on the site, so I don't want the HTML, HEAD and BODY tags. How can I do this?

This is a prototype of what I need:

var $val = getData('myWebsite.html');
// Remove these tags with everything inside. For the body tag, remove the tag
// itself but leave its contents. Also remove the end tags of body and html.
// HOW TO DO THIS?
$val = removeHTMLHEADBODYTAGS($val);
div.innerHTML = $val;

I want to do this in pure JavaScript, with no jQuery.

Why not fetch the content out of the body tag and then work with that? There is no need to fetch everything and then remove html, head and body:

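// $val is a string, so parse it into a document first; innerHTML is a property, not a method.
var content = new DOMParser().parseFromString($val, 'text/html').body.innerHTML;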

You could extract it with a regex. Something like /<body[^>]*>([\s\S]*)<\/body/i should return all content within the <BODY> element. The [\s\S] class is used instead of . because . does not match line breaks, and the m flag does not change that.

var $val = getData('myWebsite.html');
// [\s\S]* matches across line breaks; the i flag also accepts <BODY>.
var reg = /<body[^>]*>([\s\S]*)<\/body/i;
div.innerHTML = $val.match(reg)[1];

Example jsFiddle code: http://jsfiddle.net/x4hPZ/1/
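If the string might not contain a body element at all, calling match(...)[1] directly throws. Here is a minimal sketch of a more defensive variant of the regex approach. The helper name extractBodyContents and the sample string are made up for illustration, and returning the input unchanged when there is no body tag is just one possible choice:

```javascript
// Hypothetical helper (the name is not from the original answer).
function extractBodyContents(htmlString) {
    // <body\b      opening tag name (\b avoids matching e.g. <bodyfoo>)
    // [^>]*>       any attributes, up to the end of the opening tag
    // ([\s\S]*)    everything up to the last closing body tag, line breaks included
    // <\/body\s*>  closing tag
    var match = /<body\b[^>]*>([\s\S]*)<\/body\s*>/i.exec(htmlString);
    // No body element found: return the input unchanged instead of throwing.
    return match ? match[1] : htmlString;
}

// Made-up sample input: multi-line, upper-case tag, attribute on the body tag.
var sample = '<html>\n<head><title>foobar</title></head>\n' +
             '<BODY class="page">\n<p>Content</p>\n</BODY>\n</html>';
var result = extractBodyContents(sample);
```

In the page itself this would be used as div.innerHTML = extractBodyContents($val);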

How about:

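var bodyContents = htmlstring.split('<body'); // no '>': the body tag may carry attributes
// ^[^>]*> strips the rest of the opening body tag (attributes and closing '>');
// a greedy ^.*> would also swallow any tags that follow on the same line.
bodyContents = bodyContents[1].replace('</body>', '').replace('</html>', '').replace(/^[^>]*>/, '');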

The last regex replace removes the closing > of the opening body tag, along with any attributes on that tag.
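The same string-slicing idea can also be written with plain index arithmetic, which makes the edge cases explicit. This is only a sketch under a few assumptions: the tags are lowercase, no attribute value contains a '>' character, and the helper name sliceBodyContents is invented for this example:

```javascript
// Hypothetical helper (not from the original answer): cut out the body
// contents by position instead of split/replace.
function sliceBodyContents(htmlString) {
    var open = htmlString.indexOf('<body');
    if (open === -1) {
        return htmlString;                       // no body tag: nothing to strip
    }
    var start = htmlString.indexOf('>', open);   // end of the opening tag, after any attributes
    if (start === -1) {
        return htmlString;                       // malformed opening tag: leave input alone
    }
    var end = htmlString.lastIndexOf('</body');
    if (end === -1 || end < start) {
        end = htmlString.length;                 // missing closing tag: keep the rest
    }
    return htmlString.slice(start + 1, end);
}
```

Everything after the closing body tag, including the closing html tag, is dropped by the slice, so no further replace calls are needed.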

This is, however, not the way I would do things. If at all possible, I'd create an iframe node, load the HTML into that frame, and read innerHTML from its body element. Just a suggestion.

Right, the iframe way:

var ifrm = document.createElement('iframe');
ifrm.style.visibility = 'hidden';
document.body.appendChild(ifrm);
// contentDocument is the standard property; contentWindow.document covers older browsers.
var idoc = ifrm.contentDocument || ifrm.contentWindow.document;
idoc.open();
idoc.writeln('<html><head><title>foobar</title></head><body><p>Content</p></body></html>');
idoc.close();
var bodyContents = idoc.body.innerHTML;

For code explanation: http://softwareas.com/injecting-html-into-an-iframe

or any other hit on google.com for that matter :)

With jQuery you could do it like this:

$(document).ready(function(){
    var your_content = $("html").clone().find("head,body").remove().end().html();
});
  1. get the content with the "html" selector
  2. make a copy with clone
  3. find the tags you want to remove
  4. remove them, and
  5. convert back to HTML

All in one line.

HTH,

--hennson
