jQuery/JavaScript: remove HTML tags but keep their content

I have the following code:

$(document.getElementById('messages_message-wysiwyg-iframe').contentWindow.document).keydown(function() {
    var iFrame = document.getElementById('messages_message-wysiwyg-iframe');
    var iFrameBody;
    if (iFrame.contentDocument) {
        // FF
        iFrameBody = iFrame.contentDocument.getElementsByTagName('body')[0];
    } else if (iFrame.contentWindow) {
        // IE
        iFrameBody = iFrame.contentWindow.document.getElementsByTagName('body')[0];
    }
    console.info(iFrameBody.innerHTML);
});

What I am trying to do is get the content of an iframe, but remove all the HTML tags that are not one of:

b, strong, i, a, u, img

However, I do not want to remove any of the text. For example, if the iframe contains the following:

<div class="box segment panel">
    <a href="http://www.google.com">hello world</a>
    click this link and go far. 
    <img src="http://placehold.it/100x100" alt="Placeholder"/>
 </div>

What should be returned is the following:

<a href="http://www.google.com">hello world</a>
click this link and go far.
<img src="http://placehold.it/100x100" alt="Placeholder" />

Is this even possible?

var iFrame = document.getElementById('messages_message-wysiwyg-iframe');
var iFrameDoc = iFrame.contentDocument || iFrame.contentWindow.document;
$(iFrameDoc).keydown(function() {
    var iFrameBody = $("body", iFrameDoc);
    var cleared = iFrameBody.clone();
    cleared.find("*:not(b,strong,i,a,u,img)").each(function() {
        var $this = $(this);
        $this.replaceWith($this.contents());
    });
    console.log(cleared.html());
});

Demo at jsfiddle.net

With a regex:

iFrameBody.innerHTML = iFrameBody.innerHTML.replace(/<(?!(?:b|strong|i|a|u|img)\b)[a-z][^>]*>/gi, "").replace(/<\/(?!(?:b|strong|i|a|u|img)\b)[a-z][^>]*>/gi, "");

The first replace removes the start tags, the second removes the end tags.

Note that there are a couple of traps when using regex to match HTML. But in this specific case it seems like a reasonable choice (cf. my comments on the other answers).
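As a rough sketch of the same regex idea, the two replaces can be folded into one standalone function that works on a string. The function name `stripDisallowedTags` and the `ALLOWED` array are made up for this example. It uses a negative lookahead so that a tag is only removed when its name is not one of the allowed names. It is not a full HTML parser: attribute values containing `>`, comments, and script content are among the traps it does not handle.

```javascript
// Tag names to keep, taken from the list in the question.
var ALLOWED = ["b", "strong", "i", "a", "u", "img"];

// Hypothetical helper: removes every start or end tag whose name is not in ALLOWED.
function stripDisallowedTags(html) {
    // <\/?      start of a start tag or an end tag
    // (?!...)   negative lookahead: do not match if an allowed name follows,
    //           ending at a word boundary (so "b" is kept but "br" is not)
    // [a-z]     a real tag name must start with a letter
    var pattern = new RegExp("<\\/?(?!(?:" + ALLOWED.join("|") + ")\\b)[a-z][^>]*>", "gi");
    return html.replace(pattern, "");
}

var sample = '<div class="box segment panel">' +
    '<a href="http://www.google.com">hello world</a> click this link and go far. ' +
    '<img src="http://placehold.it/100x100" alt="Placeholder"/></div>';

// The wrapping div is dropped; the link, the text and the image remain.
console.log(stripDisallowedTags(sample));
```

Inside the keydown handler this could be called as stripDisallowedTags(iFrameBody.innerHTML), which reads the markup without writing it back into the editor.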

For the record, this is what I use to access an iframe's content document:

var doc=ifr.contentWindow||ifr.contentDocument;
if (doc.document) doc=doc.document;

Here's my pure JS solution:

function sanitize(el) {

    if (el.nodeType !== 1) return;

    if (!/^(B|STRONG|I|A|U|IMG)$/.test(el.tagName)) {
        var p = el.parentNode;

        // move all children out of the element, recursing as we go
        var c = el.firstChild;
        while (c) {
            var d = c.nextSibling;  // remember the next element
            p.insertBefore(c, el);
            sanitize(c);
            c = d;                  // look at the next sibling
        }

        // remove the element
        p.removeChild(el);
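    } else {
        // allowed element: keep it, but still clean its descendants
        var k = el.firstChild;
        while (k) {
            var n = k.nextSibling;
            sanitize(k);
            k = n;
        }
    }
}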

demo at http://jsfiddle.net/alnitak/WvJAx/

It works by (recursively) moving the child nodes of restricted tags out of their parent, and then removing those tags once they're empty.
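To make the unwrapping idea easier to follow outside a browser, here is a minimal sketch of the same recursion on a plain-object tree instead of the real DOM. The tree shape (a string for a text node, `{ tag, children }` for an element) and the name `unwrapDisallowed` are assumptions made up for this illustration. Unlike the DOM version, it returns a new list of nodes rather than modifying the tree in place, and it also cleans the descendants of allowed elements.

```javascript
// Hypothetical plain-object tree, not the real DOM:
// a text node is a string, an element is { tag, children }.
var ALLOWED = ["b", "strong", "i", "a", "u", "img"];

function unwrapDisallowed(node) {
    if (typeof node === "string") {
        return [node];                              // text is always kept
    }
    // Clean the children first; each child may expand into several nodes.
    var kept = [];
    node.children.forEach(function (child) {
        kept = kept.concat(unwrapDisallowed(child));
    });
    if (ALLOWED.indexOf(node.tag.toLowerCase()) !== -1) {
        return [{ tag: node.tag, children: kept }]; // allowed: keep the element
    }
    return kept;                                    // restricted: keep only its content
}

var tree = {
    tag: "div",
    children: [
        { tag: "a", children: ["hello world"] },
        " click this link and go far. ",
        { tag: "span", children: [{ tag: "img", children: [] }] }
    ]
};

// The div and span disappear; the a, the text and the img move up a level.
console.log(JSON.stringify(unwrapDisallowed(tree)));
```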

I think you're a little confused about how to describe what you're trying to do. When you talk about "text", you're referring to the innerHTML or the text nodes inside a tag. What you're really looking to do, I think, is grab specific content together with its structure, that is, the child elements of the iframe's document.

You can use jQuery's .text() method to get the text content of each element individually and save it before removing the actual tag from the DOM. That helps if, say, you want the text content of a span but no longer want the span in the DOM, or you want to place the text somewhere else in your document.

var elemText = $('span#mySpan').text();
$('span#mySpan').remove();

For what it looks like you're trying to do based on your sample HTML, you may want to look into jQuery's detach method: http://api.jquery.com/detach/

This will allow you to store the returned child elements and append them somewhere else later.


 