How to remove all attributes from html?

I have raw html with some css classes inside for various tags.

Example:

Input:

<p class="opener" itemprop="description">Lorem ipsum dolor sit amet, consectetur adipisicing elit. Neque molestias natus iste labore a accusamus dolorum vel.</p>

and I would like to get just plain html like:

Output:

<p>Lorem ipsum dolor sit amet, consectetur adipisicing elit. Neque molestias natus iste labore a accusamus dolorum vel.</p>

I do not know the names of these classes in advance. I need to do this in JavaScript (Node.js).

Any idea?

This can be done with Cheerio, as I noted in the comments.
To remove all attributes on all elements, you'd do:

var cheerio = require('cheerio');

var html = '<p class="opener" itemprop="description">Lorem ipsum dolor sit amet, consectetur adipisicing elit. Neque molestias natus iste labore a accusamus dolorum vel.</p>';

var $ = cheerio.load(html);   // load the HTML

$('*').each(function() {      // iterate over all elements
    this.attribs = {};        // remove all attributes
});

var result = $.html();        // get the HTML back

I would create a new element, using the tag name and the innerHTML of that element. You can then replace the old element with the new one, or do whatever you like with the newEl as in the code below:

// Get the current element
var el = document.getElementsByTagName('p')[0];

// Create a new element (in this case, a <p> tag)
var newEl = document.createElement(el.nodeName);

// Assign the new element the contents of the old tag
newEl.innerHTML = el.innerHTML;

// Replace the old element with newEl, or do whatever you like with it
el.parentNode.replaceChild(newEl, el);

Perhaps a regex in JS could pluck out those CSS tags and then output the stripped-down version? That's if I'm understanding your question correctly.

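Perhaps simply using Notepad++ with a quick find/replace operation and a space would be the fastest way, rather than thinking about a parser or something similar.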

Improvise on this:

$('.some_div').each(function () {
    var className = $(this).attr('class');   // read the current class list
    $(this).removeClass(className);          // remove those classes
});

In Python, do something like this, but provide a list of files and tags instead of the hard-coded ones and wrap it in a for loop:

#!/usr/bin/env python
# encoding: utf-8
import re
with open('fileWithHtml', 'r') as f:
    for line in f:
        # Replace an opening <p ...> tag (with any attributes) by a bare <p>
        line = re.sub(r'<p\s[^>]*>', '<p>', line)
        print(line, end='')

Most probably, this can be easily translated into JavaScript for Node.js.
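
As a rough illustration of that translation, here is a minimal Node.js sketch that generalizes the idea from `<p>` to any opening tag. The name `stripAttributes` is made up for this example, and the regular expression assumes that attribute values never contain a literal `>` character, so treat it as a quick hack for simple markup rather than a substitute for a real HTML parser.

```javascript
// Hypothetical helper: reduce every opening tag to its bare tag name.
// Assumption: attribute values never contain a literal ">" character.
function stripAttributes(html) {
  return html.replace(
    /<([a-zA-Z][a-zA-Z0-9-]*)(?:\s[^>]*?)?(\s*\/)?>/g,
    function (match, tagName, selfClosing) {
      // Keep the tag name and a trailing "/" for self-closing tags; drop the rest.
      return '<' + tagName + (selfClosing ? '/' : '') + '>';
    }
  );
}

var input = '<p class="opener" itemprop="description">Lorem ipsum</p>';
console.log(stripAttributes(input)); // prints: <p>Lorem ipsum</p>
```

Closing tags and comments are left untouched because the pattern only matches `<` followed directly by a letter.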

You could parse the elements dynamically using a DOM (or SAX, depending on what you want to do) parser and remove every style attribute you encounter.

In JavaScript, you could use the HTML DOM removeAttribute() method.

<script>
  function myFunction() {
    document.getElementsByClassName("your div class")[0].removeAttribute("style");
  }
</script>

I'm providing the client-side (browser) version, as this answer came up when I googled "remove HTML attributes":

// grab the element you want to modify
var el = document.querySelector('p');

// get its attributes and cast to array, then loop through
Array.prototype.slice.call(el.attributes).forEach(function(attr) {

    // remove each attribute
    el.removeAttribute(attr.name);
});

As a function:

function removeAttributes(el) {

    // get its attributes and cast to array, then loop through
    Array.prototype.slice.call(el.attributes).forEach(function(attr) {

        // remove each attribute
        el.removeAttribute(attr.name);
    });
}
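
The function above handles a single element. To cover the original question (every element in a document), the same idea could be applied to a whole subtree, roughly as sketched below. The name `removeAttributesDeep` is made up for this example, and it assumes `root` is a standard DOM Element that provides `querySelectorAll`, `attributes` and `removeAttribute`.

```javascript
// Hypothetical helper: strip every attribute from `root` and all of its descendants.
// Assumption: `root` is a standard DOM Element (browser, or a DOM implementation in Node).
function removeAttributesDeep(root) {
  // The root itself plus every descendant element
  var elements = [root].concat(Array.prototype.slice.call(root.querySelectorAll('*')));

  elements.forEach(function (el) {
    // Copy the names first, because el.attributes is live and shrinks on removal
    var names = Array.prototype.slice.call(el.attributes).map(function (attr) {
      return attr.name;
    });
    names.forEach(function (name) {
      el.removeAttribute(name);
    });
  });

  return root;
}

// In a browser, for example:
// removeAttributesDeep(document.body);
```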
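
With Cheerio, specific attributes can also be removed by name:

const cheerio = require('cheerio');
const $ = cheerio.load(htmlAsString);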

const result = $("*")
 // specify each attribute to remove, "*" as wildcard does not work
.removeAttr("class")
.removeAttr("itemprop")
.html();
// if you also wanted to remove the inner text for some reason, comment out the previous .html() and use
//.text("")
//.toString();

console.log("result", result);

Here is another solution to this problem in vanilla JS:

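var stripped = html.replace(/\s*\S*\="[^"]+"\s*/gm, "");

This removes all double-quoted attributes from a string named html using a simple regular expression. Note that replace() returns a new string rather than modifying html, and that the pattern also matches attribute-like text (name="value") outside of tags.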
