
How to remove all attributes from HTML?

I have raw HTML with some CSS classes inside for various tags.

Example:

Input:

<p class="opener" itemprop="description">Lorem ipsum dolor sit amet, consectetur adipisicing elit. Neque molestias natus iste labore a accusamus dolorum vel.</p>

and I would like to get just plain HTML, like:

Output:

<p>Lorem ipsum dolor sit amet, consectetur adipisicing elit. Neque molestias natus iste labore a accusamus dolorum vel.</p>

I do not know the names of these classes. I need to do this in JavaScript (Node.js).

Any ideas?

This can be done with Cheerio, as I noted in the comments.
To remove all attributes on all elements, you'd do:

var cheerio = require('cheerio');

var html = '<p class="opener" itemprop="description">Lorem ipsum dolor sit amet, consectetur adipisicing elit. Neque molestias natus iste labore a accusamus dolorum vel.</p>';

var $ = cheerio.load(html);   // load the HTML

$('*').each(function() {      // iterate over all elements
    this.attribs = {};        // remove all attributes
});

html = $.html();              // get the HTML back

I would create a new element, using the tag name and the innerHTML of the original element. You can then replace the old element with the new one, or do whatever you like with newEl, as in the code below:

// Get the current element
var el = document.getElementsByTagName('p')[0];

// Create a new element (in this case, a <p> tag)
var newEl = document.createElement(el.nodeName);

// Assign the new element the contents of the old tag
newEl.innerHTML = el.innerHTML;

// Replace the old element with newEl, or do whatever you like with it
el.parentNode.replaceChild(newEl, el);

Perhaps a regex in JS could pluck out those CSS classes and then output the stripped-down version? That is, if I'm understanding your question correctly.

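Perhaps simply using Notepad++ with a quick find/replace that swaps the attributes for a space would be the fastest way, rather than thinking in terms of a parser or something similar.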

Improvise on this:

$('.some_div').each(function () {
    var className = $(this).attr('class');   // every class currently on the element
    $(this).removeClass(className);
});

In Python, do something like this, but provide a list of files and tags instead of the hard-coded ones, then wrap it in a for loop:

#!/usr/bin/env python
# encoding: utf-8
import re

with open('fileWithHtml', 'r') as f:
    for line in f:
        # drop everything between the tag name and the closing '>'
        line = re.sub(r'<p\s[^>]*>', '<p>', line)
        print(line, end='')

Most probably, this can be easily translated into JavaScript for Node.js.
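
A rough Node.js translation of that idea might look like the sketch below. The function name `stripAttributes` and the pattern are illustrative assumptions, not part of the original answer. It handles any tag rather than only `<p>`, and like any regex approach it will misfire on attribute values that contain `>`.

```javascript
// Hypothetical helper: strips attributes from every opening tag in a string.
function stripAttributes(html) {
  // $1 = tag name, $2 = the optional "/" of a self-closing tag
  return html.replace(/<([a-zA-Z][a-zA-Z0-9-]*)\b[^>]*?(\/?)>/g, '<$1$2>');
}

const input = '<p class="opener" itemprop="description">Lorem ipsum</p>';
console.log(stripAttributes(input)); // prints "<p>Lorem ipsum</p>"
```

Closing tags are left alone because the pattern requires a letter directly after `<`.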

You could dynamically parse the elements using a DOM (or SAX, depending on what you want to do) parser and remove every style attribute you encounter.
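
Without pulling in a parser library, the scanning idea can be illustrated with a small hand-written pass over the string. This is only a sketch under stated assumptions: the name `stripAttributesScan` is made up, it removes all attributes rather than just style ones, it builds no DOM, and it does not special-case comments, `<script>` bodies, or malformed markup. A real DOM or SAX library is the safer choice.

```javascript
// Hypothetical sketch: walk the markup once and drop everything between a
// tag name and the closing ">" of an opening tag, honouring quoted values.
function stripAttributesScan(html) {
  let out = '';
  let i = 0;
  while (i < html.length) {
    // Copy text and closing tags through unchanged.
    if (html[i] !== '<' || !/[a-zA-Z]/.test(html[i + 1] || '')) {
      out += html[i];
      i += 1;
      continue;
    }
    // Read the tag name (everything up to whitespace, "/" or ">").
    let j = i + 1;
    while (j < html.length && /[^\s\/>]/.test(html[j])) j += 1;
    out += html.slice(i, j);
    // Skip the attributes, tracking quotes so a ">" inside a value is ignored.
    let quote = null;
    while (j < html.length) {
      const c = html[j];
      if (quote) {
        if (c === quote) quote = null;   // closing quote ends the value
      } else if (c === '"' || c === "'") {
        quote = c;                        // opening quote starts a value
      } else if (c === '>') {
        break;                            // real end of the tag
      }
      j += 1;
    }
    if (j < html.length) {
      if (html[j - 1] === '/') out += '/'; // keep self-closing syntax
      out += '>';
    }
    i = j + 1;
  }
  return out;
}

console.log(stripAttributesScan('<a title="a > b" href=\'x\'>link</a>')); // prints "<a>link</a>"
```

Tracking the open quote is what lets this version survive a `>` inside an attribute value, which a single regular expression over the tag would get wrong.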

In JavaScript, you could use the HTML DOM removeAttribute() method.

<script>
  function myFunction() {
    document.getElementsByClassName("your div class")[0].removeAttribute("style");
  }
</script>

I'm providing the client-side (browser) version, as this answer came up when I googled "remove HTML attributes":

// grab the element you want to modify
var el = document.querySelector('p');

// get its attributes and cast to array, then loop through
Array.prototype.slice.call(el.attributes).forEach(function(attr) {

    // remove each attribute
    el.removeAttribute(attr.name);
});

As a function:

function removeAttributes(el) {

    // get its attributes and cast to array, then loop through
    Array.prototype.slice.call(el.attributes).forEach(function(attr) {

        // remove each attribute
        el.removeAttribute(attr.name);
    });
}

const cheerio = require("cheerio");

// htmlAsString is assumed to hold the raw HTML to clean
const $ = cheerio.load(htmlAsString);

const result = $("*")
  // specify each attribute to remove; "*" as a wildcard does not work
  .removeAttr("class")
  .removeAttr("itemprop")
  .html();
// if you also wanted to remove the inner text for some reason, comment out the previous .html() and use
//.text("")
//.toString();

console.log("result", result);

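Here is another solution to this problem in vanilla JS:

html = html.replace(/\s+[^\s<>="]+="[^"]*"/g, "");

The script removes all double-quoted attributes from a string named html using a simple regular expression. Because strings are immutable, the result has to be assigned back. Note that the pattern also matches text content that happens to look like name="value".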
