
Removing/editing HTML tags from local file

I want to remove/edit several HTML tags in a file.

Minimal example: I have this input HTML file on my disk

<!DOCTYPE html>
<html clang="en">

<head>
<meta charset="utf-8">
<title>test</title>
<style>
.remove-tag { color: #FF0000; }
.remove-div { color: #0000FF; }
</style>
</head>

<body>

<p>Hello world!</p>

<div class="remove-tag">
<p>I just want to remove the open/close div tags</p>
</div>

<div class="remove-div">
<p>I want to remove the div and all its content</p>
</div>

</body>

</html>

I want to process it so that I get this

<!DOCTYPE html>
<html clang="en">

<head>
<meta charset="utf-8">
<title>test</title>
<style>
.remove-tag { color: #FF0000; }
.remove-div { color: #0000FF; }
</style>
</head>

<body>

<p>Hello world!</p>

<p>I just want to remove the open/close div tags</p>

</body>

</html>

What is the easiest, most straightforward way to do this? I would like to write a script that I can run locally on a given file to produce the output, or to use existing software that applies a given list of rules.

I'm quite confident with regex/sed/..., but using these tools is a big no-no for manipulating HTML tags (and I can understand why).
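To make the regex concern concrete, here is a small sketch of two typical failure modes. The sample strings and variable names are made up for illustration; only Python's standard `re` module is used.

```python
import re

# Case 1: a nested <div> inside the element we want to delete.
nested = '<div class="remove-div"><div>inner</div><p>tail</p></div><p>after</p>'

# The non-greedy pattern stops at the FIRST </div>, which belongs to the inner div,
# so part of the content and a stray </div> are left behind.
lazy_result = re.sub(r'<div class="remove-div">.*?</div>', "", nested, flags=re.DOTALL)

# Case 2: an unrelated <div> later in the document.
siblings = '<div class="remove-div">x</div><p>keep</p><div>other</div>'

# The greedy pattern runs to the LAST </div>, so it also deletes content
# that was supposed to stay.
greedy_result = re.sub(r'<div class="remove-div">.*</div>', "", siblings, flags=re.DOTALL)

print(repr(lazy_result))    # '<p>tail</p></div><p>after</p>'
print(repr(greedy_result))  # ''
```

Neither quantifier can count matching open/close tags, which is the job a parser does.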

I've read about JavaScript ( getElementsByClassName() , ...) and took some preliminary steps by installing Node.js, but I can't even open a document to retrieve the elements. It looks like I have to install/import jsdom. I'm kind of stuck.

I've read about jQuery and seen several command examples, but I don't understand how to run them on local files. In general, I'm a complete noob at jQuery.

I've read about HTML parsers. Python seems to have an HTML parser library that I could use to accomplish the task.
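A minimal sketch of that standard-library route, assuming the targeted elements are properly closed in the source file. The names `TagFilter` and `filter_html`, and the file names in the final comment, are hypothetical and chosen for illustration; only `html.parser.HTMLParser` and its handler methods come from Python itself.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not be tracked as "open".
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class TagFilter(HTMLParser):
    """Re-emit the parsed HTML, unwrapping or dropping elements by class."""

    def __init__(self, unwrap_classes, drop_classes):
        # convert_charrefs=False keeps entities such as &amp; untouched.
        super().__init__(convert_charrefs=False)
        self.unwrap_classes = set(unwrap_classes)
        self.drop_classes = set(drop_classes)
        self.open_actions = {}  # tag name -> stack of actions for open elements
        self.drop_depth = 0     # > 0 while inside an element being dropped
        self.out = []

    def _emit(self, text):
        if self.drop_depth == 0:
            self.out.append(text)

    def _action_for(self, attrs):
        classes = set((dict(attrs).get("class") or "").split())
        if classes & self.drop_classes:
            return "drop"
        if classes & self.unwrap_classes:
            return "unwrap"
        return "keep"

    def handle_starttag(self, tag, attrs):
        action = self._action_for(attrs)
        if tag in VOID:
            if action == "keep":
                self._emit(self.get_starttag_text())
            return
        # Remember what we did with this element so its end tag gets the same treatment.
        self.open_actions.setdefault(tag, []).append(action)
        if action == "drop":
            self.drop_depth += 1
        elif action == "keep":
            self._emit(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        if self._action_for(attrs) == "keep":
            self._emit(self.get_starttag_text())

    def handle_endtag(self, tag):
        stack = self.open_actions.get(tag)
        action = stack.pop() if stack else "keep"
        if action == "drop":
            self.drop_depth -= 1
        elif action == "keep":
            self._emit(f"</{tag}>")

    # Everything else is passed through unchanged (unless inside a dropped element).
    def handle_data(self, data):
        self._emit(data)

    def handle_comment(self, data):
        self._emit(f"<!--{data}-->")

    def handle_decl(self, decl):
        self._emit(f"<!{decl}>")

    def handle_entityref(self, name):
        self._emit(f"&{name};")

    def handle_charref(self, name):
        self._emit(f"&#{name};")


def filter_html(html, unwrap_classes, drop_classes):
    parser = TagFilter(unwrap_classes, drop_classes)
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.out)


# Hypothetical usage on a local file:
# from pathlib import Path
# text = Path("input.html").read_text(encoding="utf-8")
# Path("output.html").write_text(
#     filter_html(text, {"remove-tag"}, {"remove-div"}), encoding="utf-8")
```

The sketch leaves the blank lines around removed tags in place. It also matches end tags to start tags by name only, so for badly nested HTML a tree-building parser is likely the safer choice.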

I also hoped to find standalone HTML parser software, but there doesn't seem to be any.

Any other hints?

Try this script:

<script>
  // getElementsByClassName returns a live collection, so copy it into an
  // array first; removing elements while iterating the live collection
  // would skip every other match.
  var removeTag = Array.from(document.getElementsByClassName('remove-tag'));
  for (var i = 0; i < removeTag.length; i++) {
    // Replace the div with its own child nodes, i.e. drop only the
    // opening/closing tags and keep the content.
    removeTag[i].replaceWith(...removeTag[i].childNodes);
  }

  // Remove these divs together with everything inside them.
  var removeDiv = Array.from(document.getElementsByClassName('remove-div'));
  for (var i = 0; i < removeDiv.length; i++) {
    removeDiv[i].remove();
  }
</script>
