I want to remove/edit several HTML tags in a file.
Minimal example: I have this input HTML file on my disk:
<!DOCTYPE html>
<html clang="en">
<head>
<meta charset="utf-8">
<title>test</title>
<style>
.remove-tag { color: #FF0000; }
.remove-div { color: #0000FF; }
</style>
</head>
<body>
<p>Hello world!</p>
<div class="remove-tag">
<p>I just want to remove the open/close div tags</p>
</div>
<div class="remove-div">
<p>I want to remove the div and all its content</p>
</div>
</body>
</html>
I want to process it so that I get this:
<!DOCTYPE html>
<html clang="en">
<head>
<meta charset="utf-8">
<title>test</title>
<style>
.remove-tag { color: #FF0000; }
.remove-div { color: #0000FF; }
</style>
</head>
<body>
<p>Hello world!</p>
<p>I just want to remove the open/close div tags</p>
</body>
</html>
What's the easiest/most straightforward way to do it in your opinion? I'd like to write some sort of script that I can run locally on a given file to get the output, or use some software that does it given a list of rules to follow.
I'm quite confident with regex/sed/..., but using these tools is a big no-no for manipulating HTML tags (and I can understand why).
I've read about JavaScript (getElementsByClassName(), ...) and took some preliminary steps with it by installing Node.js, but I can't even open a document to retrieve the elements. It looks like I have to install/import jsdom, so I'm kind of stuck.
I've read about jQuery and seen several command examples, but I don't get how to run them on local files. In general, I'm a complete noob with jQuery.
I've read about HTML parsers. Python seems to have an HTML parser library that I could use to accomplish the task.
I also hoped for standalone HTML parser software, but there doesn't seem to be any.
Any other hints?
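Try this script. It runs in the browser, so place it after the elements it modifies (for example, just before </body>):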
<script>
// querySelectorAll returns a static list, so removing nodes inside the loop is safe.
// (getElementsByClassName returns a live collection that shrinks as nodes are
// removed, which makes an index-based loop skip elements.)

// Unwrap every .remove-tag element: keep its children, drop only the tags.
document.querySelectorAll('.remove-tag').forEach(function (el) {
  el.replaceWith(...el.childNodes);
});

// Remove every .remove-div element together with all of its content.
document.querySelectorAll('.remove-div').forEach(function (el) {
  el.remove();
});
</script>
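If you would rather run a script on a local file than in the browser, here is a minimal sketch using Python's standard-library html.parser module, which the question mentions. The names VOID_TAGS, TagFilter, and filter_html are hypothetical, chosen only for this example. It copies the input through as written and treats the two class names from the question specially. It assumes reasonably well-formed HTML and is not a full HTML5 tree builder.

```python
from html.parser import HTMLParser

# Elements that never have an end tag, so they are never pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TagFilter(HTMLParser):
    """Copy HTML through, unwrapping or dropping elements by class name."""

    def __init__(self, unwrap_class, drop_class):
        # convert_charrefs=False keeps entities such as &amp; as written.
        super().__init__(convert_charrefs=False)
        self.unwrap_class = unwrap_class
        self.drop_class = drop_class
        self.out = []      # pieces of the output document
        self.stack = []    # (tag, action) for every open non-void element
        self.dropping = 0  # > 0 while inside an element that is being dropped

    def emit(self, text):
        if not self.dropping:
            self.out.append(text)

    def action_for(self, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.drop_class in classes:
            return "drop"
        if self.unwrap_class in classes:
            return "unwrap"
        return "keep"

    def handle_starttag(self, tag, attrs):
        action = self.action_for(attrs)
        if action == "keep":
            self.emit(self.get_starttag_text())  # raw text of the start tag
        if tag in VOID_TAGS:
            return
        self.stack.append((tag, action))
        if action == "drop":
            self.dropping += 1

    def handle_startendtag(self, tag, attrs):
        # Self-closing syntax such as <br/> opens nothing: stack untouched.
        if self.action_for(attrs) == "keep":
            self.emit(self.get_starttag_text())

    def handle_endtag(self, tag):
        if not any(open_tag == tag for open_tag, _ in self.stack):
            self.emit(f"</{tag}>")  # stray end tag: pass it through
            return
        # Pop up to the matching start tag (tolerates omitted end tags like </p>).
        while self.stack:
            open_tag, action = self.stack.pop()
            if action == "drop":
                self.dropping -= 1
            if open_tag == tag:
                break
        if action == "keep":
            self.emit(f"</{tag}>")

    def handle_data(self, data):
        self.emit(data)

    def handle_entityref(self, name):
        self.emit(f"&{name};")

    def handle_charref(self, name):
        self.emit(f"&#{name};")

    def handle_comment(self, data):
        self.emit(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.emit(f"<!{decl}>")

    def handle_pi(self, data):
        self.emit(f"<?{data}>")


def filter_html(source, unwrap_class="remove-tag", drop_class="remove-div"):
    """Return source with unwrap_class elements unwrapped and drop_class elements removed."""
    parser = TagFilter(unwrap_class, drop_class)
    parser.feed(source)
    parser.close()
    return "".join(parser.out)
```

To use it on a file, read the text (for example with pathlib.Path("input.html").read_text(encoding="utf-8")), pass it to filter_html, and write the result back out. The line breaks that surrounded the removed tags are left in place, so the output can contain blank lines where the divs used to be. For more complex rules (CSS selectors, nested conditions), a third-party DOM library is probably more convenient than this streaming approach.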