How to remove a specific tag from the entire HTML page using jsoup

I'm using jsoup 1.7.3 to edit some HTML files.

What I need is to remove the following tags from the HTML file:

<meta name="GENERATOR" content="XXXXXXXXXXXXXX">
<meta name="CREATED" content="0;0">
<meta name="CHANGED" content="0;0">

As you can see, they are all <meta> tags. How can I do that? Here is what I've tried so far:

// I'm pretty sure the <meta> tags are nested in the <head>,
// but removing the whole head is bad practice.

Document docsoup = Jsoup.parse(htmlin);
docsoup.head().remove();

What do you suggest?

I recommend using jsoup selectors, for example:

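import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

Document document = Jsoup.parse(html);
// Select every <meta> element whose name attribute is GENERATOR
Elements selector = document.select("meta[name=GENERATOR]");

for (Element element : selector) {
    element.remove(); // detach the element from the document
}

document.html(); // returns the HTML as a String with the elements removed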
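
The loop above targets only the GENERATOR tag, while the question lists three. As a minimal sketch, assuming the jsoup library is on the classpath, a comma-separated selector can match all three names at once, and `Elements.remove()` detaches every match without an explicit loop. The class name `MetaTagRemover` and the helper `removeMetaTags` are hypothetical names chosen for illustration; they are not part of jsoup or the original answer.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class MetaTagRemover {

    // Hypothetical helper: strips the three <meta> tags named in the
    // question and returns the resulting HTML as a String.
    static String removeMetaTags(String html) {
        Document document = Jsoup.parse(html);
        // A comma-separated selector matches elements that satisfy any of its parts.
        document.select("meta[name=GENERATOR], meta[name=CREATED], meta[name=CHANGED]")
                .remove(); // Elements.remove() detaches every matched element from the DOM
        return document.html();
    }

    public static void main(String[] args) {
        String html = "<html><head>"
                + "<meta name=\"GENERATOR\" content=\"XXXXXXXXXXXXXX\">"
                + "<meta name=\"CREATED\" content=\"0;0\">"
                + "<meta name=\"CHANGED\" content=\"0;0\">"
                + "<title>Example</title>"
                + "</head><body><p>Hello</p></body></html>";
        // The rest of the <head>, such as <title>, is left in place.
        System.out.println(removeMetaTags(html));
    }
}
```

Other elements in the head are untouched, which avoids the problem with `docsoup.head().remove()` from the question.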
