Get a specific child using jsoup
I have this simple example to illustrate my problem. This is the HTML page test.html:
<body>
    <div class="partA">
        part a
    </div>
    <script></script>
    <div class="partB">
        part b
    </div>
    <div class="partC">
        part c
    </div>
    <div class="parthh">
        <div>
            part b 2
        </div>
        <div class="partD">
            part d
        </div>
    </div>
</body>
And this is my Java code:
public static void main(String[] args) throws IOException {
    Document doc = Jsoup.parse(new File("C:\\Users\\HC\\Desktop\\dataset\\test.html"), "UTF-8");
    Elements el = doc.select("body > div:eq(1)");
    System.out.println(el.toString());
}
The problem is that a 'script' tag, or any other tag that comes before my desired tag (the second div in this case), prevents the selector from matching, and the returned result is empty.
How can I ignore those undesired tags and get the specific one?
You can remove all the script tags from your HTML:
Document doc = Jsoup.parse(new File("C:\\Users\\HC\\Desktop\\dataset\\test.html"), "UTF-8");
Elements el = doc.select("script");
for (Element e : el) {
    e.remove();
}
el = doc.select("body > div:eq(1)");
System.out.println(el.toString());
Now your doc won't contain that tag, and you'll get the desired output.
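A small sketch of why removing the script tag helps: as jsoup's selector documentation describes it, :eq(n) matches on an element's index among all of its element siblings, not only among siblings with the same tag. The class name SiblingIndexDemo and the inline HTML string are illustrative assumptions (a trimmed copy of test.html), not part of the original code.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class SiblingIndexDemo {
    // Trimmed-down copy of test.html from the question (assumed for illustration).
    static final String HTML =
        "<body>"
        + "<div class=\"partA\">part a</div>"
        + "<script></script>"
        + "<div class=\"partB\">part b</div>"
        + "<div class=\"partC\">part c</div>"
        + "</body>";

    // Builds a "tag@index" entry for each element child of <body>,
    // where index is the position among all element siblings.
    static String describeBodyChildren(String html) {
        Document doc = Jsoup.parse(html);
        StringBuilder sb = new StringBuilder();
        for (Element child : doc.body().children()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(child.tagName()).append('@').append(child.elementSiblingIndex());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(describeBodyChildren(HTML));
    }
}
```

In this sketch the script element occupies sibling index 1, so no div can match div:eq(1) until the script is removed and partB shifts down to index 1.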
Isn't the selector body > div:nth-of-type(2) what you want?
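The difference between the two selectors can be sketched as follows. This is a minimal example, not the original poster's code: the class name NthOfTypeDemo, the helper textOf, and the inline HTML string (a trimmed copy of test.html) are all assumptions made for illustration. It relies on :nth-of-type(2) counting only siblings with the same tag name, so the script tag does not shift the position.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class NthOfTypeDemo {
    // Trimmed-down copy of test.html from the question (assumed for illustration).
    static final String HTML =
        "<body>"
        + "<div class=\"partA\">part a</div>"
        + "<script></script>"
        + "<div class=\"partB\">part b</div>"
        + "<div class=\"partC\">part c</div>"
        + "</body>";

    // Returns the text of the first element matching the selector,
    // or an empty string when nothing matches.
    static String textOf(String html, String selector) {
        Document doc = Jsoup.parse(html);
        Elements matches = doc.select(selector);
        return matches.isEmpty() ? "" : matches.first().text();
    }

    public static void main(String[] args) {
        // :eq(1) looks at the index among all siblings, where the script tag sits.
        System.out.println("eq(1): [" + textOf(HTML, "body > div:eq(1)") + "]");
        // :nth-of-type(2) counts div siblings only, so the script tag is ignored.
        System.out.println("nth-of-type(2): [" + textOf(HTML, "body > div:nth-of-type(2)") + "]");
    }
}
```

With this approach the document does not need to be modified before selecting, which may be preferable if the script tags are needed later.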