
Jsoup parse HTML including span tags

I have HTML in the following format:

<article class="cik" id="100">
<a class="ci" href="/abc/1001/STUFF">
              <img alt="Micky Mouse" src="/images/1001.jpg" />
              <span class="mick vtEnabled"></span>

</a>

<div>
         <a href="/abc/1001/STUFF">Micky Mouse</a>
         <span class="FP">$88.00</span>&nbsp;&nbsp;<span class="SP">$49.90</span>

</div>
</article>

In the above markup, the a tag inside the article contains a span with class="mick vtEnabled" and no label. I want to check whether this span, with the class names specified, is present within the article tag. How do I do that? I tried select("> a[href] > span.mick vtEnabled") and checked the size, but it remains 0 for every article tag, whether or not the span is set. Any input?

Starting from the individual article tags would be a good approach:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

final String test = "<article class=\"cik\" id=\"100\"><a class=\"ci\" href=\"/abc/1001/STUFF\"><img alt=\"Micky Mouse\" src=\"/images/1001.jpg\" /></a><div><a href=\"/abc/1001/STUFF\">Micky Mouse</a><span class=\"FP\">$88.00</span>&nbsp;&nbsp;<span class=\"SP\">$49.90</span></div></article>";
final Elements articles = Jsoup.parse(test).select("article");
for (final Element article : articles) {
    // Images that are direct children of the article's direct child link
    final Elements articleImages = article.select("> a[href] > img[src]");
    for (final Element image : articleImages) {
        System.out.println(image.attr("src"));
    }
    // Links inside the article's direct child div
    final Elements articleLinks = article.select("> div > a[href]");
    for (final Element link : articleLinks) {
        System.out.println(link.attr("href"));
        System.out.println(link.text());
    }
    // Span with class FP
    final Elements articleFPSpans = article.select("> div > span.FP");
    for (final Element span : articleFPSpans) {
        System.out.println(span.text());
    }
    // Span with class SP
    final Elements articleSPSpans = article.select("> div > span.SP");
    for (final Element span : articleSPSpans) {
        System.out.println(span.text());
    }
}

This prints:

/images/1001.jpg
/abc/1001/STUFF
Micky Mouse
$88.00
$49.90
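
The test string above leaves out the span from the question, so here is a minimal sketch of the presence check itself. It assumes jsoup is on the classpath; the class name SpanCheck and the helper hasMickSpan are made up for illustration. In a CSS selector a space is the descendant combinator, so "span.mick vtEnabled" looks for a vtEnabled element inside span.mick, which never matches. Chaining the classes as "span.mick.vtEnabled" matches a single span that carries both:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class SpanCheck {

    // Hypothetical helper: true if the article has a direct child <a href>
    // whose direct child is a <span> carrying both classes.
    static boolean hasMickSpan(Element article) {
        // select() returns an empty Elements list when nothing matches.
        return !article.select("> a[href] > span.mick.vtEnabled").isEmpty();
    }

    public static void main(String[] args) {
        String with = "<article class=\"cik\" id=\"100\"><a class=\"ci\" href=\"/abc/1001/STUFF\">"
                + "<img alt=\"Micky Mouse\" src=\"/images/1001.jpg\" />"
                + "<span class=\"mick vtEnabled\"></span></a></article>";
        String without = "<article class=\"cik\" id=\"101\"><a class=\"ci\" href=\"/abc/1001/STUFF\">"
                + "<img alt=\"Micky Mouse\" src=\"/images/1001.jpg\" /></a></article>";

        for (String html : new String[] {with, without}) {
            Document doc = Jsoup.parse(html);
            for (Element article : doc.select("article")) {
                // Print the article id and whether the span is present.
                System.out.println(article.id() + ": " + hasMickSpan(article));
            }
        }
    }
}
```

With these two sample strings it should print "100: true" and "101: false". If the span may sit deeper than a direct child of the link, dropping the ">" combinators ("a[href] span.mick.vtEnabled") would loosen the match.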
