Removing tags using Jsoup library

I need to remove unwanted tags. Can anyone guide me on how to do it? Here is my code:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class Test {

    static String val = "<div class=vcard><div class=label>Category: </div><span class=category><a title=Chemicals Manufacturers href=http://www.gmdu.net/list-5-p1.html>Chemicals</a>- <a title=Textile Chemicals Manufacturers href=http://www.gmdu.net/list-5-273-p1.html>Textile Chemicals</a></span><br/><div class=label>Region: </div><span class=adr country-name><a title=Sri Lanka Manufacturers href=http://www.gmdu.net/loca-35-p1.html>Sri Lanka</a></span><span class=fn org>Haycolour (Pvt) Ltd</span><br/></div>";

    public static void main(String[] args) {
        Document doc = Jsoup.parse(val);
        Elements labels = doc.select("div.vcard div.label");
        for (Element label : labels) {
            // Prints the label text followed by the raw markup of the node right after it
            System.out.println(String.format("%s:%s", label.text().trim(), label.nextSibling().outerHtml()));
        }
    }
}

My output:

Category::<span class="category"><a title="Chemicals" manufacturers="" href="http://www.gmdu.net/list-5-p1.html">Chemicals</a>- <a title="Textile" chemicals="" manufacturers="" href="http://www.gmdu.net/list-5-273-p1.html">Textile Chemicals</a></span>
Region::<span class="adr" country-name=""><a title="Sri" lanka="" manufacturers="" href="http://www.gmdu.net/loca-35-p1.html">Sri Lanka</a></span>
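In the output above, values such as title=Chemicals Manufacturers come back as title="Chemicals" manufacturers="". This happens because the attribute values in val are not quoted: an unquoted HTML attribute value ends at the first space, so each following word is parsed as a separate, empty attribute (lower-cased by Jsoup's HTML parser). A minimal sketch, assuming jsoup is on the classpath, shows this on a single link; the class name UnquotedAttributeDemo and the helper firstLink are illustrative only.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

public class UnquotedAttributeDemo {
    // Hypothetical helper: parses an HTML fragment and returns its first <a> element.
    static Element firstLink(String html) {
        return Jsoup.parse(html).select("a").first();
    }

    public static void main(String[] args) {
        // Unquoted value: parsing of title stops at the first space, so
        // "Lanka" and "Manufacturers" become separate, empty attributes.
        Element unquoted = firstLink("<a title=Sri Lanka Manufacturers href=x>Sri Lanka</a>");
        System.out.println(unquoted.attr("title"));    // Sri
        System.out.println(unquoted.hasAttr("lanka")); // true

        // Quoted value: the whole phrase is kept as one attribute value.
        Element quoted = firstLink("<a title=\"Sri Lanka Manufacturers\" href=x>Sri Lanka</a>");
        System.out.println(quoted.attr("title"));      // Sri Lanka Manufacturers
    }
}
```

Quoting the values in the source HTML avoids the stray attributes, although they do not affect extracting the visible text.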

Expected output:

Category:Chemicals - Textile Chemicals
Region:Sri Lanka Haycolour (Pvt) Ltd

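I found this easiest to do with Jsoup's nextElementSibling(). Your code calls nextSibling().outerHtml(), which returns the sibling's full markup, tags included. nextElementSibling() returns the next sibling that is an Element, and calling text() on it returns only its combined text, with the tags stripped.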

I replaced your println statement with this:

System.out.println(String.format("%s:%s %s", label.text().trim(),
        label.nextElementSibling().text().trim(),
        label.nextElementSibling().nextElementSibling().text().trim()));

This produced the following output:

Category::Chemicals- Textile Chemicals 
Region::Sri Lanka Haycolour (Pvt) Ltd
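The two chained nextElementSibling() calls depend on the exact layout. For Category, the second sibling is the <br>, which is why that line ends with a trailing space, and a missing sibling would make nextElementSibling() return null and cause a NullPointerException. A slightly more general sketch collects the text in a loop instead. It assumes that each label's value consists of the element siblings up to the next <br> or the next label (true of the question's markup, not necessarily of every page). It also drops the doubled colon, because label.text() already ends with ':'. The class name VcardText and the shortened HTML are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class VcardText {
    // Builds one "Label:value" line per div.label inside div.vcard.
    // Assumption: a label's value is every element sibling up to the
    // next <br> or the next div.label, as in the question's markup.
    static List<String> extract(String html) {
        Document doc = Jsoup.parse(html);
        List<String> lines = new ArrayList<>();
        for (Element label : doc.select("div.vcard div.label")) {
            List<String> parts = new ArrayList<>();
            for (Element sib = label.nextElementSibling();
                 sib != null && !sib.tagName().equals("br") && !sib.hasClass("label");
                 sib = sib.nextElementSibling()) {
                parts.add(sib.text()); // text() keeps the text and drops the tags
            }
            // label.text() is already trimmed and ends with ':', so no extra colon.
            lines.add(label.text() + String.join(" ", parts));
        }
        return lines;
    }

    public static void main(String[] args) {
        // Shortened copy of the question's markup (most attributes removed).
        String html = "<div class=vcard><div class=label>Category: </div>"
                + "<span class=category><a href=x>Chemicals</a>- <a href=y>Textile Chemicals</a></span><br/>"
                + "<div class=label>Region: </div><span class=adr><a href=z>Sri Lanka</a></span>"
                + "<span class=fn>Haycolour (Pvt) Ltd</span><br/></div>";
        extract(html).forEach(System.out::println);
    }
}
```

Running main prints Category:Chemicals- Textile Chemicals and Region:Sri Lanka Haycolour (Pvt) Ltd. The "- " spacing comes from the source markup (</a>- <a>), so matching the expected "Chemicals - Textile Chemicals" exactly would need extra string handling.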
