
How to discard part of a webpage with Jsoup?

I am currently using Jsoup to parse an HTML page. The code is quite simple:

Document doc = null;
try {
    // Fetch the page and parse it into a Document
    doc = Jsoup.connect(link).get();
} catch (Exception e) {
    // System.out.println("Some error occurred.");
    textView.setText(e.getMessage());
}

It does give me the webpage I want, and I can later extract the data I need from it with its getElementsByTag method and so on. However, I only want to use part of the webpage. For example, I wish to discard everything after < !-- / foo --> in my webpage. (The real marker has no space between < and !, but I can't type that here.)

Is there any way to discard the rest of the webpage after that string and get a new Document containing only the part I want? I checked the cookbook, but it seems to process the webpage only through its structure, so I am not sure whether it is OK to do something like a plain string removal. Thanks for reading.

You can use Document doc = Jsoup.parse(html), where html is the page's HTML as a string. That is, fetch the HTML first:

   Connection connect = Jsoup.connect(url);
   Connection.Response response = connect.execute();
   String html = response.body();

then perform whatever string operations you need (e.g. cut the HTML after the marker, adding any closing tags that are now missing), and finally parse the result:

   Document doc = Jsoup.parse(html);
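
The cutting step in between can be sketched roughly as follows. This is only a minimal illustration in plain Java: the class name HtmlCutter, the method cutAfterMarker, and the sample marker are hypothetical names chosen for this example, not part of Jsoup. It assumes the marker appears literally in the fetched HTML and that everything up to and including the marker should be kept.

```java
class HtmlCutter {

    /**
     * Returns the HTML up to and including the first occurrence of marker.
     * If the marker is not found, the HTML is returned unchanged.
     * (Hypothetical helper for illustration; not a Jsoup API.)
     */
    static String cutAfterMarker(String html, String marker) {
        int index = html.indexOf(marker);
        if (index < 0) {
            return html; // marker absent: keep the whole page
        }
        return html.substring(0, index + marker.length());
    }

    public static void main(String[] args) {
        String html = "<html><body><p>keep</p><!-- /foo --><p>drop</p></body></html>";
        String trimmed = cutAfterMarker(html, "<!-- /foo -->");
        System.out.println(trimmed);
        // The trimmed string would then be handed to Jsoup:
        //   Document doc = Jsoup.parse(trimmed);
    }
}
```

The trimmed string will usually be missing closing tags such as </body> and </html>. Jsoup's parser is designed to cope with incomplete markup, so this may work without further repair, but appending the closing tags yourself keeps the result predictable.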
