
How to save an image from an HTML webpage with JSoup

I am trying to use JSoup to scrape the poster image from an IMDb link and save it so that my program can use it later. This is what I have so far:

import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Attributes;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class JSoupTest
{

    public static void main(String[] args)
    {

        String address = "https://www.imdb.com/title/tt1270797/";
        try
        {
            Document doc = Jsoup.connect(address).get();
            Element link = doc.select().select();
        }
        catch (IOException e)
        {
            // Auto-generated catch block
            e.printStackTrace();
        }
    }

}

I know the image is inside a div with the class "poster", but I cannot figure out how to extract it. Please bear with me, as I have no prior experience with JSoup. Thanks a lot.

I've been using JSoup for a while, but I've never tried to download an image from an HTML source.

After getting the document as you did above, you can get the div you want with:

Elements divs = doc.getElementsByClass("poster");

The code above returns all elements with the class 'poster'.

If you are sure there's only one div with the class 'poster', you can do:

Element poster = divs.first();

If you aren't sure of that, you'll need to find a way to differentiate that div from the others.
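One way to narrow the match, sketched here under assumptions, is a CSS selector that only accepts a 'poster' div containing an img tag. The class name PosterSelectorSketch, the method findPosterDiv, and the inline markup are made up for illustration; the real IMDb page may be structured differently.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class PosterSelectorSketch {

    // Returns the first div with class "poster" that contains an <img>,
    // or null when no such div exists.
    static Element findPosterDiv(Document doc) {
        return doc.selectFirst("div.poster:has(img)");
    }

    public static void main(String[] args) {
        // Hypothetical markup for this example only; not IMDb's real page.
        String html =
            "<div class=\"poster\">no image here</div>"
          + "<div class=\"poster\" id=\"main\">"
          + "<a href=\"/media\"><img src=\"/p.jpg\"></a></div>";
        Document doc = Jsoup.parse(html);

        Element poster = findPosterDiv(doc);
        // selectFirst returns null on no match, so check before using the result.
        System.out.println(poster == null ? "no poster div found" : poster.id());
    }
}
```

Other selectors (an id, a parent container, an attribute) may fit better once you have looked at the actual page source.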

Now that you have your 'poster' div, you can get the link inside it with:

Elements image = poster.getElementsByTag("a");

The code above returns all links inside the 'poster' div. As before, if you're sure there's only one link inside the 'poster' div, you can do:

Element downloadImage = image.first();

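Now you have the link element that wraps the image you want. Note that this is the a element, not the image file itself: downloadImage.attr("abs:href") gives the address the link points to, while the address of the image file is normally in the src attribute of the img tag inside it.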
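To actually save the file, something along these lines might work. It is only a sketch: it assumes the 'poster' div contains an img tag with a src attribute, and the names PosterDownloadSketch, posterImageUrl and saveImage are invented for this example. The copy step uses plain JDK classes rather than anything JSoup-specific.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class PosterDownloadSketch {

    // Returns the absolute URL of the first <img> inside div.poster,
    // or null if there is none. Assumes the poster div wraps an <img>.
    // absUrl resolves relative paths against the document's base URI and
    // returns an empty string when it cannot build an absolute URL.
    static String posterImageUrl(Document doc) {
        Element img = doc.selectFirst("div.poster img[src]");
        return img == null ? null : img.absUrl("src");
    }

    // Copies the bytes found at imageUrl into target,
    // replacing the file if it already exists.
    static void saveImage(String imageUrl, Path target) throws IOException {
        try (InputStream in = URI.create(imageUrl).toURL().openStream()) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

With the doc from Jsoup.connect(address).get() in the question, the call would look roughly like saveImage(posterImageUrl(doc), Paths.get("poster.jpg")), where poster.jpg is just a placeholder file name. Check the URL for null or an empty string first, since the page layout may not match the assumption above.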

