
Jsoup getting background image path from css

I am looking for all of the images on a given website.

For this purpose I need to find the ones that are referenced within the CSS, for example:

    .gk-crop {
        background-image: url("../images/style1/g_rss-2.png");
    }

Now my question is: how can I get all of these URLs with Jsoup?

So far I've tried the following:

    Document doc = Jsoup.connect(url).get();
    Elements imagePath = doc.select("[src]");
    imagePath.select("*[style*='background-image']");

but so far no luck.

Does anyone know how I can achieve this?

Jsoup doesn't parse CSS files.

Have a look at this to know what Jsoup is responsible for.

You need a separate CSS parser to extract URLs from CSS files. Have a look at this

As Niranjan mentioned, Jsoup parses HTML (and XML), not CSS. If you really need to extract images from CSS, you will need either a third-party library or a simple regex that grabs the URLs from the CSS file; it is still plain text, after all. This is not a flexible solution to your problem, but it would be the fastest one :)
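The regex route could look roughly like the sketch below. It uses only the Java standard library and assumes you already have the CSS text as a string plus the URL the stylesheet was loaded from (needed because paths such as `../images/...` are relative to the stylesheet, not to the page). `CssUrlExtractor`, `extractUrls`, and the `example.com` address are made-up names for illustration, not part of Jsoup, and a regex like this will not cope with every CSS edge case (escaped parentheses, comments, and so on).

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Jsoup): pulls url(...) references out of CSS text.
final class CssUrlExtractor {

    // Matches url(...) with optional single or double quotes around the path.
    private static final Pattern CSS_URL = Pattern.compile(
            "url\\(\\s*(['\"]?)(.*?)\\1\\s*\\)", Pattern.CASE_INSENSITIVE);

    // Returns every url(...) target found in the CSS, resolved against the stylesheet's own URL.
    static List<String> extractUrls(String css, String stylesheetUrl) {
        URI base = URI.create(stylesheetUrl);
        List<String> urls = new ArrayList<>();
        Matcher m = CSS_URL.matcher(css);
        while (m.find()) {
            String raw = m.group(2).trim();
            if (raw.isEmpty() || raw.startsWith("data:")) {
                continue; // skip empty values and inline data URIs
            }
            try {
                urls.add(base.resolve(raw).toString());
            } catch (IllegalArgumentException e) {
                // Not a syntactically valid URI (e.g. contains spaces); ignored in this sketch.
            }
        }
        return urls;
    }

    public static void main(String[] args) {
        String css = ".gk-crop {\n"
                + "    background-image: url(\"../images/style1/g_rss-2.png\");\n"
                + "}";
        // The stylesheet location below is an assumed example.
        System.out.println(extractUrls(css, "http://example.com/css/style.css"));
        // prints [http://example.com/images/style1/g_rss-2.png]
    }
}
```

The CSS text itself still has to come from somewhere: the linked stylesheet files, `<style>` blocks, or inline `style` attributes. Fetching those is a separate step that this sketch leaves out.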

If you want the URLs of all the images on a website, you can select all the image tags and then get their absolute URLs.

Example:

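import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// Fetch and parse the page; get() throws IOException on network errors.
String url = "http://www.bbc.co.uk";
Document doc = Jsoup.connect(url).get();

Elements titles = doc.select("img");

// absUrl resolves each src attribute against the page's base URI.
for (Element e : titles) {
    System.out.println(e.absUrl("src"));
}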

which will grab all the <img> elements and print their absolute URLs, such as

http://sa.bbc.co.uk/bbc/bbc/s?name=SET-COUNTER&pal_route=index&ml_name=barlesque&app_type=web&language=en-GB&ml_version=0.16.1&pal_webapp=wwhp&blq_s=3.5&blq_r=3.5&blq_v=default-worldwide
http://static.bbci.co.uk/frameworks/barlesque/2.50.2/desktop/3.5/img/blq-blocks_grey_alpha.png
http://static.bbci.co.uk/frameworks/barlesque/2.50.2/desktop/3.5/img/blq-search_grey_alpha.png
http://news.bbcimg.co.uk/media/images/69139000/jpg/_69139104_69139103.jpg
http://news.bbcimg.co.uk/media/images/69134000/jpg/_69134575_waynerooney1.jpg

If you only want the .jpg files, tell the selector so by using

Elements titles = doc.select("img[src$=.jpg]");

which results in only the elements whose src ends in .jpg being selected.
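One caveat: `[src$=.jpg]` matches on the end of the raw attribute value, so a URL such as `photo.jpg?w=100` would be skipped because it ends with the query string. If that matters, an alternative is to collect all the absolute URLs first and filter on the path part afterwards. The sketch below does this with the standard library only; `ImageUrlFilter` and `withExtension` are hypothetical names chosen for illustration.

```java
import java.net.URI;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Hypothetical helper: keeps only URLs whose path ends with the given file extension.
final class ImageUrlFilter {

    static List<String> withExtension(List<String> urls, String extension) {
        String suffix = "." + extension.toLowerCase(Locale.ROOT);
        return urls.stream()
                .filter(u -> {
                    try {
                        // getPath() excludes the query string and fragment.
                        String path = URI.create(u).getPath();
                        return path != null && path.toLowerCase(Locale.ROOT).endsWith(suffix);
                    } catch (IllegalArgumentException e) {
                        return false; // not a valid URI; dropped in this sketch
                    }
                })
                .collect(Collectors.toList());
    }
}
```

For example, `ImageUrlFilter.withExtension(urls, "jpg")` would be called on the list of strings returned by `absUrl("src")` in the loop above.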

