
Get absolute url to image with jsoup

I'm working on a web scraper for a website, but my current code only retrieves relative URLs for the images. How can I convert those URLs to absolute ones?

Second problem: when I combine the link manually into http://www.arena-offshore.com/iframe/list/../../res2.php?res=site/big/08032016130016552-GEMI-gözcü1.jpg&g=500&u=335 and open it in a browser, I only see some sort of text file instead of the picture. Is it possible to get a direct link to the picture that displays normally in a browser?

Current code:

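import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

Document doc;
String url = "http://www.arena-offshore.com/iframe/list/list-detail.php?category=1&page=&id=956&id=956";
try {
    doc = Jsoup.connect(url)
            .userAgent("Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.82 Safari/537.36")
            .get();
    // Select the image element by its id.
    Elements elements = doc.select("#u702_img");

    for (Element element : elements) {
        // attr("src") returns the attribute value exactly as written in the HTML.
        String src = element.attr("src");
        System.out.println(src);
    }
} catch (IOException e) {
    e.printStackTrace();
}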

Output

../../res2.php?res=site/big/08032016130016552-GEMI-gözcü1.jpg&g=500&u=335

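From your current output, remove the leading ../../res2.php?res= and the trailing parameters &g=500&u=335, then prepend the site root http://www.arena-offshore.com/ to what is left.

That gives you the direct link (shown here with ö and ü percent-encoded):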

http://www.arena-offshore.com/site/big/08032016130016552-GEMI-g%C3%B6zc%C3%BC1.jpg

The "text file" is the image itself, with its raw bytes displayed as text. You can tell that it is a JPEG because the file starts with:

ÿØÿàJFIFÿþ>CREATOR: gd- jpeg v1.0 (using IJG JPEG v62)

If you save the page from your browser (Right click > Save as...) and give the file a .jpg extension, it will open as a normal image.
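
As a rough way to confirm this in code, the sketch below checks whether a byte array starts with the JPEG signature FF D8 FF, which is what the characters ÿØÿ correspond to when the bytes are shown as Latin-1 text. The class and method names (JpegSniffer, looksLikeJpeg) are made up for illustration and are not part of jsoup.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of jsoup.
class JpegSniffer {

    // JPEG data begins with the start-of-image marker FF D8, followed by FF
    // as the first byte of the next marker.
    static boolean looksLikeJpeg(byte[] data) {
        return data != null
                && data.length >= 3
                && (data[0] & 0xFF) == 0xFF
                && (data[1] & 0xFF) == 0xD8
                && (data[2] & 0xFF) == 0xFF;
    }

    public static void main(String[] args) {
        // The first characters the browser showed, written as Unicode escapes.
        // Encoded as ISO-8859-1 they map back to the bytes FF D8 FF E0.
        byte[] header = "\u00ff\u00d8\u00ff\u00e0JFIF".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(looksLikeJpeg(header));   // true
        System.out.println(looksLikeJpeg("<html>".getBytes(StandardCharsets.ISO_8859_1))); // false
    }
}
```

Only the first three bytes are inspected, so this is a quick sanity check rather than a full validation of the file.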

You can take the image URL from your src output:

String baseUrl = "http://www.arena-offshore.com/";
String output = "../../res2.php?res=site/big/08032016130016552-GEMI-gözcü1.jpg&g=500&u=335";
// The image path is the value of the "res" parameter:
// everything between the first '=' and the next '&'.
int start = output.indexOf("=") + 1;
int end   = output.indexOf("&", start);
String imageUrl = baseUrl + output.substring(start, end);
// Gives:
// http://www.arena-offshore.com/site/big/08032016130016552-GEMI-gözcü1.jpg
// (percent-encoded form: .../08032016130016552-GEMI-g%C3%B6zc%C3%BC1.jpg)
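
The string built by the substring call still contains the raw characters ö and ü. If you need the percent-encoded form (g%C3%B6zc%C3%BC1.jpg), one option is java.net.URI.toASCIIString(). This is a minimal sketch: the class name ImageUrlEncoder is hypothetical, and it assumes the URL contains no spaces or other characters that URI.create rejects.

```java
import java.net.URI;

// Hypothetical helper for illustration.
class ImageUrlEncoder {

    // Percent-encodes non-ASCII characters as UTF-8 bytes and leaves ASCII as it is.
    // URI.create throws IllegalArgumentException for input it cannot parse (e.g. spaces).
    static String toAsciiUrl(String rawUrl) {
        return URI.create(rawUrl).toASCIIString();
    }

    public static void main(String[] args) {
        // The two non-ASCII letters are written as Unicode escapes so the source stays ASCII.
        String raw = "http://www.arena-offshore.com/site/big/08032016130016552-GEMI-g\u00f6zc\u00fc1.jpg";
        System.out.println(toAsciiUrl(raw));
        // http://www.arena-offshore.com/site/big/08032016130016552-GEMI-g%C3%B6zc%C3%BC1.jpg
    }
}
```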

Then you could download the image using jsoup:

// ignoreContentType(true) lets jsoup fetch non-HTML responses such as images.
byte[] bytes = Jsoup.connect(imageUrl).ignoreContentType(true).execute().bodyAsBytes();
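
To end up with an image file on disk, the downloaded bytes can be written out with java.nio.file.Files. A minimal sketch: the class name ImageSaver and the temporary target file are placeholders, and the byte array here is a stand-in for the bytes returned by the jsoup call.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for illustration.
class ImageSaver {

    // Writes the bytes to target, creating the file or overwriting an existing one.
    static Path save(byte[] bytes, Path target) throws IOException {
        return Files.write(target, bytes);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in data; in the scraper this would be the result of bodyAsBytes().
        byte[] bytes = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF, (byte) 0xE0};
        Path target = Files.createTempFile("image", ".jpg");
        save(bytes, target);
        System.out.println("Saved " + Files.size(target) + " bytes to " + target);
    }
}
```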

Note that jsoup also offers element.absUrl("src"), which returns the src attribute resolved to an absolute URL. In your case that URL would still point to the res2.php script rather than directly to the .jpg file, so it may not be what you want.
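
For reference, resolving a relative src against the page URL follows the standard URL resolution rules, which can be reproduced with java.net.URI alone. The sketch below uses the page URL and src value from the question; the class name RelativeUrlResolver is hypothetical, and this is plain JDK code rather than jsoup's own implementation.

```java
import java.net.URI;

// Hypothetical helper for illustration; uses only the JDK.
class RelativeUrlResolver {

    // Resolves a relative reference against the URL of the page it came from
    // and percent-encodes any non-ASCII characters in the result.
    static String resolve(String pageUrl, String relative) {
        return URI.create(pageUrl).resolve(relative).toASCIIString();
    }

    public static void main(String[] args) {
        String pageUrl = "http://www.arena-offshore.com/iframe/list/list-detail.php?category=1&page=&id=956&id=956";
        // The two non-ASCII letters are written as Unicode escapes so the source stays ASCII.
        String src = "../../res2.php?res=site/big/08032016130016552-GEMI-g\u00f6zc\u00fc1.jpg&g=500&u=335";
        System.out.println(resolve(pageUrl, src));
        // http://www.arena-offshore.com/res2.php?res=site/big/08032016130016552-GEMI-g%C3%B6zc%C3%BC1.jpg&g=500&u=335
    }
}
```

The resolved URL still goes through res2.php; the direct path to the file is the value of its res parameter.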
