How to get images using HTML parsing with jsoup

I want to get all the images on a page by parsing its HTML with jsoup. I use the code below:

Elements images = doc.select("img[src~=(?i)\\.(jpe?g)]");
for (Element image : images) {
    // System.out.println("\nsrc : " + image.attr("src"));
    arrImageItem.add(image.attr("src"));
}

This collects every JPEG image on the page, but I only want the images whose URL looks like this one:

http://tvrehberi.hurriyet.com.tr/images/742/403742.jpg

In other words, I want to match on the beginning of the URL:

http://tvrehberi.hurriyet.com.tr/images .... .jpg

How can I select only those images?

This will probably give you what you are asking for, though your question is a bit unclear, so I can't be sure.

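import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ImageUrlPrinter {
    public static void main(String[] args) {
        Document doc;
        String url = "http://tvrehberi.hurriyet.com.tr";
        try {
            doc = Jsoup.connect(url).get();
        } catch (IOException ex) {
            ex.printStackTrace();
            return; // nothing to parse if the page could not be fetched
        }

        // Select JPEG images, then keep only those under /images on this host.
        for (Element e : doc.select("img[src~=(?i)\\.(jpe?g)]")) {
            if (e.attr("src").startsWith("http://tvrehberi.hurriyet.com.tr/images")) {
                System.out.println(e.attr("src"));
            }
        }
    }
}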

This might not be a very "clean" solution, but the if statement ensures that only image URLs from the /images/ directory on the server are printed.
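If the src values have already been collected into a list, as in the question's loop, the same prefix check can be applied afterwards without jsoup. The following is only a minimal sketch using the standard library: the class name ImageUrlFilter and the method keepSiteJpegs are hypothetical, and it assumes the src values are absolute URLs. Unlike the selector's regex, which matches ".jpg" or ".jpeg" anywhere in the attribute, this sketch requires the value to end with that extension.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of jsoup: filters already-collected src values.
final class ImageUrlFilter {
    // Prefix taken from the question; adjust it for other sites.
    static final String IMAGE_PREFIX = "http://tvrehberi.hurriyet.com.tr/images";

    /** Returns the entries that start with the prefix and end in .jpg or .jpeg (case-insensitive). */
    static List<String> keepSiteJpegs(List<String> srcValues, String prefix) {
        List<String> kept = new ArrayList<>();
        for (String src : srcValues) {
            String lower = src.toLowerCase(Locale.ROOT);
            boolean isJpeg = lower.endsWith(".jpg") || lower.endsWith(".jpeg");
            if (src.startsWith(prefix) && isJpeg) {
                kept.add(src);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> srcValues = List.of(
                "http://tvrehberi.hurriyet.com.tr/images/742/403742.jpg",
                "http://example.com/banner.jpg",
                "http://tvrehberi.hurriyet.com.tr/images/logo.png");
        // prints [http://tvrehberi.hurriyet.com.tr/images/742/403742.jpg]
        System.out.println(keepSiteJpegs(srcValues, IMAGE_PREFIX));
    }
}
```

Keeping the prefix in one constant means the check only has to change in one place if the image directory moves.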

If I understood correctly, you want to retrieve the URL path up to a certain point and cut off the rest. Do you even have to do that every time? If you are only using URLs from the one site in your example, you could store "http://tvrehberi.hurriyet.com.tr/images" as a constant, since it never changes. If, on the other hand, you fetch URLs from many different sites, you can parse each URL as described here.
Anyway, if you shared the purpose of parsing the URLs, we could certainly help you more.
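For the many-sites case, one possible way to cut a URL off after its first path segment is the standard java.net.URI class. This is a sketch under assumptions rather than the approach from the link above: the class name UrlPrefix and the method upToFirstSegment are hypothetical, the input is assumed to be a well-formed absolute URL, and any port or query string is ignored.

```java
import java.net.URI;

// Hypothetical helper: keeps scheme, host, and the first path segment of a URL.
final class UrlPrefix {
    static String upToFirstSegment(String url) {
        URI uri = URI.create(url);          // throws IllegalArgumentException on malformed input
        String path = uri.getPath();        // e.g. "/images/742/403742.jpg"
        String[] parts = path.split("/");   // e.g. ["", "images", "742", "403742.jpg"]
        String first = parts.length > 1 ? "/" + parts[1] : "";
        // Assumption: the port and query string are not needed in the prefix.
        return uri.getScheme() + "://" + uri.getHost() + first;
    }

    public static void main(String[] args) {
        // prints http://tvrehberi.hurriyet.com.tr/images
        System.out.println(upToFirstSegment("http://tvrehberi.hurriyet.com.tr/images/742/403742.jpg"));
    }
}
```

The returned prefix can then be compared with startsWith, exactly as a hard-coded constant would be.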
