jsoup on an unstructured HTML page with tables

Question

I'm trying to get the main img from this URL. Here is what I have tried so far:

Document doc = null;
    try {
        doc = Jsoup.connect(url).get();
    } catch (IOException e) {
        e.printStackTrace();
    }

    Element table = doc.select("center").get(1);
    Elements rows = table.select("table[width=970]");
    for (int i = 0; i < rows.size(); i++) {
        Element row = rows.get(1);
        Elements cols = row.select("table[width=634]");
        for (int j = 0; j < cols.size(); j++) {
            Element row1 = rows.get(1);
            Elements cols1 = row1.select("table[width=600]");
            for (int k = 0; k < cols1.size(); k++){
                Element row0 = rows.first();
                Elements cols0 = row0.select("td");
                for (Element image : cols0) {
                    String image2 = image.absUrl("src").toString();
                    Log.i("tanja7 ", "pic  " + image2);
                }
            }
        }
    }

This is the unstructured HTML page (I don't know how to copy the HTML code). What am I doing wrong?

Answer 1

It seems that you expect a jsoup select call to return the inner elements of whatever it matched. That is not how it works: select returns the elements that match the selector within the search scope, and the scope is the Document, Element or Elements instance on which you call select. So doc.select("table") gives you all table elements of the document: the tables themselves, not their rows. Maybe you already understood this, but your variable naming suggests otherwise.
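To make the point about search scope concrete, here is a minimal sketch. The markup in HTML, the base URL http://example.com/ and the helper srcOf are made up for illustration, since the real page is not shown in the question. It shows that selecting td gives you cells, which have no src attribute (absUrl then returns an empty string), while selecting img gives you the element that actually carries the URL.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class SelectScopeDemo {
    // Hypothetical stand-in markup; the page from the question is not available.
    static final String HTML =
        "<table width=\"970\"><tr><td>"
      + "<table width=\"600\"><tr><td><img src=\"/pics/main.jpg\"></td></tr></table>"
      + "</td></tr></table>";

    // Absolute src of every element matched by the selector.
    // absUrl returns "" for elements that have no src attribute.
    static List<String> srcOf(Document doc, String selector) {
        List<String> urls = new ArrayList<>();
        for (Element el : doc.select(selector)) {
            urls.add(el.absUrl("src"));
        }
        return urls;
    }

    public static void main(String[] args) {
        // The base URL is an assumption; it is needed so absUrl can resolve "/pics/main.jpg".
        Document doc = Jsoup.parse(HTML, "http://example.com/");

        // select returns what the selector matches: a table, not its rows.
        System.out.println(doc.select("table[width=970]").first().tagName());

        // Matching td yields cells, which carry no src attribute.
        System.out.println(srcOf(doc, "table[width=600] td"));

        // Matching img yields the element that holds the URL.
        System.out.println(srcOf(doc, "table[width=600] img"));
    }
}
```

This would explain why the loop in the question logs nothing useful: it reads src from td elements rather than from img elements.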

Anyway, here is a selector that works. It matches every img element that is a descendant (not necessarily a direct child) of a table with the attribute width=600, where that table is itself nested inside another table of the document.

Elements imgEls = doc.select("table table[width=600] img");
System.out.println(imgEls.first().absUrl("src"));

You say the HTML is not structured, so you might want to check whether the relevant images really are always inside two nested tables as specified.
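One way to do that check in code is to treat "no match" as a normal outcome instead of assuming the structure. The following is only a sketch: the class name MainImageFinder, the method findMainImage and the sample markup are hypothetical. It relies on Elements.first() returning null when nothing matched.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class MainImageFinder {
    // Returns the absolute URL of the first matching image,
    // or null if the page does not have the expected nested-table structure.
    static String findMainImage(Document doc) {
        Element img = doc.select("table table[width=600] img").first();
        if (img == null) {
            return null; // selector matched nothing
        }
        String url = img.absUrl("src");
        return url.isEmpty() ? null : url; // img without a usable src
    }

    public static void main(String[] args) {
        // Hypothetical markup: one page with the nested tables, one without.
        String nested = "<table><tr><td><table width=\"600\"><tr><td>"
                      + "<img src=\"main.jpg\"></td></tr></table></td></tr></table>";
        String flat = "<table width=\"600\"><tr><td><img src=\"main.jpg\"></td></tr></table>";

        System.out.println(findMainImage(Jsoup.parse(nested, "http://example.com/")));
        System.out.println(findMainImage(Jsoup.parse(flat, "http://example.com/")));
    }
}
```

A null result then tells you that this particular page deviates from the layout and needs a different selector.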

Update: if you are using a mobile device, make sure to add a user agent:

doc = Jsoup.connect(url).userAgent("Mozilla").get();

Answer 1: accepted, 2015-12-04 13:49:39
