
Jsoup selecting table data

For the life of me I can't figure out how to select the img src using jsoup; the link I want ends in "51u1FaI-FHL._SL500_AA300_.jpg".

I've tried multiple things but none have worked. Any help?

Document doc1 = Jsoup.connect("http://www.amazon.com/gp/product/B0051HDDO2?ie=UTF8&ref=mas_faad").timeout(20000).get();
Element table = doc1.select("table[class=productImageGrid]").first();
Iterator<Element> ite = table.select("td[height=300]").iterator();

Thanks, Cody

<table style="text-align: center;" border="0" cellpadding="0" cellspacing="0" width="300"> 
  <tr> 
    <td id="prodImageCell" height="300" width="300" style="padding-bottom: 10px;"><img onclick="if(0 ){ async_openImmersiveView(event);} else {openImmersiveView(event);}" class="prod_image_selector" style="cursor:pointer;" onload="if (typeof uet == 'function') { uet('af'); }" **src="http://ecx.images-amazon.com/images/I/51u1FaI-FHL._SL500_AA300_.jpg"** id="prodImage"/><div id="prodImageCellInner" style="position: relative; height:0px; "><!--Comment for IE as it is empty div--></div></td> 
    <td id="prodVideoClick" style="display:none"></td> 
    <img id="loadingImage" src=http://g-ecx.images-amazon.com/images/G/01/ui/loadIndicators/loading-large_boxed._V192195297_.gif style="position: absolute;  z-index: 200; display:none"> 
 </tr> 
  <tr> 
    <td class="tiny" style="padding-bottom: 5px;">&nbsp;<span id="prodImageCaption" style="color: #666666; font-size: 10px;">Click for larger image and other views</span>&nbsp;</td> 
  </tr> 
 </table> 

@user793728: try this:

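Document document = Jsoup.connect("http://www.amazon.com/gp/product/B0051HDDO2?ie=UTF8&ref=mas_faad").timeout(20000).get();

// Select by the image's CSS class, then look for the src attribute on each match.
Elements elements = document.select(".prod_image_selector");
for (Element element : elements) {
    Attributes imageAttributes = element.attributes();
    for (Attribute attribute : imageAttributes) {
        if (attribute.getKey().equals("src")) {
            // element.attr("src") is a shorter way to read the same value.
            String imageURL = attribute.getValue();
            System.out.println(imageURL);
        }
    }
}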

The issue here seems to be that Amazon returns different HTML to jsoup than it does to your browser, based on the User-Agent header of the request.

I set the user agent to that of a known browser, selected the element by its #prodImage ID, and got the expected result.

E.g.:

Document doc = Jsoup.connect("http://www.amazon.com/gp/product/B0051HDDO2?ie=UTF8&ref=mas_faad")
        .timeout(20000)
        .userAgent("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_7) AppleWebKit/534.30 (KHTML, like Gecko) Chrome/12.0.742.91 Safari/534.30")
        .get();
Element img = doc.select("#prodImage").first();
System.out.println(img.attr("src"));

Returns http://ecx.images-amazon.com/images/I/51u1FaI-FHL._SL500_AA300_.jpg
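
If the server sends back markup without the expected element, select(...).first() returns null and the img.attr("src") call throws a NullPointerException. Below is a minimal null-safe sketch. ImageSrcSketch and firstSrc are hypothetical names made up for illustration, and the HTML string is a trimmed copy of the markup from the question, parsed offline so no network access is needed. It also shows jsoup's attribute ends-with selector ([attr$=suffix]) as one possible way to match the src by its ending.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical helper class for illustration; not part of jsoup or the original answer.
class ImageSrcSketch {

    // Returns the src of the first element matching cssQuery, or null if nothing matches.
    static String firstSrc(Document doc, String cssQuery) {
        Element img = doc.select(cssQuery).first(); // first() returns null on an empty result
        return img == null ? null : img.attr("src");
    }

    public static void main(String[] args) {
        // Trimmed-down copy of the markup from the question (an assumption about
        // what the server returns when a browser-like user agent is sent).
        String html = "<table><tr><td id=\"prodImageCell\">"
                + "<img class=\"prod_image_selector\" id=\"prodImage\" "
                + "src=\"http://ecx.images-amazon.com/images/I/51u1FaI-FHL._SL500_AA300_.jpg\"/>"
                + "</td></tr></table>";
        Document doc = Jsoup.parse(html);

        System.out.println(firstSrc(doc, "#prodImage"));             // select by ID
        System.out.println(firstSrc(doc, "img[src$=_AA300_.jpg]"));  // select by src suffix
        System.out.println(firstSrc(doc, "#doesNotExist"));          // no match, so null
    }
}
```

With a live page, the same helper could be called on the Document returned by Jsoup.connect(...).get(), and a null result is a hint that the retrieved HTML differs from what the browser shows.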

To troubleshoot issues like this, I suggest printing doc.html() and looking at the HTML that was actually retrieved and parsed. It can differ from your browser's view-source HTML, because servers can return different HTML to different clients, and because view-source shows the markup before it has been tidied and built into a DOM.

Hope this helps!
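
As a small offline illustration of how parsed HTML can differ from the raw source, the sketch below parses a short table fragment and prints the result. ParsedHtmlSketch and parsedBody are hypothetical names, and the fragment is a made-up stand-in for a real page. The raw string has no tbody element, but the parsed output does, because the parser inserts one while building the DOM.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Hypothetical helper class for illustration only.
class ParsedHtmlSketch {

    // Parses rawHtml and returns the body markup as jsoup sees it after building the DOM.
    static String parsedBody(String rawHtml) {
        Document doc = Jsoup.parse(rawHtml);
        return doc.body().html();
    }

    public static void main(String[] args) {
        // Made-up fragment: note that there is no <tbody> in the raw source.
        String raw = "<table><tr><td><img id=\"prodImage\" src=\"a.jpg\"></td></tr></table>";

        System.out.println("Raw source:");
        System.out.println(raw);
        System.out.println("Parsed by jsoup:");
        System.out.println(parsedBody(raw)); // the parsed markup includes an inserted <tbody>
    }
}
```

Selectors should be written against the parsed output, so printing it first is usually quicker than guessing from the browser's view-source.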
