
Java: How to Scrape Images from Amazon with Selenium?

I'm trying to scrape the six images on the left side of this Amazon product page using Selenium WebDriver:

http://www.amazon.com/EasyAcc%C2%AE-10000mAh-Brilliant-Smartphone-Bluetooth/dp/B00H9BEC8E

However, everything I try results in an error. What I've tried so far:

  1. I tried scraping the images directly using XPath and then extracting src with the "getAttribute" method. For example, for the first image on the page the XPath is:

    .//*[@id='a-autoid-2']/span/input

so I tried the following:

    String path1 = ".//*[@id='a-autoid-2']/span/input";
    String url = "http://www.amazon.com/EasyAcc%C2%AE-10000mAh-Brilliant-Smartphone-Bluetooth/dp/B00H9BEC8E";
    WebDriver driver = new FirefoxDriver();
    driver.get(url);
    WebElement s = driver.findElement(By.xpath(path1));
    String src = s.getAttribute("src");
    System.out.println(src);

But I'm unable to find the source.

Note: This problem occurs only when scraping images from certain types of products. For example, I can easily scrape images from this product using Selenium:

http://www.amazon.com/Ultimate-Unification-Diet-Health-Disease/dp/0615797806/

import java.util.List;

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.firefox.FirefoxDriver;

public class mytest {

    public static void main(String[] args) {
        // XPath of an <img> inside the imgThumbs container
        String path = ".//*[@id='imgThumbs']/div[2]/img";
        String url = "http://www.amazon.com/Ultimate-Unification-Diet-Health-Disease/dp/0615797806/";

        WebDriver driver = new FirefoxDriver();
        driver.get(url);

        // Read the image URL from the element's src attribute
        WebElement s = driver.findElement(By.xpath(path));
        String src = s.getAttribute("src");
        System.out.println(src);

        driver.close();
    }
}

This code works flawlessly. It is only with certain products that there seems to be no way around the problem.

  2. I tried clicking on the image, which causes an iframe to open, but I'm unable to scrape images from this iframe either, even after switching to it with:

    driver.switchTo().frame(IFRAMEID);

I know I can use the "screenshot" method, but I'm wondering if there's a way to scrape the images directly?

Thanks

Try this code:

    // Matches every <img> under a <span> inside the image block
    String path = "//div[@id='imageBlock_feature_div']//span/img";

    // Same product URL as shown in the result log
    String url = "http://rads.stackoverflow.com/amzn/click/B00H9BEC8E";
    WebDriver driver = new FirefoxDriver();
    driver.get(url);

    // findElements returns all matching elements, not just the first
    List<WebElement> srcs = driver.findElements(By.xpath(path));

    for (WebElement src : srcs) {
        System.out.println(src.getAttribute("src"));
    }

    driver.close();

Result:

2015-01-23 12:36:14 [main]-[INFO] Opened url: http://rads.stackoverflow.com/amzn/click/B00H9BEC8E
http://ecx.images-amazon.com/images/I/41cOP3mFX3L._SX38_SY50_CR,0,0,38,50_.jpg
http://ecx.images-amazon.com/images/I/51YkMhRXqcL._SX38_SY50_CR,0,0,38,50_.jpg
http://ecx.images-amazon.com/images/I/51nSbXF%2BCTL._SX38_SY50_CR,0,0,38,50_.jpg
http://ecx.images-amazon.com/images/I/31s%2B31F%2BQmL._SX38_SY50_CR,0,0,38,50_.jpg
http://ecx.images-amazon.com/images/I/41FmTOJEOOL._SX38_SY50_CR,0,0,38,50_.jpg
http://ecx.images-amazon.com/images/I/41U6qpLJ07L._SX38_SY50_CR,0,0,38,50_.jpg
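
The URLs printed above are 38x50 thumbnails; the "._SX38_SY50_CR,0,0,38,50_" segment before the extension looks like a resize/crop directive. Below is a minimal sketch, assuming (this is not confirmed anywhere in this thread) that dropping that segment gives the URL of the unscaled image. AmazonImageUrls and stripSizeModifier are made-up names for illustration, not part of Selenium or any Amazon API:

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Selenium or any Amazon API.
class AmazonImageUrls {

    // Matches a "._<modifiers>_" segment that sits directly before the file
    // extension, e.g. "._SX38_SY50_CR,0,0,38,50_" in "...3L._SX38_SY50_CR,0,0,38,50_.jpg".
    private static final Pattern SIZE_MODIFIER =
            Pattern.compile("\\._[^/.]*_(?=\\.[A-Za-z0-9]+$)");

    // Returns the URL without the modifier segment; URLs without one are returned unchanged.
    static String stripSizeModifier(String url) {
        return SIZE_MODIFIER.matcher(url).replaceFirst("");
    }

    public static void main(String[] args) {
        String thumb = "http://ecx.images-amazon.com/images/I/41cOP3mFX3L._SX38_SY50_CR,0,0,38,50_.jpg";
        // prints http://ecx.images-amazon.com/images/I/41cOP3mFX3L.jpg
        System.out.println(stripSizeModifier(thumb));
    }
}
```

Whether the resulting URL actually resolves to a larger image would need to be checked against the live site.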

However, to get Amazon images, I suggest you try the Amazon API: https://affiliate-program.amazon.com/gp/advertising/api/detail/main.html

It's much better.
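
If the goal is to save the files rather than print their URLs, each src value can be fetched over plain HTTP, with no screenshot involved. Here is a rough sketch using only the JDK. ImageSaver, its method names and the amazon-images directory are made up for illustration, and it assumes the image host serves the file to a plain GET without cookies or special headers:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical helper for illustration; not part of Selenium.
class ImageSaver {

    // Uses the last path segment of the URL as the local file name
    // (percent-escapes such as %2B are kept as-is).
    static String fileNameFor(String imageUrl) {
        String path = URI.create(imageUrl).getRawPath();
        return path.substring(path.lastIndexOf('/') + 1);
    }

    // Copies the stream into dir/fileName, overwriting an existing file.
    static Path save(InputStream in, Path dir, String fileName) {
        try {
            Files.createDirectories(dir);
            Path target = dir.resolve(fileName);
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
            return target;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Fetches one image URL, e.g. a value returned by getAttribute("src").
    static Path download(String imageUrl, Path dir) {
        try (InputStream in = URI.create(imageUrl).toURL().openStream()) {
            return save(in, dir, fileNameFor(imageUrl));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path dir = Paths.get("amazon-images"); // assumed output directory
        for (String url : args) {
            System.out.println("Saved " + download(url, dir));
        }
    }
}
```

Inside the loop from the answer, System.out.println(src.getAttribute("src")) could be replaced with a call to ImageSaver.download(src.getAttribute("src"), dir).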
