[英]Extract text and web links with the selenium WebDriver
我正在研究 selenium 並且我想從 Sympla 的事件中提取文本和鏈接,但是當我單擊“更多事件”按鈕時,我無法提取下一個事件,它總是從頁面中提取相同的初始事件.
完整的 class 便於復制。
public static void main(String[] args) throws InterruptedException {
WebDriverManager.firefoxdriver().setup();
WebDriver driver = new FirefoxDriver();
driver.manage().window().maximize();
driver.get("https://www.sympla.com.br/eventos?ts=online_mais-de-3-mil-eventos-online");
driver.manage().timeouts().implicitlyWait(10, TimeUnit.SECONDS);
// If have captcha, close the page and exit.
boolean captcha = driver.getPageSource().contains("Não sou um robô");
if (captcha == true) {
System.out.println("O Captcha apareceu, acabou a brincadeira!");
driver.close();
driver.quit();
}
// load more button
WebElement CarregarMais = driver.findElement(By
.xpath("//button[@id='more-events']"));
// Number of events counter
List<WebElement> eventos = (List<WebElement>) driver.findElements(By
.cssSelector("div.event-name.event-card"));
System.out.println("Number of links: " + eventos.size());
// Number of links counter
List<WebElement> eventos_link = (List<WebElement>) driver
.findElements(By.cssSelector("a.sympla-card.w-inline-block"));
// iterating over the button more events
for (int j = 0; j < eventos.size(); j++) {
CarregarMais.click();
@SuppressWarnings("deprecation")
WebDriverWait wait = new WebDriverWait(driver, 10);
WebElement element = wait.until(ExpectedConditions
.elementToBeClickable(By
.xpath("//button[@id='more-events']")));
// Iterating over event links
for (int i = 0; i < eventos_link.size(); i++) {
System.out.println(i + " " + eventos.get(i).getText() + " - "
+ eventos_link.get(i).getAttribute("href"));
Thread.sleep(500);
}
}
}
這是因為您不再閱讀鏈接。 每次單擊按鈕都會創建一個新頁面,因此您需要再次閱讀它們。
此外,您需要存儲最后獲取的鏈接。
因此,在等待按鈕再次可點擊后,您需要重新閱讀eventos
和eventos_link
。 也許您使用像lastFetchedLinkIndex
這樣的全局變量。
這將是我的方法(調整你的代碼):
WebDriverManager.firefoxdriver().setup();
WebDriver driver = new FirefoxDriver();
driver.manage().window().maximize();
driver.get("https://www.sympla.com.br/eventos?ts=online_mais-de-3-mil-eventos-online");
driver.manage().timeouts().implicitlyWait(10, TimeUnit.SECONDS);
// If have captcha, close the page and exit.
boolean captcha = driver.getPageSource().contains("Não sou um robô");
if (captcha == true) {
System.out.println("O Captcha apareceu, acabou a brincadeira!");
driver.close();
driver.quit();
}
// load more button
WebElement CarregarMais = driver.findElement(By
.xpath("//button[@id='more-events']"));
// Number of events counter
List<WebElement> eventos = (List<WebElement>) driver.findElements(By
.cssSelector("div.event-name.event-card"));
System.out.println("Number of links: " + eventos.size());
// Number of links counter
List<WebElement> eventos_link = (List<WebElement>) driver
.findElements(By.cssSelector("a.sympla-card.w-inline-block"));
int lastEventScraped = 0;
// iterating over the button more events
for (int j = 0; j < eventos.size(); j++) {
CarregarMais.click();
@SuppressWarnings("deprecation")
WebDriverWait wait = new WebDriverWait(driver, 10);
WebElement element = wait.until(ExpectedConditions
.elementToBeClickable(By
.xpath("//button[@id='more-events']")));
eventos = (List<WebElement>) driver.findElements(By
.cssSelector("div.event-name.event-card"));
eventos_link = (List<WebElement>) driver
.findElements(By.cssSelector("a.sympla-card.w-inline-block"));
// Iterating over event links
for (int i = lastEventScraped; i < eventos_link.size(); i++, lastEventScraped++) {
System.out.println(i + " " + eventos.get(i).getText() + " - "
+ eventos_link.get(i).getAttribute("href"));
Thread.sleep(500);
}
}
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.