Extract text and web links with the Selenium WebDriver
I'm learning Selenium and want to extract the event names and links from Sympla's event listing. After I click the "more events" button, though, I can't extract the newly loaded events: the output always contains the same initial events from the page.
Complete class for easy reproduction:
public static void main(String[] args) throws InterruptedException {
    WebDriverManager.firefoxdriver().setup();
    WebDriver driver = new FirefoxDriver();
    driver.manage().window().maximize();
    driver.get("https://www.sympla.com.br/eventos?ts=online_mais-de-3-mil-eventos-online");
    driver.manage().timeouts().implicitlyWait(10, TimeUnit.SECONDS);
    // If have captcha, close the page and exit.
    boolean captcha = driver.getPageSource().contains("Não sou um robô");
    if (captcha == true) {
        System.out.println("O Captcha apareceu, acabou a brincadeira!");
        driver.close();
        driver.quit();
    }
    // load more button
    WebElement CarregarMais = driver.findElement(By.xpath("//button[@id='more-events']"));
    // Number of events counter
    List<WebElement> eventos = (List<WebElement>) driver.findElements(By.cssSelector("div.event-name.event-card"));
    System.out.println("Number of links: " + eventos.size());
    // Number of links counter
    List<WebElement> eventos_link = (List<WebElement>) driver
            .findElements(By.cssSelector("a.sympla-card.w-inline-block"));
    // iterating over the button more events
    for (int j = 0; j < eventos.size(); j++) {
        CarregarMais.click();
        @SuppressWarnings("deprecation")
        WebDriverWait wait = new WebDriverWait(driver, 10);
        WebElement element = wait.until(ExpectedConditions
                .elementToBeClickable(By.xpath("//button[@id='more-events']")));
        // Iterating over event links
        for (int i = 0; i < eventos_link.size(); i++) {
            System.out.println(i + " " + eventos.get(i).getText() + " - "
                    + eventos_link.get(i).getAttribute("href"));
            Thread.sleep(500);
        }
    }
}
That happens because you never read the links again. Every click on the button changes the page content, but eventos and eventos_link still hold only the elements that were found before the first click, so you need to call findElements again after each click.
Furthermore, you need to remember the last link you fetched, so that events you have already printed are not printed a second time.
So after waiting for the button to become clickable again, re-read eventos and eventos_link. To keep track of your position you could use a variable declared outside the loop, something like lastFetchedLinkIndex.
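To see the idea in isolation, without starting a browser, here is a minimal plain-Java sketch. It is only a model and not Selenium code: the `page` list stands in for the DOM, `snapshot` stands in for `driver.findElements(...)`, and the class and method names are made up for illustration. `scrapeStale` reads the list once, as in the question, while `scrapeWithReread` re-reads it after every simulated click and keeps a `lastFetchedLinkIndex`.

```java
import java.util.ArrayList;
import java.util.List;

// Browser-free model of the scraping loop (hypothetical names, not Selenium API).
class RereadSketch {

    // Like findElements: a copy of what is on the page at the moment of the call.
    static List<String> snapshot(List<String> page) {
        return new ArrayList<>(page);
    }

    // The question's behaviour: the list is read once and reused after every click.
    // Expects at least one batch (the events present on first load).
    static List<String> scrapeStale(List<List<String>> batches) {
        List<String> page = new ArrayList<>(batches.get(0));
        List<String> links = snapshot(page);     // read once, before any click
        List<String> printed = new ArrayList<>();
        for (int click = 1; click < batches.size(); click++) {
            page.addAll(batches.get(click));     // simulated click on "more events"
            printed.addAll(links);               // the old snapshot never sees new events
        }
        return printed;
    }

    // The fix: re-read after every click and continue from the last printed index.
    static List<String> scrapeWithReread(List<List<String>> batches) {
        List<String> page = new ArrayList<>(batches.get(0));
        List<String> printed = new ArrayList<>();
        int lastFetchedLinkIndex = 0;
        for (int click = 1; click < batches.size(); click++) {
            page.addAll(batches.get(click));     // simulated click on "more events"
            List<String> links = snapshot(page); // re-read the list
            for (; lastFetchedLinkIndex < links.size(); lastFetchedLinkIndex++) {
                printed.add(links.get(lastFetchedLinkIndex));
            }
        }
        return printed;
    }

    public static void main(String[] args) {
        List<List<String>> batches = List.of(
                List.of("e1", "e2"), List.of("e3", "e4"), List.of("e5"));
        System.out.println(scrapeStale(batches));      // [e1, e2, e1, e2]
        System.out.println(scrapeWithReread(batches)); // [e1, e2, e3, e4, e5]
    }
}
```

The stale version repeats the first two events after each click, which matches the symptom described in the question; the re-reading version prints every event exactly once.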
This would be my approach (your code, adjusted):
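import java.util.List;
import java.util.concurrent.TimeUnit;

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.firefox.FirefoxDriver;
import org.openqa.selenium.support.ui.ExpectedConditions;
import org.openqa.selenium.support.ui.WebDriverWait;

import io.github.bonigarcia.wdm.WebDriverManager;

// The class name is a placeholder; the question shows only the main method.
public class SymplaScraper {
    public static void main(String[] args) throws InterruptedException {
        WebDriverManager.firefoxdriver().setup();
        WebDriver driver = new FirefoxDriver();
        driver.manage().window().maximize();
        driver.get("https://www.sympla.com.br/eventos?ts=online_mais-de-3-mil-eventos-online");
        driver.manage().timeouts().implicitlyWait(10, TimeUnit.SECONDS);
        // If a captcha shows up, close the browser and exit.
        boolean captcha = driver.getPageSource().contains("Não sou um robô");
        if (captcha) {
            System.out.println("O Captcha apareceu, acabou a brincadeira!");
            driver.quit();
            return; // without this, the code below would run against a closed browser
        }
        // "Load more" button
        WebElement CarregarMais = driver.findElement(By.xpath("//button[@id='more-events']"));
        // Event name elements currently on the page
        List<WebElement> eventos = driver.findElements(By.cssSelector("div.event-name.event-card"));
        System.out.println("Number of events: " + eventos.size());
        // Event link elements currently on the page
        List<WebElement> eventos_link = driver.findElements(By.cssSelector("a.sympla-card.w-inline-block"));
        // Index of the next event that has not been printed yet
        int lastEventScraped = 0;
        @SuppressWarnings("deprecation")
        WebDriverWait wait = new WebDriverWait(driver, 10);
        // Click "more events" repeatedly; eventos is re-read inside the loop, so this bound grows with every click
        for (int j = 0; j < eventos.size(); j++) {
            CarregarMais.click();
            // Wait until the button is clickable again and keep the fresh reference for the next click
            CarregarMais = wait.until(ExpectedConditions
                    .elementToBeClickable(By.xpath("//button[@id='more-events']")));
            // Re-read both lists so they include the newly loaded events
            eventos = driver.findElements(By.cssSelector("div.event-name.event-card"));
            eventos_link = driver.findElements(By.cssSelector("a.sympla-card.w-inline-block"));
            // Guard against the two lists having different lengths
            int available = Math.min(eventos.size(), eventos_link.size());
            // Print only the events that were not printed in an earlier round
            for (int i = lastEventScraped; i < available; i++, lastEventScraped++) {
                System.out.println(i + " " + eventos.get(i).getText() + " - "
                        + eventos_link.get(i).getAttribute("href"));
                Thread.sleep(500);
            }
        }
    }
}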
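One thing the loop above leaves open is when to stop clicking: its bound, `eventos.size()`, grows every time the list is re-read. A possible alternative, sketched here in plain Java rather than Selenium, is to stop as soon as a click yields no new items and to cap the number of clicks as a safety net. Everything in it is an assumption for illustration: `clickAndReread` is a hypothetical stand-in for "click the button, wait, call findElements again", and `maxClicks` is an arbitrary limit. On the real page, running out of events might instead show up as the button disappearing or the wait timing out, which you would need to handle separately.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

// Browser-free model of "keep clicking until nothing new loads" (hypothetical names).
class LoadUntilExhaustedSketch {

    // Collects items until a click adds nothing new or maxClicks is reached.
    static List<String> collectAll(List<String> initial,
                                   Supplier<List<String>> clickAndReread,
                                   int maxClicks) {
        List<String> collected = new ArrayList<>(initial);
        for (int click = 0; click < maxClicks; click++) {
            List<String> current = clickAndReread.get();
            if (current.size() <= collected.size()) {
                break; // nothing new appeared, so stop clicking
            }
            // Keep only the items beyond what was already collected
            collected.addAll(current.subList(collected.size(), current.size()));
        }
        return collected;
    }

    // Small demo with a simulated page that receives two more batches.
    static List<String> demo() {
        List<String> page = new ArrayList<>(List.of("a", "b"));
        Iterator<List<String>> batches =
                List.of(List.of("c", "d"), List.of("e")).iterator();
        Supplier<List<String>> clickAndReread = () -> {
            if (batches.hasNext()) {
                page.addAll(batches.next()); // simulated click loads one batch
            }
            return new ArrayList<>(page);    // simulated re-read of the page
        };
        return collectAll(List.of("a", "b"), clickAndReread, 10);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [a, b, c, d, e]
    }
}
```

The third simulated click returns the same five items as before, so the loop ends there instead of running until the click limit.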