
HTML Extraction with Python

I am unable to keep clicking through to the next page of results on an HTML page. The page displays the results of a search query, and at the bottom there is a row of page links, i.e. "1 2 3 4 next"; clicking "2" shows the results on the second page. On any page other than the first, e.g. page 2 or 3, the row looks like "previous 1 2 3 4 next".

I am using Python and WebDriver to click "next" and step through my results. The first click takes me to the next page, but the second click takes me back to the previous page, so I am stuck on the first two pages and never see the results on pages 3 and 4.

I noticed that this happens because the li class="arrow" tag appears twice in the HTML once I am past the first page. On the second call, the first element with the "arrow" class is the "previous" link, so that is the one that gets clicked. How do I click the "next" link instead?

HTML Notes: the "li" tag defines a list item.

HTML Code:

BEFORE CLICKING NEXT:

<div class="list">
<ul class="line">
<li class="current page"><a href>1</a></li>
<li><a href="/search_text=&&page=1">2</a></li>
<li><a href="/search_text=&&page=2">3</a></li>
<li><a href="/search_text=&&page=3">4</a></li>
<li class="arrow"><a href="/search_text=&&page=1">next</a></li>
</ul>
</div>

AFTER CLICKING NEXT:

<div class="list">
<ul class="line">
<li class="arrow"><a href="/search_text=&">previous</a></li>
<li><a href="/search_text=&">1</a></li>
<li class="current page"><a href>2</a></li>
<li><a href="/search_text=&&page=2">3</a></li>
<li><a href="/search_text=&&page=3">4</a></li>
<li class="arrow"><a href="/search_text=&&page=2">next</a></li>
</ul>
</div>

Python Code:

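import os
from selenium import webdriver

chromedriver = r"C:\temp\chromedriver.exe"  # raw string, so "\t" is not read as a tab
os.environ["webdriver.chrome.driver"] = chromedriver
driver = webdriver.Chrome(executable_path=chromedriver)
driver.implicitly_wait(3)  # wait up to 3 seconds for elements to appear
driver.get(urlLink)  # urlLink holds the URL of the search results page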


driver.find_element_by_css_selector("li.arrow").click() #Takes me to the next page
driver.find_element_by_css_selector("li.arrow").click() #Takes me to the previous page

..

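You can use driver.find_element_by_link_text('next') to find the element and then call .click() on it. Matching on the link text avoids the ambiguity between the two li.arrow items, because only one link on the page reads "next".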
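As a rough sketch of the link-text approach: find_element returns the first match in document order, which is why "li.arrow" hits "previous" on page 2, while the text "next" matches only one link. The helper name click_next is hypothetical, and the locator is written as the plain string "link text" (the value of Selenium's By.LINK_TEXT constant) so the snippet does not depend on a particular Selenium version. It assumes the last page simply has no "next" link.

```python
from typing import Any

# Selenium's By.LINK_TEXT constant is the string "link text"; it is spelled
# out here so this helper has no import-time dependency on Selenium.
LINK_TEXT = "link text"


def click_next(driver: Any) -> bool:
    """Click the pagination link whose visible text is exactly 'next'.

    Returns True if a link was clicked, False if no such link exists
    (assumed here to mean we are on the last page).
    """
    # find_elements returns a (possibly empty) list instead of raising,
    # which makes the "no next link" case easy to handle.
    matches = driver.find_elements(LINK_TEXT, "next")
    if not matches:
        return False
    matches[0].click()
    return True
```

With a real driver, click_next(driver) would stand in for each driver.find_element_by_css_selector("li.arrow").click() line in the question. Newer Selenium releases favour this find_element(s)(by, value) form over the find_element_by_* helpers.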

Alternatively, if you control the page's HTML, you could add an ID to the next button and call:

driver.find_element_by_id('whatever_id_you_use').click()

Or give the next arrow and the previous arrow different classes to tell the two apart and call:

driver.find_element_by_class_name('next_arrow').click()
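If the page's HTML is not yours to change, a more specific CSS selector may give a similar result without new IDs or classes. The sketch below relies on the markup shown in the question, where "next" is always the last li in ul.line; the selector, the function name visit_all_pages, the handle_page callback, and the max_pages safety cap are all assumptions made for illustration. It also assumes the last page has no trailing li.arrow.

```python
from typing import Any, Callable

# "css selector" is the value of Selenium's By.CSS_SELECTOR constant.
CSS_SELECTOR = "css selector"

# Assumption based on the question's markup: the "next" arrow is the last
# <li> inside <ul class="line">, while "previous" is the first.
NEXT_SELECTOR = "ul.line > li.arrow:last-child > a"


def visit_all_pages(driver: Any,
                    handle_page: Callable[[Any], None],
                    max_pages: int = 50) -> int:
    """Call handle_page(driver) on each results page, following 'next'.

    Stops when no 'next' arrow is found or after max_pages pages,
    and returns the number of pages visited.
    """
    visited = 0
    while visited < max_pages:
        handle_page(driver)  # e.g. scrape the results on the current page
        visited += 1
        next_links = driver.find_elements(CSS_SELECTOR, NEXT_SELECTOR)
        if not next_links:
            break  # no trailing arrow: assumed to be the last page
        next_links[0].click()
    return visited
```

A call such as visit_all_pages(driver, scrape_results), where scrape_results is whatever function reads the current page, would then walk pages 1 through 4 instead of bouncing between the first two.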


 