
HTML Extraction with Python

I cannot reliably click through to the next page of an HTML results page. The page displays the results of a search query. At the bottom there is a row of page links, e.g. "1 2 3 4 next"; clicking "2" shows the second page of results. On any page after the first, e.g. 2 or 3, the row looks like "previous 1 2 3 4 next".

I am using Python and WebDriver to click "next" and step through my results. The first click takes me to the next page. The second click takes me back to the previous page, so I am stuck on the first two pages and cannot see the results on pages 3 and 4. I noticed that this happens because the li class="arrow" tag appears twice in the HTML after the first click: when the second call is made, the first element with the "arrow" class is the "previous" link. How do I click the "next" link instead?

HTML notes: the "li" tag defines a list item.

HTML code:

Before clicking "next":

<div class="list">
<ul class="line">
<li class="current page"><a href>1</a></li>
<li><a href="/search_text=&&page=1">2</a></li>
<li><a href="/search_text=&&page=2">3</a></li>
<li><a href="/search_text=&&page=3">4</a></li>
<li class="arrow"><a href="/search_text=&&page=1">next</a></li>
</ul>
</div>

After clicking "next", the HTML looks like this:

<div class="list">
<ul class="line">
<li class="arrow"><a href="/search_text=&">previous</a></li>
<li><a href="/search_text=&">1</a></li>
<li class="current page"><a href>2</a></li>
<li><a href="/search_text=&&page=2">3</a></li>
<li><a href="/search_text=&&page=3">4</a></li>
<li class="arrow"><a href="/search_text=&&page=2">next</a></li>
</ul>
</div>

Python code:

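import os
from selenium import webdriver

# Raw string so that "\t" in the path is not read as a tab character
chromedriver = r"C:\temp\chromedriver.exe"
os.environ["webdriver.chrome.driver"] = chromedriver
driver = webdriver.Chrome(executable_path=chromedriver)
driver.implicitly_wait(3)  # wait up to 3 seconds for elements to appear
driver.get(urlLink)  # urlLink holds the URL of the search results page

# "li.arrow" matches the first <li class="arrow"> on the page
driver.find_element_by_css_selector("li.arrow").click()  # takes me to the next page
driver.find_element_by_css_selector("li.arrow").click()  # takes me to the previous page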

.. ..

You can use driver.find_element_by_link_text('next') to find the element and then call .click() on it.
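As a rough sketch of how this could be used to walk through every page, the helper below looks up the "next" link again after each click and stops when it is no longer found. The function name click_through_pages and the max_pages cap are made up for illustration, and it assumes the last page has no "next" link, which the question does not show. It uses the generic find_elements(by, value) form with the locator string "link text" (the value of Selenium's By.LINK_TEXT), since recent Selenium 4 releases no longer ship the find_element_by_* helpers.

```python
def click_through_pages(driver, max_pages=10):
    """Click the "next" link until it is gone or max_pages clicks were made.

    `driver` is assumed to be a Selenium WebDriver. Returns the number of clicks.
    """
    clicks = 0
    while clicks < max_pages:
        # Look the link up again on every pass: the page reloads after a
        # click, so an element found earlier would be stale.
        # find_elements returns an empty list when nothing matches.
        links = driver.find_elements("link text", "next")
        if not links:
            break  # no "next" link: assumed to mean this is the last page
        links[0].click()
        clicks += 1
    return clicks
```

With the driver from the question, click_through_pages(driver) would be called after driver.get(urlLink); any per-page scraping would go inside the loop before the click.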

Alternatively, if you control the page's HTML, you could add an ID to the next button and call:

driver.find_element_by_id('whatever_id_you_use').click()

or give the next arrow and the previous arrow different classes to tell the two apart, and call:

driver.find_element_by_class_name('next_arrow').click()
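To see why the original "li.arrow" lookup goes backwards on page 2, and why the link text is enough to tell the arrows apart without touching the markup, here is a small check that uses only Python's standard html.parser instead of Selenium. It runs against the page-2 list copied from the question; the ArrowLinks class and the variable names are illustrative, not part of any library.

```python
from html.parser import HTMLParser

# Pagination list as it looks after the first click (from the question)
PAGE_2 = """
<ul class="line">
<li class="arrow"><a href="/search_text=&">previous</a></li>
<li><a href="/search_text=&">1</a></li>
<li class="current page"><a href>2</a></li>
<li><a href="/search_text=&&page=2">3</a></li>
<li><a href="/search_text=&&page=3">4</a></li>
<li class="arrow"><a href="/search_text=&&page=2">next</a></li>
</ul>
"""


class ArrowLinks(HTMLParser):
    """Collect (text, href) for every <a> inside an <li class="arrow">."""

    def __init__(self):
        super().__init__()
        self.in_arrow = False  # currently inside <li class="arrow">
        self.in_link = False   # currently inside an <a> of such an <li>
        self.href = None
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li":
            self.in_arrow = "arrow" in (attrs.get("class") or "").split()
        elif tag == "a" and self.in_arrow:
            self.in_link, self.href = True, attrs.get("href")

    def handle_endtag(self, tag):
        if tag == "a":
            self.in_link = False
        elif tag == "li":
            self.in_arrow = False

    def handle_data(self, data):
        if self.in_link and data.strip():
            self.links.append((data.strip(), self.href))


parser = ArrowLinks()
parser.feed(PAGE_2)
links = parser.links

first_arrow_text = links[0][0]   # what a first-match "li.arrow" lookup lands on
next_href = dict(links)["next"]  # what a lookup by the text "next" lands on
print(first_arrow_text, next_href)
```

The first arrow in document order is the "previous" link, which is what the second click in the question hits; selecting by text picks the entry that points at the following page.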
