
How to browse over a page using PhantomJS and Selenium

I have some DIV elements on a web page. In total there are about 30 DIV blocks with the following structure:

<div class="w-dyn-item">
  <a href="/project/soft" class="jobs-wrapper no-line w-inline-block w-clearfix">
    <div class="jobs-client">
      <img data-qazy="true" src="https://global.com/test.jpg" alt="Soft" class="image-9">
      <div style="background-color:#cd7f32" class="job-time">Level 1</div>
    </div>
    <div class="jobs-content w-clearfix">
      <div class="w-clearfix">
        <div class="text-block-19 w-condition-invisible">PROMO</div>
        <h3 class="job-title">Soft</h3>
        <img height="30" data-qazy="true" src="https://global.com/test.jpg" alt="Soft" class="image-15 w-hidden-main w-hidden-medium w-hidden-small">
      </div>
      <div class="div-block w-clearfix">
        <div class="text-block-4">Italy</div>
        <div class="text-block-4 w-hidden-small w-hidden-tiny">AMB</div>
        <div class="text-block-4 w-hidden-small w-hidden-tiny">GTL</div>
        <div class="text-block-13">January 10, 2017</div>
        <div class="text-block-14">End date:</div>
      </div>
      <div class="space small"></div>
      <p class="paragraph-3">Text text text</p>
    </div>
  </a>
</div>

I am trying to access the a href and click on the link. The problem is that I cannot use find_element_by_link_text, because there is no link text to match. Is it possible to locate the link by its class, class="jobs-wrapper no-line w-inline-block w-clearfix"? When I used find_element_by_class_name, I got the error Message: {"errorMessage":"Compound class names not permitted","request

from selenium import webdriver
driver = webdriver.PhantomJS()
driver.set_window_size(1120, 550)
driver.get("https://myurl.com/")
driver.find_element_by_link_text("//a href").click()
print driver.current_url
driver.quit()

If your only requirement is to click the a tag inside the element with the w-dyn-item class, you can do it like this:

driver.find_element_by_class_name("w-dyn-item").find_element_by_tag_name("a").click()


To iterate over all elements with the w-dyn-item class, click the a inside each one, do something on the target page, and then go back, do this:

tags = driver.find_elements_by_class_name("w-dyn-item")
for i in range(len(tags)):
    tag = driver.find_elements_by_class_name("w-dyn-item")[i]
    tag.find_element_by_tag_name("a").click()
    # Do what you want inside the page...
    driver.back()

The key here is of course to go back to the root page after you're done with the inner page.
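The loop above looks the list up again on every pass because element references found before navigating away generally go stale once the page changes. Below is a sketch of the same pattern wrapped in a function and written with the find_elements(by, value) call form, since the find_elements_by_* helpers were removed in later Selenium 4 releases. The function name visit_each and the on_page callback are hypothetical; the strings "class name" and "tag name" are the values of Selenium's By.CLASS_NAME and By.TAG_NAME, spelled out here so the snippet needs no import.

```python
CLASS_NAME = "class name"  # same value as selenium's By.CLASS_NAME
TAG_NAME = "tag name"      # same value as selenium's By.TAG_NAME


def visit_each(driver, class_name, on_page):
    """Click the first <a> in every element with class_name, then go back.

    on_page(driver) is called on each inner page; its return values are
    collected in order and returned as a list.
    """
    results = []
    count = len(driver.find_elements(CLASS_NAME, class_name))
    for i in range(count):
        # Look the elements up again: references obtained before the
        # previous navigation are no longer valid.
        items = driver.find_elements(CLASS_NAME, class_name)
        if i >= len(items):
            break  # the list shrank after navigating back
        items[i].find_element(TAG_NAME, "a").click()
        results.append(on_page(driver))
        driver.back()
    return results
```

With a live driver this could be called as visit_each(driver, "w-dyn-item", lambda d: d.current_url) to collect the URL of every inner page.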

The error you're getting is because Selenium's find_element_by_class_name does not support multiple classes.
Use a CSS selector with find_elements_by_css_selector instead:

driver.find_elements_by_css_selector('.jobs-wrapper.no-line.w-inline-block.w-clearfix')

This selects all elements that carry every one of those classes; you can then iterate over them and call click() or any other action you need.
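As a minimal sketch of how a compound class attribute maps onto such a selector: each whitespace-separated class name gets a leading dot, and the pieces are concatenated with no spaces in between. The helper name classes_to_css is hypothetical, and it assumes the class names need no CSS escaping, which holds for the ones in the question.

```python
def classes_to_css(class_attr, tag=""):
    """Build a CSS selector matching elements that have all given classes."""
    names = class_attr.split()
    if not names:
        raise ValueError("class_attr contains no class names")
    # ".a.b" means "has class a AND class b"; a space would mean "descendant".
    return tag + "".join("." + name for name in names)


selector = classes_to_css("jobs-wrapper no-line w-inline-block w-clearfix", tag="a")
print(selector)  # a.jobs-wrapper.no-line.w-inline-block.w-clearfix
```

Prefixing the tag name narrows the match to a elements only, which is usually what you want when the goal is to read href.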

EDIT

Following your comment, new snippet to help you do what you want:

result = {}
urls = []
# 'elements' is the list you previously obtained using the CSS selector
for element in elements:
    urls.append(element.get_attribute('href'))


# Now you can iterate over all extracted hrefs:
for url in urls:
    url_data = {}
    driver.get(url)
    field1 = driver.find_element_by_id('wanted_id_1')
    url_data['field1'] = field1
    field2 = driver.find_element_by_id('wanted_id_2')
    url_data['field2'] = field2
    result[url] = url_data

Now, result is a dictionary in a structure similar to what you wanted.

Note that field1 and field2 are of type WebElement so you'll probably need to do something with them first (extract attribute, text, etc).
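One way to do that conversion, sketched under the assumption that the visible text of each field is what you are after: text is the WebElement property that returns the rendered text. The helper name to_text is hypothetical.

```python
def to_text(url_data):
    """Return a copy of url_data with each WebElement replaced by its text."""
    return {name: element.text.strip() for name, element in url_data.items()}
```

Calling result[url] = to_text(url_data) while the page is still loaded stores plain strings, which stay usable after the driver navigates to the next URL.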

Also, on a personal note, look into requests together with BeautifulSoup; they might be a much better fit than Selenium for this and similar cases.
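Here is a rough sketch of that static-parsing idea. To stay dependency-free it swaps in the standard library's html.parser in place of BeautifulSoup, and it parses an inline sample instead of fetching the page with requests. The class name JobLinkParser is hypothetical, and the base URL is the placeholder from the question. This approach only works if the listing is present in the served HTML rather than injected by JavaScript.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class JobLinkParser(HTMLParser):
    """Collect hrefs of <a> tags that carry the jobs-wrapper class."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        href = attr_map.get("href")
        if "jobs-wrapper" in classes and href:
            # Resolve relative links such as /project/soft against the page URL.
            self.links.append(urljoin(self.base_url, href))


sample = (
    '<div class="w-dyn-item">'
    '<a href="/project/soft" class="jobs-wrapper no-line w-inline-block w-clearfix">'
    '<h3 class="job-title">Soft</h3></a></div>'
)
parser = JobLinkParser("https://myurl.com/")
parser.feed(sample)
print(parser.links)  # ['https://myurl.com/project/soft']
```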

To access and click the a href, you can use the following line of code:

driver.find_element_by_xpath("//div[@class='w-dyn-item']/a[@href='/project/soft']").click()
