
Selenium (Python) Finding all selectable windows on a webpage

I'm trying to parse out all of the href links on this page: https://data-wake.opendata.arcgis.com/datasets, but I've noticed that none of the links I'm looking for are returned by my Python code, which is here:

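from selenium import webdriver
from bs4 import BeautifulSoup

# Raw string so the backslashes in the Windows path are not read as escapes
driver = webdriver.PhantomJS(r"C:\Users\Jlong\Desktop\phantomjs.exe")
driver.get("https://data-wake.opendata.arcgis.com/datasets")
pagesource = driver.page_source
bsobj = BeautifulSoup(pagesource, 'lxml')
for line in bsobj.find_all('a'):
    print(line.get('href'))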

Here is a snippet of the HTML from Chrome's inspector: Html Inspect

The expected result would be something like the following:

"/datasets/wakeforestnc::state-system-streets"

I have also noticed that a script called Ember application.js is running on the page, and I think that may be preventing me from accessing the href attributes that are deeply nested in the main ember tag. I'm not familiar with Ember or with how to parse complex pages like this, so any help would be greatly appreciated!

Ember.js is used to build SPAs (single-page applications), which are generally rendered on the client side.

My guess is that your code is searching for all anchors after the page loads, but before the SPA renders.

Your code needs to wait for the Ember application to render; for example, wait until the body element has the class ember-application.
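A minimal sketch of that wait, using Selenium's WebDriverWait with a custom predicate. The function names ember_has_rendered and rendered_page_source and the 15-second timeout are illustrative assumptions, not part of the original code. The ember-application class may only signal that the app has booted, so the result list could still need a wait on a more specific selector.

```python
def ember_has_rendered(driver):
    """Return True once <body> carries the ember-application class."""
    # "css selector" is the string behind selenium's By.CSS_SELECTOR constant;
    # using the literal keeps this predicate usable without importing Selenium.
    matches = driver.find_elements("css selector", "body.ember-application")
    return len(matches) > 0


def rendered_page_source(driver, url, timeout=15):
    """Load url, block until Ember has booted, then return the rendered HTML."""
    # Imported lazily so the predicate above can be exercised with a stub driver.
    from selenium.webdriver.support.ui import WebDriverWait

    driver.get(url)
    # until() calls ember_has_rendered(driver) repeatedly and raises
    # selenium.common.exceptions.TimeoutException if it never returns True.
    WebDriverWait(driver, timeout).until(ember_has_rendered)
    return driver.page_source
```

The returned HTML can then be passed to BeautifulSoup in place of the driver.page_source that was read immediately after driver.get().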

I believe you are getting the page_source before the front-end renders it.

I got those links via chromedriver (it should work the same with PhantomJS) by adding a simple wait before accessing page_source:

from selenium import webdriver
from bs4 import BeautifulSoup
import time

driver = webdriver.Chrome()
driver.get("https://data-wake.opendata.arcgis.com/datasets")
time.sleep(5)
soup = BeautifulSoup(driver.page_source,'lxml')
for line in soup.find('ul', {'id':'search-results'}).find_all('a', {'class': 'result-name ember-view'}):
    print(line.get('href'))

Output:

/datasets/tofv::fuquay-varina-utility-as-built-drawings
/datasets/tofv::private-sewer-manhole
/datasets/tofv::fuquay-varina-town-development
/datasets/tofv::blowoff-valve
/datasets/tofv::fuquay-varina-zoning
/datasets/tofv::drainage-point
/datasets/tofv::gravity-sewer-line
/datasets/tofv::water-meter-vault
/datasets/tofv::fuquay-varina-sidewalks
/datasets/tofv::water-line
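The hrefs printed above are relative to the site root. If full URLs are needed, one option is to resolve them against the page URL with the standard library's urljoin. This is only a sketch; the helper name absolute_links is a hypothetical one introduced for illustration.

```python
from urllib.parse import urljoin

PAGE_URL = "https://data-wake.opendata.arcgis.com/datasets"


def absolute_links(hrefs, base=PAGE_URL):
    """Resolve site-relative hrefs against the page URL, skipping missing ones."""
    # line.get('href') returns None for anchors that have no href attribute,
    # so falsy values are filtered out before joining.
    return [urljoin(base, href) for href in hrefs if href]


print(absolute_links(["/datasets/tofv::water-line", None]))
# ['https://data-wake.opendata.arcgis.com/datasets/tofv::water-line']
```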
