
How to scrape multiple links with selenium after manual login?

I am trying to automatically collect articles from a database that first requires me to log in.

I have written the following code using selenium to open the search results page, then wait so that I can log in manually. That works, and it can get the links to each item in the search results.

I then want to keep using selenium to visit each of the links in the search results and collect the article text.

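import time

from selenium import webdriver
from selenium.webdriver.common.by import By

browser = webdriver.Firefox()
browser.get("LINK")
time.sleep(60)  # wait while I log in manually
lnks = browser.find_elements(By.TAG_NAME, "a")[20:40]
for lnk in lnks:
    link = lnk.get_attribute('href')
    print(link)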

I can't get any further. How should I then make it visit these links in turn and get the text of the articles for each one?

When I added browser.get(link) inside the for loop, I got a 'selenium.common.exceptions.StaleElementReferenceException'.

On the request of the database owner, I have removed the screenshots previously posted in this post, as well as information about the database. I would like to delete the post completely, but am unable to do so.

You should look for some bs4 (BeautifulSoup) tutorials, but here is a starter:

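import bs4

# `browser` is the selenium driver from the question, already logged in
html_source_code = browser.execute_script("return document.body.innerHTML;")
soup = bs4.BeautifulSoup(html_source_code, 'lxml')
# replace the placeholder with whatever tag matches the result links
links = soup.find_all('what-ever-the-html-code-is')
for l in links:
    print(l['href'])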
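
As for the StaleElementReferenceException: the usual cause is that the element objects belong to the search results page, so they stop being valid as soon as the browser navigates away. A common way around it is to copy the href strings out first and only then visit them. Below is a rough sketch of that idea, not a tested solution for your database. The helper names extract_hrefs, visit_all and main are made up for illustration, "LINK" is the placeholder from the question, and reading the whole body text is an assumption, since I don't know how the article pages are laid out.

```python
import time


def extract_hrefs(anchors):
    """Copy the href strings out of the link elements while their page is still loaded."""
    hrefs = []
    for a in anchors:
        href = a.get_attribute("href")
        if href:  # skip anchors that have no href
            hrefs.append(href)
    return hrefs


def visit_all(browser, hrefs, get_text, pause=1.0):
    """Open each URL in turn and return a dict mapping url -> extracted text."""
    texts = {}
    for href in hrefs:
        browser.get(href)       # navigating is safe now: we only hold plain strings
        time.sleep(pause)       # crude wait for the page to load
        texts[href] = get_text(browser)
    return texts


def main():
    # Imported here so the helpers above can be used without selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    browser = webdriver.Firefox()
    browser.get("LINK")         # placeholder from the question
    time.sleep(60)              # time to log in manually
    anchors = browser.find_elements(By.TAG_NAME, "a")[20:40]
    hrefs = extract_hrefs(anchors)

    # Assumption: take all visible text in <body>; narrow this to the
    # article container once you know the page structure.
    def body_text(b):
        return b.find_element(By.TAG_NAME, "body").text

    texts = visit_all(browser, hrefs, body_text)
    for url, text in texts.items():
        print(url, len(text))
    browser.quit()
```

Call main() to run it against the real site. Because the same browser session is reused for every link, the manual login should carry over to the article pages.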
