
Scraping Flickr with Selenium/Beautiful Soup in Python - ABSWP

I'm going through Automate the Boring Stuff with Python and I'm stuck at the chapter about downloading data from the internet. One of the tasks is to download photos for a given keyword from Flickr.

I have a massive problem with scraping this site. I've tried BeautifulSoup (which I think is not appropriate in this case, as the site uses JavaScript) and Selenium. Looking at the HTML, I think I should locate the 'overlay' class. However, no matter which option I use (find_element_by_class_name, ...by_text, ...by_partial_text), I am not able to find these elements (I get: ".

Could you please help me understand what I'm doing wrong? I'd also be grateful for any materials that could help me understand such cases better. Thanks!

Here's my simple code:

import sys
search_keywords = sys.argv[1]
from selenium import webdriver
browser = webdriver.Firefox()

browser.get(f'https://www.flickr.com/search/?text={search_keywords}')
elems = browser.find_element_by_class_name("overlay")
print(elems)
elems.click()

Sample keywords I type in shell: "industrial design interior"
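
A side note on the keywords: they contain spaces, and sys.argv[1] holds only the first shell argument, so it may be safer to join the arguments and encode them before putting them in the URL. A minimal sketch, assuming the `text` query parameter used in the code above is all the search page needs; `build_search_url` is a hypothetical helper name, not part of Selenium or Flickr.

```python
import sys
from urllib.parse import urlencode

SEARCH_PAGE = "https://www.flickr.com/search/"  # same page as in the code above


def build_search_url(keywords):
    """Return the search URL with the keywords safely encoded (hypothetical helper)."""
    # urlencode turns spaces into '+' and escapes characters such as '&'.
    return SEARCH_PAGE + "?" + urlencode({"text": keywords})


if __name__ == "__main__":
    # Joining every argument makes the quotes around the keywords optional.
    print(build_search_url(" ".join(sys.argv[1:])))
```

The resulting string can be passed to browser.get() in place of the f-string.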

Are you getting any error message? With Selenium it's useful to wrap your code in try/except blocks so you can see where it fails.

What exactly are you trying to do, download the photos? With a bit of rewriting, this code:

import time

from selenium import webdriver
from selenium.webdriver.common.by import By

try:
    options = webdriver.ChromeOptions()
    # options.add_argument('--headless')
    # Selenium 4 syntax: pass options=, since chrome_options= is deprecated
    driver = webdriver.Chrome(options=options)
    search_keywords = "cars"
    driver.get(f'https://www.flickr.com/search/?text={search_keywords}')
    time.sleep(1)  # crude pause so the JavaScript-rendered results can load
except Exception as e:
    print("Error loading search results page: " + str(e))

try:
    # Selenium 4 syntax: find_element(By.CLASS_NAME, ...) replaces find_element_by_class_name
    elem = driver.find_element(By.CLASS_NAME, "overlay")
    print(elem)
    elem.click()
    time.sleep(5)
except Exception as e:
    print(str(e))

loads the page as expected and then clicks on the photo, taking us to the photo's page.

I would be able to help more if you could go into more detail about what you want to accomplish.
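
The fixed time.sleep(1) in the code above is only a guess at how long the JavaScript-rendered results take to appear. A more robust pattern is to retry the lookup until it succeeds or a timeout expires, which is the idea behind Selenium's WebDriverWait. The sketch below shows that idea in plain Python so it runs without a browser; `wait_until` and its parameters are hypothetical names, not Selenium API.

```python
import time


def wait_until(lookup, timeout=10.0, poll=0.5):
    """Call lookup() until it returns without raising, or raise TimeoutError."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            return lookup()
        except Exception as exc:  # e.g. Selenium's NoSuchElementException
            if time.monotonic() >= deadline:
                raise TimeoutError(f"gave up after {timeout} s") from exc
            time.sleep(poll)  # wait a little before retrying
```

With a driver it could be called as wait_until(lambda: driver.find_element(By.CLASS_NAME, "overlay")), with By imported from selenium.webdriver.common.by. In real code, Selenium's own WebDriverWait is probably the better choice.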


 