
Beautiful Soup image scraper problems

I get the following traceback:

Traceback (most recent call last):
  File "/home/ro/image_scrape_test.py", line 20, in <module>
    soup = BeautifulSoup(searched, "lxml")
  File "/usr/local/lib/python3.4/dist-packages/bs4/__init__.py", line 176, in __init__
    elif len(markup) <= 256:
TypeError: object of type 'NoneType' has no len()

This is my code so far:

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup
import urllib

#searches google images
driver = webdriver.Firefox()
google_images = ("https://www.google.com/search?site=imghp&tbm=isch source=hp&biw=1366&bih=648&q=")
search_term = input("what is your search term")
searched = driver.get("{0}{1}".format(google_images, search_term))

def savepic(url):
    uri = ("/home/ro/image scrape/images/download.jpg")
    if url != "":
        urllib.urlretrieve(url, uri)

soup = BeautifulSoup(searched, "lxml")
soup1 = soup.content
images = soup1.find_all("a")

for image in images:
    savepic(image)

I'm just starting out, so I'd appreciate any tips on how I can improve my code. Thank you.

driver.get() loads a webpage in the browser and returns None, so the searched variable ends up holding None. BeautifulSoup then fails when it calls len() on that value, which is the TypeError in your traceback.
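
To make the failure concrete, here is a minimal sketch that reproduces it without a browser. FakeDriver is a hypothetical stand-in written for this illustration, not a Selenium class; it only mimics the fact that get() navigates as a side effect and returns nothing.

```python
class FakeDriver:
    """Hypothetical stand-in for a Selenium driver (not part of Selenium)."""

    def __init__(self):
        self.page_source = ""

    def get(self, url):
        # Navigation is a side effect; there is no return statement,
        # so the call evaluates to None, as WebDriver.get() does.
        self.page_source = "<html><body>loaded {0}</body></html>".format(url)


driver = FakeDriver()
searched = driver.get("https://example.com")  # searched is None

try:
    len(searched)  # the same call BeautifulSoup makes on its markup argument
except TypeError as exc:
    error_message = str(exc)  # same TypeError as in the traceback
```

The HTML is still available afterwards, but on the driver object rather than in the return value.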

You probably meant to get the .page_source instead:

soup = BeautifulSoup(driver.page_source, "lxml")
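
Once the soup is built from real HTML, the loop in the question still needs attention: it collects a tags and passes whole tag objects to savepic, whereas image addresses normally live in the src attribute of img tags. The following is a rough sketch of that extraction step, not a tested scraper for Google Images. The function name extract_image_urls and its parameters are made up for this example, and it defaults to the built-in "html.parser" so it runs without lxml (passing "lxml" as in the question should also work if that package is installed).

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def extract_image_urls(page_source, base_url, parser="html.parser"):
    """Return absolute URLs for every <img> tag that has a src attribute."""
    soup = BeautifulSoup(page_source, parser)
    urls = []
    for img in soup.find_all("img"):
        src = img.get("src")  # None when the tag has no src attribute
        if src:
            # Resolve relative paths such as "/a.jpg" against the page URL.
            urls.append(urljoin(base_url, src))
    return urls
```

With a live driver this could be called as extract_image_urls(driver.page_source, driver.current_url).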

Two additional points here:

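  • you don't actually need BeautifulSoup here: you can locate the desired images with Selenium itself, for instance with driver.find_elements_by_tag_name() (written as driver.find_elements(By.TAG_NAME, "img") in Selenium 4, which deprecated and later removed the find_elements_by_* helpers)
  • I have not tested your code, but I think you would need to add Explicit Waits so that Selenium waits for the page to load before you read it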

searched is None. Apparently, the URL you are using is invalid.
