Beautiful Soup image scraper problems
I am getting the following traceback:
Traceback (most recent call last):
  File "/home/ro/image_scrape_test.py", line 20, in <module>
    soup = BeautifulSoup(searched, "lxml")
  File "/usr/local/lib/python3.4/dist-packages/bs4/__init__.py", line 176, in __init__
    elif len(markup) <= 256:
TypeError: object of type 'NoneType' has no len()
Here is my code so far:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup
import urllib

# searches google images
driver = webdriver.Firefox()
google_images = ("https://www.google.com/search?site=imghp&tbm=isch source=hp&biw=1366&bih=648&q=")
search_term = input("what is your search term")
searched = driver.get("{0}{1}".format(google_images, search_term))

def savepic(url):
    uri = ("/home/ro/image scrape/images/download.jpg")
    if url != "":
        urllib.urlretrieve(url, uri)

soup = BeautifulSoup(searched, "lxml")
soup1 = soup.content
images = soup1.find_all("a")
for image in images:
    savepic(image)
I'm just starting out, so I'd appreciate any tips on how to improve my code. Thanks.
driver.get() loads the web page in the browser and returns None, so the searched variable ends up holding None. You probably meant to use .page_source instead:
soup = BeautifulSoup(driver.page_source, "lxml")
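To make the fix concrete, here is a minimal sketch of parsing the page source rather than the return value of driver.get(). The helper name image_urls_from_source is hypothetical, and the built-in "html.parser" is used in place of "lxml" only to avoid an extra dependency. It assumes the images of interest are plain <img> tags carrying a src attribute, which may not hold for every page.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def image_urls_from_source(page_source, base_url):
    """Return absolute URLs of all <img> tags found in an HTML string."""
    # page_source must be a str of HTML; passing None (what driver.get()
    # returns) is what triggered the TypeError in the traceback.
    soup = BeautifulSoup(page_source, "html.parser")
    urls = []
    for img in soup.find_all("img"):
        src = img.get("src")
        if src:  # skip <img> tags without a usable src
            urls.append(urljoin(base_url, src))
    return urls


# With a live browser the call would look like this (not run here):
#   driver.get(url)  # only loads the page; its return value is None
#   urls = image_urls_from_source(driver.page_source, driver.current_url)
```

Resolving each src against the page URL with urljoin means relative paths such as "/a.jpg" come back as full URLs that can be downloaded directly.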
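Two more points:
- BeautifulSoup is not strictly required here: you can locate the desired images with selenium itself, for example using driver.find_elements_by_tag_name().
- You may need to have selenium wait for the page to load.

searched is None. Apparently, the URL you are using is invalid.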