How to get full image data from a thumbnail using BeautifulSoup?
I'm trying to write a Python program that downloads a random image based on a search query. Here's my code so far:
import requests
from bs4 import BeautifulSoup
import random
query = 'pets' #This can be anything, this is just for demonstration
adlt = 'on'
count = '10'
#I tried using Google but Bing is more cooperative
URL='https://bing.com/images/search?q=' + query + '&safeSearch=' + adlt + '&count=' + count
html_page = requests.get(URL)
soup = BeautifulSoup(html_page.content, 'html.parser')
images = soup.find_all('img')
example = random.choice(images)
imageLink = example.attrs['src']
print(imageLink)
So, what this code does is go to Bing's image search and get all the img tags on the page. Then it chooses a random one and prints its URL to the terminal. But as you might know, what Bing's and Google's image search shows isn't the actual image but a smaller version of it; you need to click it to access the actual image. So, from the data I get from this thumbnail, how can I access the real image?
Here's the HTML for a thumbnail in case you need it:
<img class="mimg" style="color: rgb(157, 102, 46);" height="180" width="323" src="https://th.bing.com/th/id/OIP.1lJSjlsM4xmvJQTDwkOcbgHaEH?w=323&h=180&c=7&o=5&dpr=1.25&pid=1.7" alt="Image result for pets" data-thhnrepbd="1" data-bm="180">
And here's the HTML for the full image of that thumbnail:
<img src="http://www.insuranceportals.us/wp-content/uploads/2018/07/Pets-Health-Insurance-Wise-Investment-Or-Waste-of-Money.jpeg" alt="See the source image" class=" nofocus" tabindex="0" aria-label="See the source image">
The page is loaded dynamically, so requests doesn't support it: requests only fetches the initial HTML and doesn't run the page's JavaScript. We can use Selenium as an alternative to scrape the page.
Install it with: pip install selenium
Download the correct ChromeDriver for your installed version of Chrome.
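import random
from time import sleep
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from bs4 import BeautifulSoup
# Selenium 4 takes the driver path through a Service object
driver = webdriver.Chrome(service=Service(r"c:\path\to\chromedriver.exe"))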
query = "pets"
adult = "on"
count = "10"
URL = (
"https://bing.com/images/search?q="
+ query
+ "&safeSearch="
+ adult
+ "&count="
+ count
)
driver.get(URL)
# Wait for page to fully render
sleep(5)
soup = BeautifulSoup(driver.page_source, "html.parser")
all_images = soup.find_all("img")
image = random.choice(all_images)
print(image)
driver.quit()
Output:
<img alt="Turtle" data-bm="78" data-priority="2" data-thhnrepbd="1" height="42" src2="https://th.bing.com/th?q=Pet+Turtle&w=42&h=42&c=1&p=0&pid=InlineBlock&mkt=en-US&adlt=moderate&t=1" width="42"/>
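Note that the tag printed above is still a thumbnail. One possible way to reach the full-size image, sketched below, relies on an assumption about Bing's markup that is not confirmed anywhere in this thread: that each result is wrapped in an a tag with class iusc whose m attribute holds JSON, and that the murl key of that JSON is the original image URL. The helper names (FullImageLinkParser, extract_full_image_urls) and the sample markup are made up for illustration, and the sketch uses the standard library's html.parser only to stay dependency-free; the same lookup could be done with soup.find_all("a", class_="iusc").

```python
import json
import random
from html.parser import HTMLParser


class FullImageLinkParser(HTMLParser):
    """Collect full-size image URLs from result anchors (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.full_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        metadata = attributes.get("m")
        # ASSUMPTION: result anchors carry class "iusc" and a JSON "m" attribute.
        if "iusc" not in classes or not metadata:
            return
        try:
            data = json.loads(metadata)
        except json.JSONDecodeError:
            return  # skip anchors whose metadata is not valid JSON
        # ASSUMPTION: "murl" is the key that holds the original image URL.
        if isinstance(data, dict) and data.get("murl"):
            self.full_urls.append(data["murl"])


def extract_full_image_urls(html):
    """Return every full-size image URL found in the given HTML string."""
    parser = FullImageLinkParser()
    parser.feed(html)
    return parser.full_urls


# Stand-in markup for illustration; in practice, pass driver.page_source.
sample_html = (
    '<a class="iusc" m=\'{"murl": "http://example.com/full.jpeg", '
    '"turl": "https://th.bing.com/th/id/example"}\'>'
    '<img class="mimg" src="https://th.bing.com/th/id/example"></a>'
    '<a class="other" href="#">not a result</a>'
)

full_urls = extract_full_image_urls(sample_html)
if full_urls:
    print(random.choice(full_urls))
```

If the attribute or key names differ on the live page, inspect one result in the browser's developer tools and adjust the two names marked as assumptions.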