
How to get full image data from a thumbnail using BeautifulSoup?

I'm trying to write a Python program that downloads a random image based on a search query. Here's my code so far:

import requests
from bs4 import BeautifulSoup
import random

query = 'pets' #This can be anything, this is just for demonstration 
adlt = 'on'
count = '10'

#I tried using Google but Bing is more cooperative
URL='https://bing.com/images/search?q=' + query + '&safeSearch=' + adlt + '&count=' + count

html_page = requests.get(URL)

soup = BeautifulSoup(html_page.content, 'html.parser')

images = soup.find_all('img')

example = random.choice(images)

imageLink = example.attrs['src']

print(imageLink)
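As a side note on the URL construction above: plain string concatenation breaks as soon as the query contains spaces or characters such as `&`. A minimal sketch of a safer variant, using `urllib.parse.urlencode` from the standard library, follows. The helper name `build_search_url` is hypothetical, and the parameter names (`q`, `safeSearch`, `count`) are simply taken from the code above, not verified against Bing.

```python
from urllib.parse import urlencode


def build_search_url(query, safe_search="on", count=10):
    """Build the image-search URL with every parameter percent-encoded.

    The parameter names mirror the question's code; whether Bing honours
    all of them is an assumption.
    """
    params = {"q": query, "safeSearch": safe_search, "count": count}
    return "https://bing.com/images/search?" + urlencode(params)


# Spaces become "+" and "&" becomes "%26", so the query stays one parameter.
print(build_search_url("cute pets & owners"))
```

With `requests`, a similar effect can be had by passing the dictionary through the `params=` argument of `requests.get` instead of building the string by hand.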

This code requests Bing's image search page and collects all the img tags on it. Then it picks one at random and prints its URL to the terminal. But as you may know, what Bing's and Google's image search shows isn't the actual image but a smaller version of it; you need to click the thumbnail to reach the actual image. So, from the data I get from this thumbnail, how can I access the real image?

Here's the HTML for a thumbnail in case you need it:

<img class="mimg" style="color: rgb(157, 102, 46);" height="180" width="323" src="https://th.bing.com/th/id/OIP.1lJSjlsM4xmvJQTDwkOcbgHaEH?w=323&h=180&c=7&o=5&dpr=1.25&pid=1.7" alt="Image result for pets" data-thhnrepbd="1" data-bm="180">

And here's the HTML for the full image behind that thumbnail:

<img src="http://www.insuranceportals.us/wp-content/uploads/2018/07/Pets-Health-Insurance-Wise-Investment-Or-Waste-of-Money.jpeg" alt="See the source image" class=" nofocus" tabindex="0" aria-label="See the source image">

The page is loaded dynamically, so requests alone won't see the rendered content: it fetches the initial HTML but does not execute the JavaScript that fills in the results. Selenium can be used as an alternative to scrape the page.

Install it with: pip install selenium

Download the ChromeDriver that matches your Chrome version from here.

import random
from time import sleep
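from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from bs4 import BeautifulSoup


# Selenium 4 takes the driver path through a Service object; the positional
# path argument was deprecated in Selenium 4 and later removed.
driver = webdriver.Chrome(service=Service(r"c:\path\to\chromedriver.exe"))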
query = "pets"
adult = "on"
count = "10"

URL = (
    "https://bing.com/images/search?q="
    + query
    + "&safeSearch="
    + adult
    + "&count="
    + count
)
driver.get(URL)
# Wait for page to fully render
sleep(5)

soup = BeautifulSoup(driver.page_source, "html.parser")
all_images = soup.find_all("img")
image = random.choice(all_images)
print(image)

driver.quit()

Output:

<img alt="Turtle" data-bm="78" data-priority="2" data-thhnrepbd="1" height="42" src2="https://th.bing.com/th?q=Pet+Turtle&amp;w=42&amp;h=42&amp;c=1&amp;p=0&amp;pid=InlineBlock&amp;mkt=en-US&amp;adlt=moderate&amp;t=1" width="42"/>
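Note that the tag above is a small inline icon with a `src2` attribute and no `src`, so picking at random from every img tag can return something that is not a search result at all. The sketch below shows one way to separate full-size URLs from thumbnails in the rendered HTML. It rests on an assumption that is not confirmed anywhere in this thread: that each result is wrapped in an anchor with class `iusc` whose `m` attribute holds JSON with a `murl` key pointing at the original image. That markup is undocumented and may change. The class name `ImageLinkParser` and the sample HTML are hypothetical, and the standard library's `html.parser` is used here only so the example runs without extra installs.

```python
import json
from html.parser import HTMLParser


class ImageLinkParser(HTMLParser):
    """Collect full-size URLs (if exposed) and thumbnail URLs from result HTML."""

    def __init__(self):
        super().__init__()
        self.full_urls = []
        self.thumb_urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "iusc" in (attrs.get("class") or "").split():
            # ASSUMPTION: the "m" attribute is JSON containing a "murl" key.
            try:
                meta = json.loads(attrs.get("m") or "")
            except json.JSONDecodeError:
                return
            if isinstance(meta, dict) and meta.get("murl"):
                self.full_urls.append(meta["murl"])
        elif tag == "img":
            # Some tags carry the URL in "src2" instead of "src".
            url = attrs.get("src") or attrs.get("src2")
            if url:
                self.thumb_urls.append(url)


# Hand-written sample, NOT captured from Bing; example.com is a placeholder.
sample_html = """
<a class="iusc" m='{"murl": "https://example.com/full/pets.jpeg"}'>
  <img class="mimg" src="https://th.bing.com/th/id/OIP.1lJSjlsM4xmvJQTDwkOcbgHaEH?w=323&amp;h=180"/>
</a>
<img alt="Turtle" src2="https://th.bing.com/th?q=Pet+Turtle&amp;w=42&amp;h=42"/>
"""

parser = ImageLinkParser()
parser.feed(sample_html)
parser.close()

print(parser.full_urls)   # URLs read from the assumed "murl" field
print(parser.thumb_urls)  # thumbnail URLs from src / src2
```

With the Selenium code above, `driver.page_source` could be fed to the parser in place of `sample_html`, and `random.choice(parser.full_urls or parser.thumb_urls)` would then prefer a full-size URL whenever one was found.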


 