
How to download an image from a website using Beautiful Soup

I am using Python and Beautiful Soup to download and locally save the image embedded/uploaded in this post: https://gall.dcinside.com/mgallery/board/view/?id=irudagall&no=18886&page=2027

However, the image file does not seem to download. Here is what I have so far:

from bs4 import BeautifulSoup
import requests
from PIL import Image

headers = {
    "Connection" : "keep-alive",
    "Cache-Control" : "max-age=0",
    "sec-ch-ua-mobile" : "?0",
    "DNT" : "1",
    "Upgrade-Insecure-Requests" : "1",
    "User-Agent" : "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36",
    "Accept" : "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
    "Sec-Fetch-Site" : "none",
    "Sec-Fetch-Mode" : "navigate",
    "Sec-Fetch-User" : "?1",
    "Sec-Fetch-Dest" : "document",
    "Accept-Encoding" : "gzip, deflate, br",
    "Accept-Language" : "ko-KR,ko;q=0.9"
    }


test_url = 'https://gall.dcinside.com/mgallery/board/view/?id=irudagall&no=18887&page=2027'
test_res = requests.get(test_url, headers=headers)
test_soup = BeautifulSoup(test_res.text, "lxml")

img_url = test_soup.find("ul", {"class": "appending_file"}).find("a")['href']
img = Image.open(requests.get(img_url, stream = True).raw)
img.save('image.jpg')

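Well, this is a tricky one. If you inspect the network calls in your browser (press F12) while the request is being made, you can see exactly how it is sent: which parameters and headers it carries.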

If you then right-click the network call, you will see a "Copy as cURL" option.


You can then paste the copied command into curlconverter.com, which will give you the equivalent Python code.
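To give a feel for what that conversion involves, here is a rough sketch of my own (not curlconverter's actual code) that pulls the URL and the -H headers out of a simple "Copy as cURL" string. It only understands the URL and -H/--header, and the example command, including the f_no value, is made up for illustration.

```python
import shlex


def curl_to_request_parts(curl_command):
    """Extract the URL and -H headers from a simple 'Copy as cURL' string."""
    # Join shell line continuations so shlex sees one logical line.
    tokens = shlex.split(curl_command.replace("\\\n", " "))
    url = None
    headers = {}
    i = 1 if tokens and tokens[0] == "curl" else 0
    while i < len(tokens):
        token = tokens[i]
        if token in ("-H", "--header") and i + 1 < len(tokens):
            # A header argument looks like "Name: value".
            name, _, value = tokens[i + 1].partition(":")
            headers[name.strip()] = value.strip()
            i += 2
        elif not token.startswith("-") and url is None:
            url = token
            i += 1
        else:
            i += 1  # skip flags this sketch does not understand
    return url, headers


# Hypothetical command, shaped like what a browser's "Copy as cURL" produces.
example = (
    "curl 'https://image.dcinside.com/download.php?f_no=example.jpg' \\\n"
    "  -H 'Referer: https://gall.dcinside.com/' \\\n"
    "  -H 'User-Agent: Mozilla/5.0' \\\n"
    "  --compressed"
)
url, headers = curl_to_request_parts(example)
print(url)
print(headers)
```

The resulting dict can be passed as headers= to requests.get. For anything beyond this simple shape (cookies, POST data, flags that take values), the real converter is the safer choice.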

So you can fetch the image directly with the correct parameters:

import requests


headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'Accept-Language': 'en-US,en;q=0.9',
    'Connection': 'keep-alive',
    # Requests sorts cookies= alphabetically
    # 'Cookie': 'PHPSESSID=3121e08010ef697de8f66ab58c5f0ec2; ci_c=128cd80d9f5fdfc51c7c61edb5645ec6; ck_lately_gall=9XC; csid=0029f829d7ed21adb420a27f300ddb9d1a0a7da0b6d5a363db86afe5b658ca4807eb780658d416; ck_l_f=l; __utma=118540316.2061578954.1656510647.1656510647.1656510647.1; __utmc=118540316; __utmz=118540316.1656510647.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); _ga=GA1.2.2061578954.1656510647; _gid=GA1.2.1765121273.1656510647; _cc_id=4e3678109cd1c9313694988a7a67c933; panoramaId_expiry=1657115450856; panoramaId=a32107449d1dfe505a1342c855274945a702958a38612ff1e8d52232bab45444; alarm_new=1; ck_img_view_cnt=3; __utmt=1; __utmb=118540316.6.10.1656510647; last_alarm=1656511621; gallRecom=MjAyMi0wNi0yOSAyMzowNzo0My9jMzAwODJiNjRjYjQwZGRkMzZkZGZmNDUzODM3YjM3ZDFiMTNkNjg4OTlhNGFiYTM2YTdlY2IyM2ZkZGNiNDhk; service_code=21ac6d96ad152e8f15a05b7350a2475909d19bcedeba9d4face8115e9bc1fa4c1fb428016691909c29fb3e7c9f6390a488eb8106cf73377811f8e19d62b91109c8222c6181a534254c3cc6086a49a064ad17a08132571b4449d6bc0e10ec08afe5a9c3f3890340535a1bb0913e81814db61c3e9ff51bc2807ddc90ae309289dccde248ba6d7959046766f07235a7d51dd243176d34eccef23feebeaa5aa212e3785600f5573e2caede3b46d6b75496ce0605fe2cc763951c5632903a1dd0f4fa5b05e1a057f9e20a31',
    'Referer': 'https://gall.dcinside.com/',
    'Sec-Fetch-Dest': 'document',
    'Sec-Fetch-Mode': 'navigate',
    'Sec-Fetch-Site': 'same-site',
    'Sec-Fetch-User': '?1',
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.0.0 Safari/537.36',
    'sec-ch-ua': '" Not A;Brand";v="99", "Chromium";v="102", "Google Chrome";v="102"',
    'sec-ch-ua-mobile': '?0',
    'sec-ch-ua-platform': '"Windows"',
}

params = {
    'no': '24b0d769e1d32ca73cec85fa11d028318672c324147e3f6e5672e000c8e1092199c885c51d343f31d2f953fd08aa818e5ca9106941082978e5453bba01a55e89609815c87ddee88307576533477dc4',
    'f_no': '20210110_215524.jpg',
}

response = requests.get('https://image.dcinside.com/download.php', params=params, headers=headers)

with open('image.jpg', 'wb') as f:
    f.write(response.content)
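One caveat with writing response.content straight to disk: if the server rejects the request, you may end up saving an HTML error page under the name image.jpg. I am assuming that is what can happen here when a header is missing; I have not confirmed how this site responds. A small check like the following sketch (the helper name is my own) can catch that before saving.

```python
def looks_like_image(content_type, body):
    """Return True if the header or the leading bytes suggest an image."""
    # Magic numbers for a few common formats: JPEG, PNG, GIF.
    signatures = (b"\xff\xd8\xff", b"\x89PNG\r\n\x1a\n", b"GIF87a", b"GIF89a")
    has_image_type = content_type.lower().startswith("image/")
    has_image_bytes = body.startswith(signatures)
    return has_image_type or has_image_bytes


# Sample inputs standing in for response.headers.get("Content-Type", "")
# and response.content from the snippet above.
print(looks_like_image("image/jpeg", b"\xff\xd8\xff\xe0"))
print(looks_like_image("text/html; charset=utf-8", b"<html>"))
```

With the real response you would call looks_like_image(response.headers.get("Content-Type", ""), response.content) and only open the file when it returns True.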

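First, you need to find the image's number so that it can be passed to the download endpoint. We use BeautifulSoup for this. Then we simply pass the number we found as a parameter to the download request.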

import requests
from bs4 import BeautifulSoup


url = 'https://gall.dcinside.com/mgallery/board/view/?id=irudagall&no=18887&page=2027'
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36",
    "Accept": "*/*",
    "Referer": "https://gall.dcinside.com/"
    }
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.text, 'lxml')
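# The image number is taken from the src of the first <img> in the post body.
image_no = soup.find('div', class_='write_div').find('img').get('src').split('=')[1]
# Pass the number to the download endpoint as the "no" query parameter.
response = requests.get('https://image.dcinside.com/download.php', params={'no': image_no}, headers=headers)
with open('image.jpg', 'wb') as img_file:
    img_file.write(response.content)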
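Note that split('=')[1] only works when the number is the first query parameter in the src URL. If the order ever differs, parsing the query string by name is more robust. A minimal sketch using only the standard library follows; the helper name and the example URL are made up for illustration, and the real src on the page may look different.

```python
from urllib.parse import urlparse, parse_qs


def get_query_param(url, name):
    """Return the first value of query parameter `name`, or None if absent."""
    values = parse_qs(urlparse(url).query).get(name)
    return values[0] if values else None


# Hypothetical src value with "no" as the second parameter.
src = "https://example.com/viewimage.php?id=abc&no=12345"
print(get_query_param(src, "no"))   # prints 12345
print(src.split("=")[1])            # prints abc&no
```

In the snippet above, this would replace the split call with get_query_param(img_src, 'no'), where img_src is the src attribute of the img tag.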

