How to download an image from a website using Beautiful Soup

I am using Python and Beautiful Soup to download and locally save the image embedded/uploaded in this post: https://gall.dcinside.com/mgallery/board/view/?id=irudagall&no=18886&page=2027

However, the image file does not seem to be downloadable. This is what I have so far:

from bs4 import BeautifulSoup
import requests
from PIL import Image

headers = {
    "Connection" : "keep-alive",
    "Cache-Control" : "max-age=0",
    "sec-ch-ua-mobile" : "?0",
    "DNT" : "1",
    "Upgrade-Insecure-Requests" : "1",
    "User-Agent" : "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36",
    "Accept" : "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
    "Sec-Fetch-Site" : "none",
    "Sec-Fetch-Mode" : "navigate",
    "Sec-Fetch-User" : "?1",
    "Sec-Fetch-Dest" : "document",
    "Accept-Encoding" : "gzip, deflate, br",
    "Accept-Language" : "ko-KR,ko;q=0.9"
    }


test_url = 'https://gall.dcinside.com/mgallery/board/view/?id=irudagall&no=18887&page=2027'
test_res = requests.get(test_url, headers=headers)
test_soup = BeautifulSoup(test_res.text, "lxml")

img_url = test_soup.find("ul", {"class": "appending_file"}).find("a")['href']
img = Image.open(requests.get(img_url, stream=True).raw)
img.save('image.jpg')

Okay, this is a tricky one. If you inspect the network calls in your browser's developer tools (press F12) while the page is loading, you can see how the request was made: which parameters and headers were sent.

If you then right-click the network call, you'll see an option to copy it as cURL.

You can then paste the copied command into curlconverter.com, which will give you the equivalent Python code.

So, you can get the image directly by using the correct parameters:

import requests


headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'Accept-Language': 'en-US,en;q=0.9',
    'Connection': 'keep-alive',
    # Requests sorts cookies= alphabetically
    # 'Cookie': 'PHPSESSID=3121e08010ef697de8f66ab58c5f0ec2; ci_c=128cd80d9f5fdfc51c7c61edb5645ec6; ck_lately_gall=9XC; csid=0029f829d7ed21adb420a27f300ddb9d1a0a7da0b6d5a363db86afe5b658ca4807eb780658d416; ck_l_f=l; __utma=118540316.2061578954.1656510647.1656510647.1656510647.1; __utmc=118540316; __utmz=118540316.1656510647.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); _ga=GA1.2.2061578954.1656510647; _gid=GA1.2.1765121273.1656510647; _cc_id=4e3678109cd1c9313694988a7a67c933; panoramaId_expiry=1657115450856; panoramaId=a32107449d1dfe505a1342c855274945a702958a38612ff1e8d52232bab45444; alarm_new=1; ck_img_view_cnt=3; __utmt=1; __utmb=118540316.6.10.1656510647; last_alarm=1656511621; gallRecom=MjAyMi0wNi0yOSAyMzowNzo0My9jMzAwODJiNjRjYjQwZGRkMzZkZGZmNDUzODM3YjM3ZDFiMTNkNjg4OTlhNGFiYTM2YTdlY2IyM2ZkZGNiNDhk; service_code=21ac6d96ad152e8f15a05b7350a2475909d19bcedeba9d4face8115e9bc1fa4c1fb428016691909c29fb3e7c9f6390a488eb8106cf73377811f8e19d62b91109c8222c6181a534254c3cc6086a49a064ad17a08132571b4449d6bc0e10ec08afe5a9c3f3890340535a1bb0913e81814db61c3e9ff51bc2807ddc90ae309289dccde248ba6d7959046766f07235a7d51dd243176d34eccef23feebeaa5aa212e3785600f5573e2caede3b46d6b75496ce0605fe2cc763951c5632903a1dd0f4fa5b05e1a057f9e20a31',
    'Referer': 'https://gall.dcinside.com/',
    'Sec-Fetch-Dest': 'document',
    'Sec-Fetch-Mode': 'navigate',
    'Sec-Fetch-Site': 'same-site',
    'Sec-Fetch-User': '?1',
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.0.0 Safari/537.36',
    'sec-ch-ua': '" Not A;Brand";v="99", "Chromium";v="102", "Google Chrome";v="102"',
    'sec-ch-ua-mobile': '?0',
    'sec-ch-ua-platform': '"Windows"',
}

params = {
    'no': '24b0d769e1d32ca73cec85fa11d028318672c324147e3f6e5672e000c8e1092199c885c51d343f31d2f953fd08aa818e5ca9106941082978e5453bba01a55e89609815c87ddee88307576533477dc4',
    'f_no': '20210110_215524.jpg',
}

response = requests.get('https://image.dcinside.com/download.php', params=params, headers=headers)

with open('image.jpg', 'wb') as f:
    f.write(response.content)
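
A download endpoint can answer with an error status or an HTML page instead of the file, in which case the code above would still write the body to image.jpg. A minimal sketch of a check to run before saving, assuming the server sets a Content-Type header on image responses; the helper name is_image_response is hypothetical and not part of requests:

```python
def is_image_response(status_code, headers):
    """Return True only for a 200 response whose Content-Type is an image type.

    `headers` is any mapping with a "Content-Type" key, such as
    `response.headers` from requests (an assumption about how it is called).
    """
    content_type = headers.get("Content-Type", "")
    return status_code == 200 and content_type.lower().startswith("image/")
```

It could be used as `if is_image_response(response.status_code, response.headers):` around the `open(...)` block, so that a rejected request does not produce a corrupt file.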

First, you need to find the number of the picture so you can pass it to the download endpoint. We use BeautifulSoup for this. Then we simply pass that number to the download request as a parameter:

import requests
from bs4 import BeautifulSoup


url = 'https://gall.dcinside.com/mgallery/board/view/?id=irudagall&no=18887&page=2027'
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36",
    "Accept": "*/*",
    "Referer": "https://gall.dcinside.com/"
    }
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.text, 'lxml')
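# Take the value that follows the first '=' in the <img> src; this is the image number
image_no = soup.find('div', class_='write_div').find('img').get('src').split('=')[1]
# Fetch the file from the download endpoint, passing the number as the 'no' query parameter
response = requests.get('https://image.dcinside.com/download.php', params={'no': image_no}, headers=headers)
with open('image.jpg', 'wb') as img_file:
    img_file.write(response.content)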
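
The `split('=')[1]` step depends on the `no` value being the first query parameter in the image's src. A slightly more defensive sketch parses the query string instead, assuming the src carries the value in a parameter named `no`; the helper name extract_image_no and the example URL are hypothetical:

```python
from urllib.parse import urlparse, parse_qs


def extract_image_no(src):
    """Return the value of the 'no' query parameter in src, or None if absent."""
    query = parse_qs(urlparse(src).query)
    values = query.get("no")
    return values[0] if values else None


# Hypothetical src with 'no' as the second parameter
print(extract_image_no("https://example.com/viewimage.php?id=abc&no=XYZ123"))
```

With this helper, the `image_no` line would become `image_no = extract_image_no(soup.find('div', class_='write_div').find('img').get('src'))`, and a `None` result signals that the page layout differs from what the script expects.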
