
Can't Download Full File in Python

I am using bs4 (BeautifulSoup) in Python to download a wallpaper from nmgncp.com. However, the code downloads only a 16 KB file, whereas the full image is around 300 KB. Please help me. I have even tried the wget.download method.

PS: I am using Python 3.6 on Windows 10.

Here is my code:

from bs4 import BeautifulSoup
import requests
import datetime
import time
import re
import wget
import os


url='http://www.nmgncp.com/dark-wallpaper-1920x1080.html'

html=requests.get(url)
soup=BeautifulSoup(html.text,"lxml")
a = soup.findAll('img')[0].get('src')
newurl='http://www.nmgncp.com/'+a
print(newurl)

response = requests.get(newurl)
if response.status_code == 200:
    with open("C:/Users/KD/Desktop/Python_practice/newwww.jpg", 'wb') as f:
        f.write(response.content)

The problem is caused by hotlink protection: the image URL requires a Referer header, and without one the server redirects to the HTML page.

Fixed source code:

from bs4 import BeautifulSoup
import requests

url='http://www.nmgncp.com/dark-wallpaper-1920x1080.html'

# Fetch the wallpaper page and take the src of its first <img> tag.
html=requests.get(url)
soup=BeautifulSoup(html.text,"lxml")
a = soup.findAll('img')[0].get('src')
newurl='http://www.nmgncp.com'+a
print(newurl)

# Send a Referer header so the server returns the image
# instead of redirecting to the HTML page.
response = requests.get(newurl, headers={'referer': newurl})
if response.status_code == 200:
    with open("C:/Users/KD/Desktop/Python_practice/newwww.jpg", 'wb') as f:
        f.write(response.content)
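
A side note on building newurl: the question's code joins with 'http://www.nmgncp.com/' and the fixed code with 'http://www.nmgncp.com', which suggests the src attribute already starts with a slash. A minimal sketch that avoids depending on that, using urljoin from the standard library. The src value below is an assumption, taken from the direct image URL quoted in the other answer on this page:

```python
from urllib.parse import urljoin

page_url = "http://www.nmgncp.com/dark-wallpaper-1920x1080.html"

# Assumed value of the first <img> src attribute (not confirmed by the page source).
src = "/data/out/95/4351795-dark-wallpaper-1920x1080.jpg"

# urljoin resolves src against the page URL the way a browser would,
# so no slash is doubled or dropped by hand.
image_url = urljoin(page_url, src)
print(image_url)
# http://www.nmgncp.com/data/out/95/4351795-dark-wallpaper-1920x1080.jpg
```

The same call also handles a src without a leading slash or a full absolute URL, so the string concatenation does not need to be adjusted per site.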

First of all, http://www.nmgncp.com/dark-wallpaper-1920x1080.html is an HTML document. Second, when you try to download the image by its direct URL (for example http://www.nmgncp.com/data/out/95/4351795-dark-wallpaper-1920x1080.jpg), the server also redirects you to an HTML document. This is most probably because the host (nmgncp.com) does not want to provide direct links to its images. It can tell whether an image was requested directly by looking at the HTTP Referer header and deciding whether it is valid. So in this case you have to put in some more effort to make the host think that you are a valid caller of direct URLs.
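
To make that concrete, here is a rough sketch, not a definitive implementation. It sends a Referer header and then checks that the body really is a JPEG before it gets saved, so a redirect to the HTML page is not silently written to a .jpg file (which is probably what the 16 KB file was). The function names are made up for this example, and `session` is assumed to behave like requests.Session. Using the wallpaper page URL as the Referer is also an assumption: it is what a browser would normally send, but which values the site accepts is not documented here.

```python
JPEG_MAGIC = b"\xff\xd8\xff"  # every JPEG file starts with these bytes


def looks_like_jpeg(content_type, body):
    """Return True if both the Content-Type and the first bytes indicate a JPEG."""
    is_image_type = content_type.lower().startswith("image/")
    return is_image_type and body.startswith(JPEG_MAGIC)


def fetch_image(session, image_url, page_url, timeout=10):
    """Request image_url with a Referer header and return the image bytes.

    `session` is assumed to offer the requests.Session interface.
    Raises ValueError when the server answers with something other than
    a JPEG, for example the HTML page it redirects direct requests to.
    """
    response = session.get(image_url, headers={"Referer": page_url}, timeout=timeout)
    response.raise_for_status()
    content_type = response.headers.get("Content-Type", "")
    if not looks_like_jpeg(content_type, response.content):
        raise ValueError("expected a JPEG, got Content-Type %r" % content_type)
    return response.content
```

With requests installed, it could be called as fetch_image(requests.Session(), newurl, url) and the returned bytes written to a file opened in 'wb' mode, as in the question.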
