Images crawled by Python from a website cannot be opened by Photoshop

The webpage looks like this:

<img data-s="300,640" data-type="jpeg" data-src="http://mmbiz.qpic.cn/mmbiz/2ibL1hUwSYSJO5BkyCQMicnPL5y1yAkcKh3YCITccD4IxWibI2wKpgYatDXgBBvOW01oOnGZGPVmfMDR0cQKSjeew/0?wx_fmt=jpeg" data-ratio="1.7613636363636365" data-w="440" width="auto" style="margin: 0px; padding: 0px; box-sizing: border-box !important; word-wrap: break-word !important; width: auto !important; visibility: visible !important; height: auto !important;" _width="auto" src="http://mmbiz.qpic.cn/mmbiz/2ibL1hUwSYSJO5BkyCQMicnPL5y1yAkcKh3YCITccD4IxWibI2wKpgYatDXgBBvOW01oOnGZGPVmfMDR0cQKSjeew/640?wx_fmt=jpeg&amp;tp=webp&amp;wxfrom=5&amp;wx_lazy=1">

So my script to download the images is:

    imgs =  soup.find_all('img')
    for img in imgs:
        if img != None and img['data-type']=="jpeg":
            count += 1
            link = img['data-src']
            piccode = urllib2.urlopen(link).read()
            picname = "pic"+str(count)+".jpg"
            with open(picname,'wb') as code:
                code.write(piccode)

But the JPG files that I download cannot be opened or edited by Photoshop. Somehow I feel that the images I download are broken. Why?

First

Make sure that the saved files are created with the correct permissions.
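One way to check this from Python is sketched below. The helper name describe_file is made up for illustration. It only reports the file size and whether the current user can read and write the file, which is a rough check and not a full diagnosis.

```python
import os


def describe_file(path):
    """Return basic facts about a saved file (hypothetical helper)."""
    info = os.stat(path)
    return {
        'size': info.st_size,                    # 0 bytes would mean nothing was written
        'readable': os.access(path, os.R_OK),    # can the current user read it?
        'writable': os.access(path, os.W_OK),    # can the current user modify it?
        'mode': oct(info.st_mode & 0o777),       # permission bits only
    }
```

For example, describe_file("pic1.jpg") on one of the downloaded files shows whether it is empty or unreadable.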

Then

I can't answer the question of why Photoshop can't open your pictures, but I can offer an alternative that worked for me. It's below.

It looks like urllib has a built-in function that downloads and saves a file in a single call. I'm not sure whether urllib2 has an equivalent; I couldn't find one quickly.

Replace

    piccode = urllib2.urlopen(link).read()

With

    urllib.urlretrieve(link, picname)

Note that picname has to be defined before this call, so move that assignment above it. You can then remove the with open(...) block that follows, since urlretrieve downloads the file and writes it to disk in one call.
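Put together, the loop could look roughly like the sketch below. The function name download_jpegs and its retrieve parameter are my own, not part of the original script. The import fallback is there because urlretrieve lives in urllib on Python 2 and in urllib.request on Python 3. I also use .get() so that an img tag without a data-type attribute is skipped instead of raising a KeyError.

```python
try:
    from urllib import urlretrieve            # Python 2
except ImportError:
    from urllib.request import urlretrieve    # Python 3


def download_jpegs(imgs, retrieve=urlretrieve):
    """Save every img marked data-type="jpeg"; return the file names written.

    imgs can be the result of soup.find_all('img') or any objects with a
    dict-style .get(). retrieve is replaceable (an assumption of this sketch)
    so the loop can be tried without network access.
    """
    saved = []
    for img in imgs:
        # .get() returns None when the attribute is missing
        if img.get('data-type') != 'jpeg':
            continue
        link = img.get('data-src')
        if not link:
            continue
        picname = "pic" + str(len(saved) + 1) + ".jpg"
        retrieve(link, picname)   # downloads and writes the file in one call
        saved.append(picname)
    return saved
```

With the soup object from the question, the call would be download_jpegs(soup.find_all('img')).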

Let me know if this works and if you need any more help!
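If the files still won't open, it may help to check whether the saved bytes are really JPEG data. This is only a diagnostic sketch, and I am not claiming it is the cause here. It compares the start of the file against the well-known signatures for JPEG, PNG, GIF and WebP. The function names sniff_image_type and sniff_file are made up for this example.

```python
def sniff_image_type(data):
    """Guess an image format from the leading bytes; None if unrecognised."""
    if data[:3] == b'\xff\xd8\xff':
        return 'jpeg'
    if data[:8] == b'\x89PNG\r\n\x1a\n':
        return 'png'
    if data[:6] in (b'GIF87a', b'GIF89a'):
        return 'gif'
    # WebP is a RIFF container with "WEBP" at byte offset 8
    if data[:4] == b'RIFF' and data[8:12] == b'WEBP':
        return 'webp'
    return None


def sniff_file(path):
    """Read just enough of a file to run sniff_image_type on it."""
    with open(path, 'rb') as f:
        return sniff_image_type(f.read(12))
```

If sniff_file("pic1.jpg") returns something other than 'jpeg', the file has the wrong extension for its content, and renaming it or converting it would be the next thing to try.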
