Python BeautifulSoup: How to download images from a div and then copy them into a Word document?

Here is my code:

    # panel (a list of panel <div> Tags) and document (a python-docx Document)
    # are defined elsewhere in the script
    for div in panel:
        titleList = div.find('div', attrs={'class': 'panel-heading'})
        imageList = div.find('div', attrs={'class': 'pro-image'})
        descList = div.find('div', attrs={'class': 'pro-desc'})
        print(titleList.get_text(separator=u' '))
        print(descList.get_text(separator=u' '))
        document.add_heading("%s \t \n" % titleList.get_text(separator=u'  '), level=1)
        document.add_paragraph("%s \t \n" % descList.get_text(separator=u'  '))

I want to download the images from:

    imageList = div.find('div', attrs={'class': 'pro-image'})

I then want to insert those downloaded images into a Word document. How do I do this?

You can use requests to download the image and then save it as binary data with the proper file extension.

Suppose your image is located at http://example/my_image.jpg

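    import requests

    img_data = requests.get("http://example/my_image.jpg")
    img_data.raise_for_status()  # stop on HTTP errors instead of saving an error page
    with open("my_image.jpg", "wb") as img_handle:
        img_handle.write(img_data.content)  # .content holds the raw bytes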

This is just a minimal example. As noted by tmadam in the comments, use img_data.content (the raw response bytes) rather than img_data.text (the response decoded as a string) for binary data such as images.
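The snippet above hard-codes a single URL. In the question the image URLs sit in <img> tags inside the pro-image div, and the extension should match the actual file. The following is a minimal sketch under a few assumptions: the helper names image_urls, extension_for and download_image are hypothetical, page_url stands for the address of the scraped page (src values are often relative), and ".img" is an arbitrary fallback when no extension can be determined.

```python
import mimetypes
import os
from urllib.parse import urljoin, urlparse

import requests


def image_urls(image_div, page_url):
    """Return absolute URLs for every <img src=...> inside image_div.

    image_div is the Tag returned by div.find('div', attrs={'class': 'pro-image'}),
    or None if the panel has no such div. Relative src values are resolved
    against page_url (hypothetical: the address of the scraped page).
    """
    if image_div is None:
        return []
    return [urljoin(page_url, img["src"])
            for img in image_div.find_all("img")
            if img.get("src")]


def extension_for(url, content_type=None):
    """Choose a file extension: trust an image/* Content-Type, else the URL path."""
    mime = (content_type or "").split(";")[0].strip().lower()
    if mime.startswith("image/"):
        ext = mimetypes.guess_extension(mime)
        if ext:
            return ext
    # Fall back to the extension in the URL path, or a placeholder (assumption).
    return os.path.splitext(urlparse(url).path)[1] or ".img"


def download_image(url, stem):
    """Download url, save it as stem + extension, and return the saved path."""
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()  # don't save an HTML error page as an "image"
    path = stem + extension_for(url, resp.headers.get("Content-Type"))
    with open(path, "wb") as img_handle:
        img_handle.write(resp.content)  # raw bytes, not resp.text
    return path
```

Inside the question's loop this could be used as: for i, url in enumerate(image_urls(imageList, page_url)): download_image(url, "image_%d" % i). Only image/* Content-Type values are trusted because servers sometimes send generic types such as application/octet-stream, for which the URL's own extension is the better guess.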

To insert the image into a Word document, you can use any library that provides this functionality. python-docx comes up as the first Google search result and may be useful; its Document.add_picture() method accepts either an image file path or a binary file-like object.
