簡體   English   中英

如何使用python將網頁保存為圖像

[英]How to save web page as image using python

我正在使用 python 創建網站的“收藏夾”部分。 我想做的一部分是抓取一張圖片放在他們的鏈接旁邊。 所以這個過程是用戶輸入一個 URL,然后我去抓取那個頁面的屏幕截圖並將其顯示在鏈接旁邊。 夠容易嗎?

我目前已經下載了 pywebshot ,它在我的本地機器上的終端上運行良好。 但是,當我把它放在服務器上時,我得到一個帶有以下回溯的分段錯誤:

/usr/lib/pymodules/python2.6/gtk-2.0/gtk/__init__.py:57: GtkWarning: could not open display
  warnings.warn(str(e), _gtk.Warning)
./pywebshot.py:16: Warning: invalid (NULL) pointer instance
  self.parent = gtk.Window(gtk.WINDOW_TOPLEVEL)
./pywebshot.py:16: Warning: g_signal_connect_data: assertion `G_TYPE_CHECK_INSTANCE (instance)' failed
  self.parent = gtk.Window(gtk.WINDOW_TOPLEVEL)
./pywebshot.py:49: GtkWarning: Screen for GtkWindow not set; you must always set
a screen for a GtkWindow before using the window
  self.parent.show_all()
./pywebshot.py:49: GtkWarning: gdk_screen_get_default_colormap: assertion `GDK_IS_SCREEN (screen)' failed
  self.parent.show_all()
./pywebshot.py:49: GtkWarning: gdk_colormap_get_visual: assertion `GDK_IS_COLORMAP (colormap)' failed
  self.parent.show_all()
./pywebshot.py:49: GtkWarning: gdk_screen_get_root_window: assertion `GDK_IS_SCREEN (screen)' failed
  self.parent.show_all()
./pywebshot.py:49: GtkWarning: gdk_window_new: assertion `GDK_IS_WINDOW (parent)' failed
  self.parent.show_all()
Segmentation fault

我知道有些東西不能在 pts 環境中運行,但老實說,現在這有點超出我的能力。 如果我需要以某種方式假裝我的 pts 連接是 tty,我可以嘗試一下。 但在這一點上,我什至不確定發生了什么,我承認這有點超出我的想象。 任何幫助將不勝感激。

此外,如果有一個 Web 服務,我可以傳遞一個 url 並接收一個圖像,那也可以。 我不接受 pywebshot 的想法。

我知道我所在的服務器正在運行 X,並且安裝了所有必要的 python 模塊。

提前致謝。

我發現了websnapr.com ,它是一種網絡服務,只需一點點工作即可為您提供圖像。

import subprocess
subprocess.Popen(['wget', '-O', MYFILENAME+'.png', 'http://images.websnapr.com/?url='+MYURL+'&size=s&nocache=82']).wait()

非常簡單。

from selenium import webdriver    
from xvfbwrapper import Xvfb
d=Xvfb(width=400,height=400)
d.start()
browser=webdriver.Firefox()
url="http://stackoverflow.com/questions/4091940/how-to-save-web-page-as-image-using-python"
browser.get(url)
destination="screenshot_filename.jpg"
if browser.save_screenshot(destination):
    print "File saved in the destination filename"
browser.quit()

讓我猜猜,服務器沒有 X 服務器,對吧?

您可能必須運行無頭 X 服務器才能使其正常工作。

這是我用來獲取整個滾動網頁截圖的代碼:

from PIL import Image
from io import BytesIO
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
import logging
import os
import time

# Set default download folder for ChromeDriver
videos_folder = r"./download"
if not os.path.exists(videos_folder):
    os.makedirs(videos_folder)
prefs = {"download.default_directory": videos_folder}

def open_url(address):
    # SELENIUM SETUP
    logging.getLogger('WDM').setLevel(logging.WARNING)  # just to hide not so rilevant webdriver-manager messages
    chrome_options = Options()
    chrome_options.headless = True
    chrome_options.add_experimental_option("prefs", prefs)
    driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=chrome_options)
    driver.implicitly_wait(1)
    driver.maximize_window()
    driver.get(address)
    driver.set_window_size(1920, 1080)  # to set the screenshot width
    save_screenshot(driver, '{}/Screenshot.png'.format(videos_folder))
    driver.quit()

def save_screenshot(driver, file_name):
    height, width = scroll_down(driver)
    driver.set_window_size(width, height)
    img_binary = driver.get_screenshot_as_png()
    img = Image.open(BytesIO(img_binary))
    img.save(file_name)
    # print(file_name)
    print("Screenshot saved!")

def scroll_down(driver):
    total_width = driver.execute_script("return document.body.offsetWidth")
    total_height = driver.execute_script("return document.body.parentNode.scrollHeight")
    viewport_width = driver.execute_script("return document.body.clientWidth")
    viewport_height = driver.execute_script("return window.innerHeight")

    rectangles = []

    i = 0
    while i < total_height:
        ii = 0
        top_height = i + viewport_height

        if top_height > total_height:
            top_height = total_height

        while ii < total_width:
            top_width = ii + viewport_width

            if top_width > total_width:
                top_width = total_width

            rectangles.append((ii, i, top_width, top_height))

            ii = ii + viewport_width

        i = i + viewport_height

    previous = None
    part = 0

    for rectangle in rectangles:
        if not previous is None:
            driver.execute_script("window.scrollTo({0}, {1})".format(rectangle[0], rectangle[1]))
            time.sleep(0.5)
        # time.sleep(0.2)

        if rectangle[1] + viewport_height > total_height:
            offset = (rectangle[0], total_height - viewport_height)
        else:
            offset = (rectangle[0], rectangle[1])

        previous = rectangle

    return total_height, total_width

open_url("https://stackoverflow.com/questions/4091940/how-to-save-web-page-as-image-using-python")

這里得到的截圖:

整個網頁截圖

暫無
暫無

聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.

 
粵ICP備18138465號  © 2020-2024 STACKOOM.COM