
How can I take a screenshot/image of a website using Python?

What I want to achieve is to take a screenshot of any website from Python.

Env: Linux

Here is a simple solution using WebKit: http://webscraping.com/blog/Webpage-screenshots-with-webkit/

import sys
import time
from PyQt4.QtCore import *
from PyQt4.QtGui import *
from PyQt4.QtWebKit import *

class Screenshot(QWebView):
    def __init__(self):
        self.app = QApplication(sys.argv)
        QWebView.__init__(self)
        self._loaded = False
        self.loadFinished.connect(self._loadFinished)

    def capture(self, url, output_file):
        self.load(QUrl(url))
        self.wait_load()
        # set to webpage size
        frame = self.page().mainFrame()
        self.page().setViewportSize(frame.contentsSize())
        # render image
        image = QImage(self.page().viewportSize(), QImage.Format_ARGB32)
        painter = QPainter(image)
        frame.render(painter)
        painter.end()
        print('saving %s' % output_file)
        image.save(output_file)

    def wait_load(self, delay=0):
        # process app events until page loaded
        while not self._loaded:
            self.app.processEvents()
            time.sleep(delay)
        self._loaded = False

    def _loadFinished(self, result):
        self._loaded = True

s = Screenshot()
s.capture('http://webscraping.com', 'website.png')
s.capture('http://webscraping.com/blog', 'blog.png')

Here is my solution, pieced together with help from various sources. It captures the full web page, optionally crops it, and can also generate a thumbnail from the cropped image.

Requirements:

  1. Install Node.js
  2. Install PhantomJS using Node's package manager: npm -g install phantomjs
  3. Install selenium (in your virtualenv, if you are using one)
  4. Install ImageMagick
  5. Add phantomjs to the system path (on Windows)

import os
from subprocess import Popen, PIPE
from selenium import webdriver

abspath = lambda *p: os.path.abspath(os.path.join(*p))
ROOT = abspath(os.path.dirname(__file__))


def execute_command(command):
    result = Popen(command, shell=True, stdout=PIPE).stdout.read()
    if len(result) > 0 and not result.isspace():
        raise Exception(result)


def do_screen_capturing(url, screen_path, width, height):
    print("Capturing screen..")
    driver = webdriver.PhantomJS()
    # The service log file is saved in the current directory.
    # To store it elsewhere, initialize the driver as:
    # driver = webdriver.PhantomJS(service_log_path='/var/log/phantomjs/ghostdriver.log')
    driver.set_script_timeout(30)
    if width and height:
        driver.set_window_size(width, height)
    driver.get(url)
    driver.save_screenshot(screen_path)
    # shut down the PhantomJS process once the screenshot is saved
    driver.quit()


def do_crop(params):
    print("Cropping captured image..")
    command = [
        'convert',
        params['screen_path'],
        '-crop', '%sx%s+0+0' % (params['width'], params['height']),
        params['crop_path']
    ]
    execute_command(' '.join(command))


def do_thumbnail(params):
    print("Generating thumbnail from cropped captured image..")
    command = [
        'convert',
        params['crop_path'],
        '-filter', 'Lanczos',
        '-thumbnail', '%sx%s' % (params['width'], params['height']),
        params['thumbnail_path']
    ]
    execute_command(' '.join(command))


def get_screen_shot(**kwargs):
    url = kwargs['url']
    width = int(kwargs.get('width', 1024)) # screen width to capture
    height = int(kwargs.get('height', 768)) # screen height to capture
    filename = kwargs.get('filename', 'screen.png') # file name e.g. screen.png
    path = kwargs.get('path', ROOT) # directory path to store screen

    crop = kwargs.get('crop', False) # crop the captured screen
    crop_width = int(kwargs.get('crop_width', width)) # the width of crop screen
    crop_height = int(kwargs.get('crop_height', height)) # the height of crop screen
    crop_replace = kwargs.get('crop_replace', False) # does crop image replace original screen capture?

    thumbnail = kwargs.get('thumbnail', False) # generate thumbnail from screen, requires crop=True
    thumbnail_width = int(kwargs.get('thumbnail_width', width)) # the width of thumbnail
    thumbnail_height = int(kwargs.get('thumbnail_height', height)) # the height of thumbnail
    thumbnail_replace = kwargs.get('thumbnail_replace', False) # does thumbnail image replace crop image?

    screen_path = abspath(path, filename)
    crop_path = thumbnail_path = screen_path

    if thumbnail and not crop:
        raise Exception('Thumbnail generation requires a cropped image, set crop=True')

    do_screen_capturing(url, screen_path, width, height)

    if crop:
        if not crop_replace:
            crop_path = abspath(path, 'crop_'+filename)
        params = {
            'width': crop_width, 'height': crop_height,
            'crop_path': crop_path, 'screen_path': screen_path}
        do_crop(params)

        if thumbnail:
            if not thumbnail_replace:
                thumbnail_path = abspath(path, 'thumbnail_'+filename)
            params = {
                'width': thumbnail_width, 'height': thumbnail_height,
                'thumbnail_path': thumbnail_path, 'crop_path': crop_path}
            do_thumbnail(params)
    return screen_path, crop_path, thumbnail_path


if __name__ == '__main__':
    '''
        Requirements:
        Install NodeJS
        Using Node's package manager install phantomjs: npm -g install phantomjs
        install selenium (in your virtualenv, if you are using that)
        install imageMagick
        add phantomjs to system path (on windows)
    '''

    url = 'http://stackoverflow.com/questions/1197172/how-can-i-take-a-screenshot-image-of-a-website-using-python'
    screen_path, crop_path, thumbnail_path = get_screen_shot(
        url=url, filename='sof.png',
        crop=True, crop_replace=False,
        thumbnail=True, thumbnail_replace=False,
        thumbnail_width=200, thumbnail_height=150,
    )

[Images: the generated screenshot, cropped image, and thumbnail]
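
The crop and thumbnail helpers above join their arguments into one string and run it through a shell, which can break on paths that contain spaces. A minimal sketch of the same two `convert` invocations built as argument lists instead, assuming Python 3.5+ for `subprocess.run`; the function names `crop_args`, `thumbnail_args`, and `run` are hypothetical, and the ImageMagick options are the ones the script already uses:

```python
import subprocess


def crop_args(screen_path, crop_path, width, height):
    """Arguments for cropping from the top-left corner, as in do_crop."""
    geometry = '%sx%s+0+0' % (width, height)
    return ['convert', screen_path, '-crop', geometry, crop_path]


def thumbnail_args(crop_path, thumbnail_path, width, height):
    """Arguments for a Lanczos-filtered thumbnail, as in do_thumbnail."""
    size = '%sx%s' % (width, height)
    return ['convert', crop_path, '-filter', 'Lanczos',
            '-thumbnail', size, thumbnail_path]


def run(args):
    # Passing a list avoids shell quoting issues; check=True raises
    # CalledProcessError if ImageMagick exits with a non-zero status.
    subprocess.run(args, check=True)
```

With this, `run(crop_args(screen_path, crop_path, 1024, 768))` would stand in for the string-based `execute_command` call.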

You can do this using Selenium:

from selenium import webdriver

DRIVER = 'chromedriver'
driver = webdriver.Chrome(DRIVER)
driver.get('https://www.spotify.com')
screenshot = driver.save_screenshot('my_screenshot.png')
driver.quit()

https://sites.google.com/a/chromium.org/chromedriver/getting-started
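
If `driver.get` raises in the snippet above, `driver.quit()` never runs and the browser process is left behind. A minimal sketch of a wrapper that always quits; `capture` is a hypothetical helper name, and it relies only on the `get`, `save_screenshot`, and `quit` methods already used above:

```python
def capture(driver, url, output_path):
    """Load url in the given WebDriver and save a screenshot.

    Returns whatever save_screenshot returns and always shuts the
    browser down, even if loading the page fails.
    """
    try:
        driver.get(url)
        return driver.save_screenshot(output_path)
    finally:
        driver.quit()
```

It could be called as `capture(webdriver.Chrome(), 'https://www.spotify.com', 'my_screenshot.png')`, assuming chromedriver can be found on the PATH.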

On the Mac, there's webkit2png, and on Linux+KDE you can use khtml2png. I've tried the former and it works quite well, and I've heard of the latter being put to use.

I recently came across QtWebKit, which claims to be cross-platform (Qt rolled WebKit into their library, I guess). But I've never tried it, so I can't tell you much more.

The QtWebKit link shows how to access it from Python. You should be able to at least use subprocess to do the same with the others.

Using Rendertron is an option. Under the hood, this is a headless Chrome exposing the following endpoints:

  • /render/:url : Access this route, e.g. with requests.get, if you are interested in the DOM.
  • /screenshot/:url : Access this route if you are interested in a screenshot.

You would install rendertron with npm, run rendertron in one terminal, access http://localhost:3000/screenshot/:url and save the file. However, a demo is available at render-tron.appspot.com, which makes it possible to run this Python 3 snippet locally without installing the npm package:

import requests

BASE = 'https://render-tron.appspot.com/screenshot/'
url = 'https://google.com'
path = 'target.jpg'
response = requests.get(BASE + url, stream=True)
# save file, see https://stackoverflow.com/a/13137873/7665691
if response.status_code == 200:
    with open(path, 'wb') as file:
        for chunk in response:
            file.write(chunk)

I can't comment on ars's answer, but I actually got Roland Tapken's code running using QtWebKit and it works quite well.

Just wanted to confirm that what Roland posts on his blog works great on Ubuntu. Our production version ended up not using any of what he wrote, but we are using the PyQt/QtWebKit bindings with much success.

Note: The URL used to be: http://www.blogs.uni-osnabrueck.de/rotapken/2008/12/03/create-screenshots-of-a-web-page-using-python-and-qtwebkit/ I've updated it with a working copy.

11 years later...
Taking a website screenshot using Python 3.6 and the Google PageSpeed Insights API v5:

import base64
import requests
import traceback
import urllib.parse as ul

# It's possible to make requests without the api key, but the number of requests is very limited  

url = "https://duckgo.com"
urle = ul.quote_plus(url)
image_path = "duckgo.jpg"

key = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
strategy = "desktop" # "mobile"
u = f"https://www.googleapis.com/pagespeedonline/v5/runPagespeed?key={key}&strategy={strategy}&url={urle}"

try:
    j = requests.get(u).json()
    ss_encoded = j['lighthouseResult']['audits']['final-screenshot']['details']['data'].replace("data:image/jpeg;base64,", "")
    ss_decoded = base64.b64decode(ss_encoded)
    with open(image_path, 'wb+') as f:
        f.write(ss_decoded) 
except Exception:
    print(traceback.format_exc())
    exit(1)
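
The `replace` call above assumes the screenshot always arrives as a JPEG data URI. A minimal sketch of a more general decoder that splits on the first comma instead; `decode_data_uri` is a hypothetical helper name, and it only handles Base64-encoded data URIs:

```python
import base64


def decode_data_uri(data_uri):
    """Return (mime_type, raw_bytes) for a Base64 data URI.

    Example input: 'data:image/jpeg;base64,/9j/4AAQ...'
    """
    header, _, payload = data_uri.partition(',')
    if not header.startswith('data:') or not header.endswith(';base64'):
        raise ValueError('not a base64 data URI')
    mime_type = header[len('data:'):-len(';base64')]
    return mime_type, base64.b64decode(payload)
```

The returned MIME type could then be used to pick the file extension rather than hard-coding `.jpg`.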


This is an old question and most answers are a bit dated. Currently, I would do one of two things.

1. Create a program that takes the screenshots

I would use Pyppeteer to take screenshots of websites. It is a Python port of Puppeteer. Puppeteer spins up a headless Chrome browser, so the screenshots will look exactly like they would in a normal browser.

This is taken from the pyppeteer documentation:

import asyncio
from pyppeteer import launch

async def main():
    browser = await launch()
    page = await browser.newPage()
    await page.goto('https://example.com')
    await page.screenshot({'path': 'example.png'})
    await browser.close()

asyncio.get_event_loop().run_until_complete(main())

2. Use a screenshot API

You could also use a screenshot API such as this one. The nice thing is that you don't have to set everything up yourself but can simply call an API endpoint.

This is taken from the screenshot API's documentation:

import urllib.parse
import urllib.request
import ssl

ssl._create_default_https_context = ssl._create_unverified_context

# The parameters.
token = "YOUR_API_TOKEN"
url = urllib.parse.quote_plus("https://example.com")
width = 1920
height = 1080
output = "image"

# Create the query URL.
query = "https://screenshotapi.net/api/v1/screenshot"
query += "?token=%s&url=%s&width=%d&height=%d&output=%s" % (token, url, width, height, output)

# Call the API.
urllib.request.urlretrieve(query, "./example.png")
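
The query string above is assembled by hand, so every value has to be escaped individually. A minimal sketch that lets `urllib.parse.urlencode` do the escaping; the endpoint and parameter names are the ones from the snippet above, and `build_query` is a hypothetical helper name:

```python
import urllib.parse

ENDPOINT = "https://screenshotapi.net/api/v1/screenshot"


def build_query(token, url, width=1920, height=1080, output="image"):
    """Build the request URL; urlencode percent-escapes each value."""
    params = {
        "token": token,
        "url": url,  # pass the raw URL here, it is encoded below
        "width": width,
        "height": height,
        "output": output,
    }
    return ENDPOINT + "?" + urllib.parse.urlencode(params)
```

The result of `build_query("YOUR_API_TOKEN", "https://example.com")` can be passed to `urllib.request.urlretrieve` in place of the hand-built `query`.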

You can use the Google Page Speed API to achieve your task easily. In my current project, I have used a Google Page Speed API query written in Python to capture screenshots of any web URL provided and save them to a location. Have a look.

import base64
import json
import os
import sys
import urllib.error
import urllib.parse
import urllib.request

#   The website's URL and the output path as input
site = sys.argv[1]
imagePath = sys.argv[2]

#   The Google API.  Remove "&strategy=mobile" for a desktop screenshot
api = "https://www.googleapis.com/pagespeedonline/v1/runPagespeed?screenshot=true&strategy=mobile&url=" + urllib.parse.quote(site)

#   Get the results from Google
try:
    with urllib.request.urlopen(api) as response:
        site_data = json.load(response)
except urllib.error.URLError:
    print("Unable to retrieve data")
    sys.exit(1)
except ValueError:
    print("Invalid JSON encountered.")
    sys.exit(1)

try:
    screenshot_encoded = site_data['screenshot']['data']
except KeyError:
    print("No screenshot in the response.")
    sys.exit(1)

#   Google has a weird way of encoding the Base64 data
screenshot_encoded = screenshot_encoded.replace("_", "/")
screenshot_encoded = screenshot_encoded.replace("-", "+")

#   Decode the Base64 data
screenshot_decoded = base64.b64decode(screenshot_encoded)

#   Create the target directory if the path has one and it does not exist yet
directory = os.path.dirname(imagePath)
if directory:
    os.makedirs(directory, exist_ok=True)

#   Save the file (binary mode, since the data is raw image bytes)
with open(imagePath, 'wb') as file_:
    file_.write(screenshot_decoded)

Unfortunately, it has the following drawbacks. If these do not matter, you can proceed with the Google Page Speed API. It works well.

  • The maximum width is 320px
  • According to the Google API quota, there is a limit of 25,000 requests per day
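
The two `replace` calls in the script above convert URL-safe Base64 (which uses `-` and `_`) to the standard alphabet (`+` and `/`). The standard library's `base64.urlsafe_b64decode` does that conversion itself; a minimal sketch comparing the two on a made-up sample value rather than a real API response:

```python
import base64

# Made-up sample bytes whose Base64 form contains both special characters.
raw = b"\xfb\xff\xfe"
encoded = base64.urlsafe_b64encode(raw).decode("ascii")

# Manual translation, as in the script above.
manual = base64.b64decode(encoded.replace("_", "/").replace("-", "+"))

# The same result in one call.
direct = base64.urlsafe_b64decode(encoded)

assert manual == direct == raw
```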

This uses the web service s-shot.ru (so it's not so fast), but it is quite easy to configure what you need through the link parameters. You can also easily capture full-page screenshots:

import requests
import urllib.parse

BASE = 'https://mini.s-shot.ru/1024x0/JPEG/1024/Z100/?' # you can modify size, format, zoom
url = 'https://stackoverflow.com/'#or whatever link you need
url = urllib.parse.quote_plus(url) #service needs link to be joined in encoded format
print(url)

path = 'target1.jpg'
response = requests.get(BASE + url, stream=True)

if response.status_code == 200:
    with open(path, 'wb') as file:
        for chunk in response:
            file.write(chunk)

You don't mention what environment you're running in, which makes a big difference because there isn't a pure Python web browser that's capable of rendering HTML.

But if you're using a Mac, I've used webkit2png with great success. If not, as others have pointed out, there are plenty of options.

I created a library called pywebcapture that wraps selenium and does just that:

pip install pywebcapture

Once you install it with pip, you can do the following to easily get full-size screenshots:

# import modules
from pywebcapture import loader, driver

# load csv with urls
csv_file = loader.CSVLoader("csv_file_with_urls.csv", has_header_bool, url_column, optional_filename_column)
uri_dict = csv_file.get_uri_dict()

# create instance of the driver and run
d = driver.Driver("path/to/webdriver/", output_filepath, delay, uri_dict)
d.run()

Enjoy!

https://pypi.org/project/pywebcapture/

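import subprocess

def screenshots(url, name):
    # -F: full-size image only, -o: output name, -D: output directory.
    # Passing a list (no shell=True) keeps URLs with &, ? or spaces intact.
    subprocess.run(['webkit2png', '-F', '-o', name, url, '-D', './screens'])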

Try this:

#!/usr/bin/env python

import random
import time

import gtk.gdk

while True:
    # pick a random delay between 120 and 300 seconds (2 to 5 minutes)
    random_time = random.randrange(120, 300)
    print("Next picture in: %.2f minutes" % (float(random_time) / 60))

    time.sleep(random_time)

    # grab the root window, i.e. the whole desktop
    w = gtk.gdk.get_default_root_window()
    sz = w.get_size()

    print("The size of the window is %d x %d" % sz)

    pb = gtk.gdk.Pixbuf(gtk.gdk.COLORSPACE_RGB, False, 8, sz[0], sz[1])
    pb = pb.get_from_drawable(w, w.get_colormap(), 0, 0, 0, 0, sz[0], sz[1])

    # name the file after the current timestamp
    filename = "screenshot" + str(time.time()) + ".png"

    if pb is not None:
        pb.save(filename, "png")
        print("Screenshot saved to " + filename)
    else:
        print("Unable to get the screenshot.")

Notice: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address.