GUI web scraper in Python and Tkinter

Question

Edit:

OK, this is my whole code so far:

from selenium import webdriver
from bs4 import BeautifulSoup as bs
import requests
import time
import os
import Tkinter as tk




def get_page():
    global driver
    driver = webdriver.Chrome()
    driver.get(url)
    last_height = driver.execute_script('return document.body.scrollHeight')
    while True:
        driver.execute_script('window.scrollTo(0, document.body.scrollHeight);')
        new_height = driver.execute_script('return document.body.scrollHeight')
        if new_height == last_height:
            break
        else:
            last_height = new_height


#This function uses BeautifulSoup to parse through the page source and find images.
    def get_img():

        sp = bs(driver.page_source, 'html.parser')
        for image in sp.find_all('img'):
            images.append(image)


#Create folder which will contain downloaded images.
    def make_dir():
        if not os.path.exists('Downloaded images'):
            os.mkdir('Downloaded images')
        os.chdir('Downloaded images')


#Function which saves images.
    def save_img():

        x = 0

        for image in images:
            try:
                url = image['src']
                source = requests.get(url)
                with open('img-{}.jpg'.format(x), 'wb') as f:
                    f.write(requests.get(url).content)
                    x += 1
            except:
                print 'Error while saving image.'

root = tk.Tk()
root.title('Image Scraper 1.0')
tk.Label(root, text = 'Enter URL:').grid(row=0)
e1 = tk.Entry(root)
e1.grid(row=0, column=1)
e1.insert(driver.get(url))
button1 = tk.Button(root, text = 'SCRAPE', command =scrape_site).grid(row=3, column=1, sticky=tk.W, pady=4)
button1.pack()

root.mainloop()

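I tried putting the whole scrape_site function into tkinter's button command=, which was dumb, as I now see, and obviously it doesn't work. As you can see, I copied the whole tkinter code into the main scraper file. Any ideas? I'd appreciate it :)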

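I recently posted a question about a web scraper that downloaded images of cats. This time I decided to go one step further. I want to make a GUI web scraper that downloads images from a website whose address the user types into a tkinter Entry widget. Is that possible? I also created two .py files: one for the scraper script and one for the GUI. Is it fine to keep them that way, or should everything be in one file? Here is the scraper code that opens and scrolls the page (using Selenium), and it works fine. My only question is: how do I put this into tkinter? :)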

def get_page():
    global driver
    driver = webdriver.Chrome()
    driver.get(url)
    last_height = driver.execute_script('return document.body.scrollHeight')
    while True:
        driver.execute_script('window.scrollTo(0, document.body.scrollHeight);')
        new_height = driver.execute_script('return document.body.scrollHeight')
        if new_height == last_height:
            break
        else:
            last_height = new_height
get_page()

Answer 1

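As mentioned in my comment, you should modify get_page to take the url as a parameter. The simple example below shows how this can work, but with the get_page function replaced by a stub (I don't have Selenium installed).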

try:
    import tkinter as tk   # Python 3
except ImportError:
    import Tkinter as tk   # Python 2

def get_page(url):
    print("Getting cats from {}".format(url))

class App(tk.Frame):
    def __init__(self,master=None,**kw):
        tk.Frame.__init__(self,master=master,**kw)
        self.txtURL = tk.StringVar()
        self.entryURL = tk.Entry(self,textvariable=self.txtURL)
        self.entryURL.grid(row=0,column=0)
        self.btnGet = tk.Button(self,text="Get Some Cats!",command=self.getCats)
        self.btnGet.grid(row=0,column=1)

    def getCats(self):
        get_page(self.txtURL.get())


if __name__ == '__main__':
    root = tk.Tk()
    App(root).grid()
    root.mainloop()

You can type a URL into the Entry widget, press the button, and the URL is then passed to the function.
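One detail that may help here: command= expects a callable that takes no arguments, so it has to be given a function object (command=self.getCats), not the result of calling it (command=self.getCats()). A rough sketch of that hand-off, written so the callback logic can be exercised without opening a window, could look like the following. The names make_command, read_url and stub_get_page are hypothetical and only for illustration; skipping empty input is my own assumption, not something the question asks for.

```python
def make_command(read_url, get_page):
    """Build a zero-argument callback suitable for Button(command=...).

    read_url: callable returning the current Entry text (e.g. StringVar.get)
    get_page: callable taking the URL as its only argument
    """
    def on_click():
        url = read_url().strip()
        if not url:
            return None  # assumption: ignore clicks while the Entry is empty
        return get_page(url)
    return on_click


def stub_get_page(url):
    # Stand-in for the real scraper, like the stub in the answer above.
    print("Getting cats from {}".format(url))


def main():
    # Imported here so make_command can be used without a display.
    try:
        import tkinter as tk   # Python 3
    except ImportError:
        import Tkinter as tk   # Python 2

    root = tk.Tk()
    url_var = tk.StringVar()
    tk.Entry(root, textvariable=url_var).grid(row=0, column=0)
    # Pass the callable itself; do not call it here.
    tk.Button(root, text="SCRAPE",
              command=make_command(url_var.get, stub_get_page)).grid(row=0, column=1)
    root.mainloop()

# Call main() to open the window.
```

Keeping the widget wiring in main() and the logic in make_command is only one possible layout; the class-based App above does the same job.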

If your get_page function is in a separate file, just use from my_other_file import get_page, where my_other_file is the name of the Python file containing the get_page function.
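For completeness, here is a minimal sketch of what the scraper file might contain once get_page takes the url as a parameter instead of relying on a global. It reuses the scroll loop from the question. The optional driver argument, the pause between scrolls, the max_rounds cap and the scroll_to_bottom helper are my own assumptions, added so the loop always terminates and can be tried without a real browser; they are not part of the original code.

```python
import time


def scroll_to_bottom(driver, pause=0.5, max_rounds=100):
    """Scroll until document.body.scrollHeight stops growing."""
    last_height = driver.execute_script('return document.body.scrollHeight')
    for _ in range(max_rounds):  # assumption: cap the number of scrolls
        driver.execute_script('window.scrollTo(0, document.body.scrollHeight);')
        time.sleep(pause)  # assumption: give lazily loaded content time to appear
        new_height = driver.execute_script('return document.body.scrollHeight')
        if new_height == last_height:
            break
        last_height = new_height
    return last_height


def get_page(url, driver=None, pause=0.5):
    """Open url, scroll to the bottom, and return the page source."""
    own_driver = driver is None
    if own_driver:
        from selenium import webdriver  # only needed when using a real browser
        driver = webdriver.Chrome()
    try:
        driver.get(url)
        scroll_to_bottom(driver, pause=pause)
        return driver.page_source
    finally:
        if own_driver:
            driver.quit()  # close only a browser this function opened
```

With this in, say, my_other_file.py, the GUI file only needs the import line above and can call get_page(self.txtURL.get()) from the button callback. Returning the page source rather than keeping a global driver is a design choice of this sketch; the parsing step from the question could then take that string as input.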

Solution 1
0 Accepted 2018-03-19 11:50:43
