GUI web scraper in Python and Tkinter
OK, this is my whole code so far:
from selenium import webdriver
from bs4 import BeautifulSoup as bs
import requests
import time
import os
import Tkinter as tk

def get_page():
    global driver
    driver = webdriver.Chrome()
    driver.get(url)
    last_height = driver.execute_script('return document.body.scrollHeight')
    while True:
        driver.execute_script('window.scrollTo(0, document.body.scrollHeight);')
        new_height = driver.execute_script('return document.body.scrollHeight')
        if new_height == last_height:
            break
        else:
            last_height = new_height

#This function uses BeautifulSoup to parse through the page source and find images.
def get_img():
    sp = bs(driver.page_source, 'html.parser')
    for image in sp.find_all('img'):
        images.append(image)

#Create folder which will contain downloaded images.
def make_dir():
    if not os.path.exists('Downloaded images'):
        os.mkdir('Downloaded images')
    os.chdir('Downloaded images')

#Function which saves images.
def save_img():
    x = 0
    for image in images:
        try:
            url = image['src']
            source = requests.get(url)
            with open('img-{}.jpg'.format(x), 'wb') as f:
                f.write(requests.get(url).content)
            x += 1
        except:
            print 'Error while saving image.'

root = tk.Tk()
root.title('Image Scraper 1.0')
tk.Label(root, text = 'Enter URL:').grid(row=0)
e1 = tk.Entry(root)
e1.grid(row=0, column=1)
e1.insert(driver.get(url))
button1 = tk.Button(root, text = 'SCRAPE', command =scrape_site).grid(row=3, column=1, sticky=tk.W, pady=4)
button1.pack()
root.mainloop()
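I tried putting the whole scrape_site function into tkinter's button command=, which was silly, I see that now, and obviously it doesn't work. As you can see, I copied the whole tkinter code into the main scraper file. Any ideas? I would appreciate any help :)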
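I recently posted a question about a web scraper that downloaded images of cats. This time I decided to go one step further. I want to make a GUI web scraper that downloads images from a website whose URL the user enters in a tkinter Entry widget. Is this possible? I have also created two .py files: one for the scraper script and one for the GUI. Is it fine to keep them that way, or should everything be in a single file? Here is the scraper code that opens and scrolls the page (using Selenium), and it works fine. My only question is: how do I put it into tkinter? :)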
def get_page():
    global driver
    driver = webdriver.Chrome()
    driver.get(url)
    last_height = driver.execute_script('return document.body.scrollHeight')
    while True:
        driver.execute_script('window.scrollTo(0, document.body.scrollHeight);')
        new_height = driver.execute_script('return document.body.scrollHeight')
        if new_height == last_height:
            break
        else:
            last_height = new_height

get_page()
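As mentioned in my comment, you should modify get_page to take the url as a parameter. The simple example below shows how this could work, but with the get_page function replaced by a stand-in (I don't have Selenium).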
try:
    import tkinter as tk    # Python 3
except ImportError:
    import Tkinter as tk    # Python 2

# Stand-in for the real scraper function; it only reports the URL it received.
def get_page(url):
    print("Getting cats from {}".format(url))

class App(tk.Frame):
    def __init__(self,master=None,**kw):
        tk.Frame.__init__(self,master=master,**kw)
        # The StringVar holds whatever the user types into the Entry.
        self.txtURL = tk.StringVar()
        self.entryURL = tk.Entry(self,textvariable=self.txtURL)
        self.entryURL.grid(row=0,column=0)
        self.btnGet = tk.Button(self,text="Get Some Cats!",command=self.getCats)
        self.btnGet.grid(row=0,column=1)

    def getCats(self):
        # Button callback: pass the current Entry text to the scraper.
        get_page(self.txtURL.get())

if __name__ == '__main__':
    root = tk.Tk()
    App(root).grid()
    root.mainloop()
You can type a URL into the Entry widget and press the button, and the URL is then passed to the function.
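For reference, here is a minimal sketch of what the real get_page might look like once it takes the url as a parameter. It reuses the scrolling loop from the question; the optional driver argument, the pause between scrolls, and the max_scrolls cap are assumptions added for illustration and are not part of the original code.

```python
import time


def get_page(url, driver=None, pause=1.0, max_scrolls=50):
    """Open url and scroll down until the page height stops growing.

    driver, pause and max_scrolls are illustrative additions; the
    question's version created the Chrome driver as a global instead.
    """
    if driver is None:
        # Imported lazily so the function can be exercised without Selenium.
        from selenium import webdriver
        driver = webdriver.Chrome()

    driver.get(url)
    last_height = driver.execute_script('return document.body.scrollHeight')
    for _ in range(max_scrolls):
        driver.execute_script('window.scrollTo(0, document.body.scrollHeight);')
        time.sleep(pause)  # give lazily loaded content time to appear
        new_height = driver.execute_script('return document.body.scrollHeight')
        if new_height == last_height:
            break  # nothing new was loaded, so the bottom has been reached
        last_height = new_height
    return driver  # the caller can read driver.page_source and then quit
```

With this version, the getCats callback above can stay as it is: get_page(self.txtURL.get()) opens Chrome on the URL typed by the user. Returning the driver instead of storing it in a global is a design choice of this sketch, so that later steps such as parsing driver.page_source receive it explicitly.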
If your get_page function lives in a separate file, just use from my_other_file import get_page, where my_other_file is the name of the Python file that contains the get_page function.
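The two-file layout can be sketched as follows. To keep the example runnable as a single block, it first writes a stand-in scraper module to a temporary folder; in a real project my_other_file.py would simply sit next to the GUI script. The module body and the example URL are placeholders, not the asker's actual code.

```python
import os
import sys
import tempfile

# Stand-in contents for the scraper file (placeholder, not the real scraper).
SCRAPER_SOURCE = '''
def get_page(url):
    return "Getting cats from {}".format(url)
'''

# Write my_other_file.py into a temporary folder.
module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, "my_other_file.py"), "w") as f:
    f.write(SCRAPER_SOURCE)

# Make the folder importable, as if both files shared one directory.
sys.path.insert(0, module_dir)

# This is the only line the GUI file itself would need.
from my_other_file import get_page

message = get_page("http://example.com")
print(message)
```

Keeping the scraper and the GUI in separate files works fine as long as the GUI file imports the functions it calls and the scraper file does not run anything at import time.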