Fetching live data from websites with continuously updating data
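When I put html = urllib.request.urlopen(req) inside the while loop, I can fetch the data without trouble, but each fetch takes about 3 seconds. So I thought that if I moved it outside the loop I could get the data faster, since it would not have to open the URL every time, but that raises AttributeError: 'str' object has no attribute 'read'. Maybe it does not recognize the html variable name. How can I speed up the processing?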
def soup():
    url = "http://www.investing.com/indices/major-indices"
    req = urllib.request.Request(
        url,
        data=None,
        headers={
            'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36',
            'Connection': 'keep-alive'
        }
    )
    global Ltp
    global html
    html = urllib.request.urlopen(req)
    while True:
        # On the second iteration html is already a str, so .read() raises AttributeError
        html = html.read().decode('utf-8')
        bsobj = BeautifulSoup(html, "lxml")
        Ltp = bsobj.find("td", {"class": "pid-169-last"})
        Ltp = Ltp.text
        Ltp = Ltp.replace(',', '')
        os.system('cls')
        Ltp = float(Ltp)
        print(Ltp, datetime.datetime.now())

soup()
If you want live data, you need to request the URL periodically. The call
html = urllib.request.urlopen(req)
should be inside the loop.
import os
import urllib.request  # a bare "import urllib" does not reliably load the request submodule
import datetime
from bs4 import BeautifulSoup
import time

def soup():
    url = "http://www.investing.com/indices/major-indices"
    req = urllib.request.Request(
        url,
        data=None,
        headers={
            'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36',
            'Connection': 'keep-alive'
        }
    )
    global Ltp
    global html
    while True:
        # Open the URL on every iteration so each pass sees fresh data
        html = urllib.request.urlopen(req)
        ok = html.read().decode('utf-8')
        bsobj = BeautifulSoup(ok, "lxml")
        Ltp = bsobj.find("td", {"class": "pid-169-last"})
        Ltp = Ltp.text
        Ltp = Ltp.replace(',', '')
        os.system('cls')  # Windows only; on Unix-like systems the command is 'clear'
        Ltp = float(Ltp)
        print(Ltp, datetime.datetime.now())
        time.sleep(3)

soup()
Output:
sh: cls: command not found
18351.61 2016-08-31 23:44:28.103531
sh: cls: command not found
18351.54 2016-08-31 23:44:36.257327
sh: cls: command not found
18351.61 2016-08-31 23:44:47.645328
sh: cls: command not found
18351.91 2016-08-31 23:44:55.618970
sh: cls: command not found
18352.67 2016-08-31 23:45:03.842745
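The "sh: cls: command not found" lines in the output above appear because cls is a Windows shell command and this run was on a Unix-like shell. A minimal sketch of a portable alternative follows; the helper names clear_command and clear_screen are made up for this illustration and are not part of the original answer.

```python
import os

def clear_command(os_name=os.name):
    """Return the shell command that clears the terminal for the given os.name."""
    # os.name is 'nt' on Windows; elsewhere this sketch assumes 'clear' is available
    return "cls" if os_name == "nt" else "clear"

def clear_screen():
    """Clear the terminal using the command appropriate for this platform."""
    os.system(clear_command())
```

Calling clear_screen() in place of os.system('cls') should keep the loop usable on both kinds of system.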
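You reassign html to the decoded UTF-8 string of the response and then keep calling it as if it were still an IO object.
Also, this code would not fetch new data from the server on each iteration: read simply reads the bytes from the IO object; it does not make a new request.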
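Both points can be reproduced without touching the network. In the sketch below, io.BytesIO stands in for the response object returned by urlopen, and the byte content is made up for the demonstration.

```python
import io

# Stand-in for the object returned by urllib.request.urlopen();
# the bytes here are invented sample content, not real site data.
response = io.BytesIO(b"<td class='pid-169-last'>18,351.61</td>")

first = response.read().decode("utf-8")   # consumes the whole body
second = response.read().decode("utf-8")  # stream is exhausted; no new request happens

# Rebinding the name to the decoded text, as the question's loop does,
# leaves a plain str, and str has no read() method.
html = first
try:
    html.read()
    error = None
except AttributeError as exc:
    error = str(exc)

print(repr(second))
print(error)
```

The second read returns an empty string rather than new data, and the call on the rebound name fails with the same AttributeError reported in the question.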
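You can speed up processing by using the Requests library, which takes advantage of persistent connections (or by using urllib3 directly).
Try this (you will need to pip install requests):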
import os
import datetime
from requests import Session
from bs4 import BeautifulSoup

# A Session reuses the underlying connection across requests
s = Session()
while True:
    resp = s.get("http://www.investing.com/indices/major-indices")
    bsobj = BeautifulSoup(resp.text, "html.parser")
    Ltp = bsobj.find("td", {"class": "pid-169-last"})
    Ltp = Ltp.text
    Ltp = Ltp.replace(',', '')
    os.system('cls')  # Windows only; on Unix-like systems the command is 'clear'
    Ltp = float(Ltp)
    print(Ltp, datetime.datetime.now())
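The loop above requests the page again as soon as the previous response has been parsed. One possible refinement, sketched here as an assumption and not as part of the original answer, is to separate fetching from parsing and to pause between polls. The names parse_price and poll, and the fetch callable, are hypothetical; in real use fetch might wrap s.get(...) together with the BeautifulSoup lookup shown above.

```python
import time

def parse_price(text):
    """Convert a quoted price such as '18,351.61' to a float."""
    return float(text.replace(",", "").strip())

def poll(fetch, interval=3.0, max_polls=None, sleep=time.sleep):
    """Yield one parsed price per call to fetch(), pausing between calls.

    fetch is any zero-argument callable returning the price text.
    max_polls=None means poll forever.
    """
    count = 0
    while max_polls is None or count < max_polls:
        yield parse_price(fetch())
        count += 1
        # Skip the pause after the final poll
        if max_polls is None or count < max_polls:
            sleep(interval)

# Usage with canned values in place of a live request
samples = iter(["18,351.61", "18,351.54"])
prices = list(poll(lambda: next(samples), interval=0, max_polls=2))
print(prices)
```

Because fetch is injected, the loop can be exercised with canned strings as shown, without contacting the site.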