
Fetching live data from a website with continuously updating data

When html = urllib.request.urlopen(req) is inside the while loop, the data is fetched fine, but each fetch takes about 3 seconds. So I thought that if I moved it outside the loop it would be faster, since the URL would not have to be opened every time, but that raises AttributeError: 'str' object has no attribute 'read'. Maybe it doesn't recognize the html variable name. How can I speed this up?

def soup():
    url = "http://www.investing.com/indices/major-indices"
    req = urllib.request.Request(
        url,
        data=None,
        headers={
            'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36',
            'Connection': 'keep-alive'
        }
    )
    global Ltp
    global html
    html = urllib.request.urlopen(req)  # opened once, outside the loop
    while True:
        html = html.read().decode('utf-8')  # html is now a str; second iteration raises AttributeError
        bsobj = BeautifulSoup(html, "lxml")

        Ltp = bsobj.find("td", {"class": "pid-169-last"})
        Ltp = Ltp.text
        Ltp = Ltp.replace(',', '')
        os.system('cls')
        Ltp = float(Ltp)
        print(Ltp, datetime.datetime.now())

soup()

If you want live data, you need to request the URL periodically, i.e.

html = urllib.request.urlopen(req)

should be inside the loop.

import os
import urllib
import datetime
from bs4 import BeautifulSoup
import time


def soup():
    url = "http://www.investing.com/indices/major-indices"
    req = urllib.request.Request(
        url,
        data=None,
        headers={
            'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36',
            'Connection': 'keep-alive'
        }
    )
    global Ltp
    global html
    while True:
        html = urllib.request.urlopen(req)  # a fresh request on every iteration
        ok = html.read().decode('utf-8')
        bsobj = BeautifulSoup(ok, "lxml")

        Ltp = bsobj.find("td", {"class": "pid-169-last"})
        Ltp = Ltp.text
        Ltp = Ltp.replace(',', '')
        os.system('cls')
        Ltp = float(Ltp)
        print(Ltp, datetime.datetime.now())
        time.sleep(3)

soup()

Result:

sh: cls: command not found
18351.61 2016-08-31 23:44:28.103531
sh: cls: command not found
18351.54 2016-08-31 23:44:36.257327
sh: cls: command not found
18351.61 2016-08-31 23:44:47.645328
sh: cls: command not found
18351.91 2016-08-31 23:44:55.618970
sh: cls: command not found
18352.67 2016-08-31 23:45:03.842745
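The sh: cls: command not found lines in the output appear because cls is a Windows-only command; on macOS/Linux the shell rejects it and the screen is never cleared. A minimal portable alternative (a sketch, assuming you still want to clear the terminal between prints):

```python
import os

def clear_command():
    # 'cls' exists only on Windows; POSIX systems use 'clear'
    return 'cls' if os.name == 'nt' else 'clear'

def clear_screen():
    os.system(clear_command())
```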

Reassigning html to the decoded UTF-8 string and then calling read() on it as if it were still an IO object is what raises the error. But even without that, this code would not fetch new data from the server on each loop: read() simply consumes the buffered bytes of the response object; it does not make a new request.
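This point can be demonstrated without touching the network: urlopen() returns a file-like byte stream, and once its contents are consumed, further read() calls return empty bytes rather than triggering a new request (io.BytesIO is used here as a stand-in for the HTTP response object):

```python
import io

# Stand-in for the object returned by urllib.request.urlopen():
# a file-like byte stream, not a live connection to the server.
response = io.BytesIO(b"<html>price: 18351.61</html>")

first = response.read()   # consumes the whole buffered body
second = response.read()  # stream is exhausted; no new request happens

print(first)   # the full body
print(second)  # b''
```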

You can speed this up with the Requests library and take advantage of persistent connections (or use urllib3 directly).

Try this (you will need pip install requests):

import os
import datetime

from requests import Session
from bs4 import BeautifulSoup

s = Session()

while True:
    resp = s.get("http://www.investing.com/indices/major-indices")
    bsobj = BeautifulSoup(resp.text, "html.parser")
    Ltp = bsobj.find("td", {"class": "pid-169-last"})
    Ltp = Ltp.text
    Ltp = Ltp.replace(',', '')
    os.system('cls')
    Ltp = float(Ltp)
    print(Ltp, datetime.datetime.now())
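One fragile spot in both versions: find() returns None when the cell is missing (a page layout change or a blocked request), and .text then raises AttributeError. A guarded parse might look like this (a sketch; the class name pid-169-last is taken from the code above, and parse_last_price is a hypothetical helper name):

```python
from bs4 import BeautifulSoup

def parse_last_price(html_text):
    """Return the last price as a float, or None if the cell is absent."""
    bsobj = BeautifulSoup(html_text, "html.parser")
    cell = bsobj.find("td", {"class": "pid-169-last"})
    if cell is None:
        return None  # layout changed or request was blocked
    return float(cell.text.replace(',', ''))
```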
