IOError: [Errno socket error] [Errno 8] nodename nor servname provided, or not known

I am trying to scrape Yahoo Finance with multiple threads and save the stock data to SQL. However, I get the following error:

Exception in thread Thread-3091:
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py", line 763, in run
    self.__target(*self.__args, **self.__kwargs)
  File "todatabase.py", line 19, in th
    htmltext = urllib.urlopen(base).read()
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.py", line 87, in urlopen
    return opener.open(url)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.py", line 213, in open
    return getattr(self, name)(url)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.py", line 350, in open_http
    h.endheaders(data)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 1049, in endheaders
    self._send_output(message_body)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 893, in _send_output
    self.send(msg)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 855, in send
    self.connect()
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 832, in connect
    self.timeout, self.source_address)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/socket.py", line 557, in create_connection
    for res in getaddrinfo(host, port, 0, SOCK_STREAM):
IOError: [Errno socket error] [Errno 8] nodename nor servname provided, or not known

Here is my code:

from threading import Thread
import sqlite3
import urllib
import re

conn = sqlite3.connect('stock.sqlite')
cur = conn.cursor()

cur.execute('''CREATE TABLE IF NOT EXISTS Stock
    (symbol TEXT UNIQUE PRIMARY KEY, price NUMERIC) ''')

dic = {}

def th(ur):
    base = "http://finance.yahoo.com/q?s=" + ur
    regex = '<span id="yfs_l84_[^.]*">(.+?)</span>'
    pattern = re.compile(regex)
    htmltext = urllib.urlopen(base).read()
    results = re.findall(pattern, htmltext)

    try:
        dic[ur] = results[0]
    except:
        print 'got a error!'

symbolslist = open("symbols.txt").read()
symbolslist = symbolslist.split("\n")
threadlist = []

for u in symbolslist:
    t = Thread(target = th, args = (u,))
    t.start()
    threadlist.append(t)

for b in threadlist:
    b.join()

for key, value in dic.items():
    print key, value

    cur.execute('INSERT INTO Stock(symbol,price) VALUES (?,?)',(key,value))
    conn.commit()

cur.close()

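I think the error is probably in the multithreading part, because I can fetch the data without multithreading, only more slowly.

With multithreading and this error, I end up with only 200+ (symbol, price) pairs instead of 3145.

I tried changing the DNS and the IP, but that did not solve it.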

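I remember once running into a problem with multithreading and opening a large number of sockets. An additional lock solved it for me. However, I did not try to find the actual cause. Thread safety in urllib may be relevant here. You could try something like this: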

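import contextlib
import threading
import urllib

# One lock shared by all threads, so only one connection is opened at a time.
global_lock = threading.Lock()
...
def th(ur):
    ...
    with global_lock:
        fd = urllib.urlopen(base)
    # In Python 2 the object returned by urlopen() is not a context manager,
    # so contextlib.closing() is used to close it after reading.
    with contextlib.closing(fd):
        htmltext = fd.read()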
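
The lock in the snippet above serializes the opening of connections. A related idea, which is not from this answer, is to cap how many threads run at once instead of starting one thread per symbol. The following is a minimal sketch under a few assumptions: it targets Python 3 (the question uses Python 2), the names fetch_page and fetch_all and the limit of 20 workers are made up for illustration, and the URL is simply the one from the question.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch_page(symbol):
    """Download the quote page for one symbol (URL copied from the question)."""
    url = "http://finance.yahoo.com/q?s=" + symbol
    with urlopen(url, timeout=10) as response:
        return response.read()


def fetch_all(symbols, fetch=fetch_page, max_workers=20):
    """Call fetch(symbol) for every symbol with at most max_workers threads.

    Returns a dict mapping symbol -> result. Symbols whose fetch raised an
    exception are reported and left out of the result.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {symbol: pool.submit(fetch, symbol) for symbol in symbols}
        for symbol, future in futures.items():
            try:
                results[symbol] = future.result()
            except Exception as exc:  # e.g. an error from a failed DNS lookup
                print("failed:", symbol, exc)
    return results
```

With a pool like this, only max_workers host name lookups and connections can be in flight at the same time, whatever the length of the symbol list. Passing a different fetch function makes it possible to try the pool without touching the network.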

Edit

You could also write single-threaded (asynchronous I/O) code using a library such as Tornado or asyncio.
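
As a rough illustration of the single-threaded approach, here is a sketch using asyncio in Python 3. It only shows how concurrency can be bounded with a semaphore. The names fetch_all and fake_fetch and the limit of 20 are assumptions for the example, and fake_fetch is a stand-in that performs no network request; a real version would need an asynchronous HTTP client in its place.

```python
import asyncio


async def fetch_all(symbols, fetch, limit=20):
    """Await fetch(symbol) for every symbol, at most `limit` at a time."""
    semaphore = asyncio.Semaphore(limit)
    results = {}

    async def worker(symbol):
        # The semaphore keeps the number of simultaneous fetches bounded.
        async with semaphore:
            results[symbol] = await fetch(symbol)

    await asyncio.gather(*(worker(symbol) for symbol in symbols))
    return results


async def fake_fetch(symbol):
    """Hypothetical stand-in for a real HTTP request; yields control once."""
    await asyncio.sleep(0)
    return "price-of-" + symbol
```

It could be run with asyncio.run(fetch_all(["AAPL", "MSFT"], fake_fetch)). Because everything runs in one thread, the results dict needs no lock.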

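By the way, if you use a separate sqlite connection in each thread, you can store the scraped data in that same thread as soon as it has been retrieved.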
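
A minimal sketch of that idea in Python 3 follows. The helper names init_db, store_price and worker are invented for illustration, fetch_price stands for whatever function does the scraping, and the sketch uses INSERT OR REPLACE rather than the plain INSERT from the question so that re-running it does not fail on the UNIQUE constraint.

```python
import sqlite3
import threading


def init_db(db_path):
    """Create the table once, before any worker thread starts."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS Stock "
                 "(symbol TEXT UNIQUE PRIMARY KEY, price NUMERIC)")
    conn.commit()
    conn.close()


def store_price(db_path, symbol, price):
    """Open a connection owned by the calling thread and insert one row."""
    # timeout: how long to wait if another thread currently holds the write lock
    conn = sqlite3.connect(db_path, timeout=30)
    try:
        with conn:  # commits on success, rolls back on an exception
            conn.execute(
                "INSERT OR REPLACE INTO Stock(symbol, price) VALUES (?, ?)",
                (symbol, price))
    finally:
        conn.close()


def worker(db_path, symbol, fetch_price):
    """Thread target: fetch one price, then store it immediately."""
    price = fetch_price(symbol)
    store_price(db_path, symbol, price)
```

Each thread would be started as threading.Thread(target=worker, args=(db_path, symbol, fetch_price)). SQLite allows only one writer at a time, so the writes still happen one after another; the gain is that no shared dictionary has to be filled first.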

I got this error too. I simply added a short sleep for each thread and the problem went away. I used time.sleep(0.1).
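
One possible reading of this is a short pause between thread starts, so that the lookups and connections do not all begin at the same moment. The sketch below assumes that reading and Python 3; the function name start_staggered is made up, and target would be a function like th from the question.

```python
import threading
import time


def start_staggered(items, target, delay=0.1):
    """Start one thread per item, sleeping `delay` seconds between starts."""
    threads = []
    for item in items:
        thread = threading.Thread(target=target, args=(item,))
        thread.start()
        threads.append(thread)
        time.sleep(delay)  # spread out the DNS lookups and connection attempts
    for thread in threads:
        thread.join()
```

With the 3145 symbols mentioned in the question and a delay of 0.1 seconds, starting the threads alone would take about 314.5 seconds, so this trades speed for fewer simultaneous connections.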

