IOError: [Errno socket error] [Errno 8] nodename nor servname provided, or not known

I was trying to scrape stock data from Yahoo Finance using multithreading and save it to a SQLite database. However, I got the following error:

Exception in thread Thread-3091:
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py", line 763, in run
    self.__target(*self.__args, **self.__kwargs)
  File "todatabase.py", line 19, in th
    htmltext = urllib.urlopen(base).read()
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.py", line 87, in urlopen
    return opener.open(url)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.py", line 213, in open
    return getattr(self, name)(url)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.py", line 350, in open_http
    h.endheaders(data)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 1049, in endheaders
    self._send_output(message_body)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 893, in _send_output
    self.send(msg)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 855, in send
    self.connect()
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 832, in connect
    self.timeout, self.source_address)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/socket.py", line 557, in create_connection
    for res in getaddrinfo(host, port, 0, SOCK_STREAM):
IOError: [Errno socket error] [Errno 8] nodename nor servname provided, or not known

Here is my code:

from threading import Thread
import sqlite3
import urllib
import re

conn = sqlite3.connect('stock.sqlite')
cur = conn.cursor()

cur.execute('''CREATE TABLE IF NOT EXISTS Stock
    (symbol TEXT UNIQUE PRIMARY KEY, price NUMERIC) ''')

dic = {}

def th(ur):
    base = "http://finance.yahoo.com/q?s=" + ur
    regex = '<span id="yfs_l84_[^.]*">(.+?)</span>'
    pattern = re.compile(regex)
    htmltext = urllib.urlopen(base).read()
    results = re.findall(pattern, htmltext)

    try:
        dic[ur] = results[0]
    except:
        print 'got a error!'

symbolslist = open("symbols.txt").read()
symbolslist = symbolslist.split("\n")
threadlist = []

for u in symbolslist:
    t = Thread(target = th, args = (u,))
    t.start()
    threadlist.append(t)

for b in threadlist:
    b.join()

for key, value in dic.items():
    print key, value

    cur.execute('INSERT INTO Stock(symbol,price) VALUES (?,?)',(key,value))
    conn.commit()

cur.close()

I think the mistake may be in the multithreading part, since I can get the data without multithreading, only slowly.

With multithreading and this error, I end up with only 200+ (symbol, price) pairs rather than 3145.

I tried changing DNS and IP, but that doesn't solve it.

I remember having problems with multithreading and opening a large number of sockets at once. An additional lock solved the problem for me, although I did not try to find the actual cause. The urllib documentation does not say anything about thread safety. You could try something like this:

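import threading
import urllib
from contextlib import closing

global_lock = threading.Lock()
...
def th(ur):
    ...
    # Serialize only the connection setup (DNS lookup and connect).
    with global_lock:
        fd = urllib.urlopen(base)
    # In Python 2 the object returned by urlopen is not a context manager,
    # so wrap it in closing() to make sure it gets closed.
    with closing(fd):
        htmltext = fd.read()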

EDIT

Alternatively, you could use single-threaded (async IO) code with a library such as tornado or asyncio.
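
A minimal sketch of the async IO idea, assuming Python 3 and asyncio (the question itself uses Python 2). It only shows how to cap the number of requests in flight with a semaphore. `fake_fetch`, `fetch_all` and `limit` are hypothetical names, and `fake_fetch` is a stand-in for a real asynchronous HTTP request, which is not shown here.

```python
import asyncio


async def fake_fetch(symbol):
    # Stand-in for a real async HTTP request (assumption: an async HTTP
    # client of your choice would be called here instead).
    await asyncio.sleep(0.01)
    return "price-for-" + symbol


async def fetch_all(symbols, fetch=fake_fetch, limit=20):
    """Run fetch(symbol) for every symbol, at most `limit` at a time."""
    sem = asyncio.Semaphore(limit)
    results = {}

    async def one(symbol):
        async with sem:
            try:
                results[symbol] = await fetch(symbol)
            except OSError as exc:
                # Record the failure instead of silently losing the symbol.
                results[symbol] = exc

    await asyncio.gather(*(one(s) for s in symbols))
    return results


if __name__ == "__main__":
    prices = asyncio.run(fetch_all(["AAPL", "GOOG", "MSFT"], limit=2))
    print(prices)
```

Keeping the failures in the result dict makes it easy to see afterwards which symbols need a retry, rather than ending up with fewer rows than expected.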

BTW, by using one sqlite connection per thread, you could store the scraped data in the respective thread right after retrieving it.
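
A rough sketch of the connection-per-thread idea, assuming Python 3 and the same `Stock` table as in the question. `init_db` and `store` are hypothetical helper names, and `INSERT OR REPLACE` is my choice here so that a repeated symbol does not raise an error.

```python
import sqlite3
import threading

DB_PATH = "stock.sqlite"  # same file name as in the question


def init_db(path=DB_PATH):
    # Create the table once, before any worker thread starts.
    conn = sqlite3.connect(path)
    with conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS Stock "
            "(symbol TEXT PRIMARY KEY, price NUMERIC)"
        )
    conn.close()


def store(symbol, price, path=DB_PATH):
    # Each call opens its own connection, so no connection object is ever
    # shared between threads. The timeout makes a writer wait while another
    # thread holds the database lock.
    conn = sqlite3.connect(path, timeout=30)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT OR REPLACE INTO Stock(symbol, price) VALUES (?, ?)",
                (symbol, price),
            )
    finally:
        conn.close()
```

Inside the worker, `store(ur, results[0])` would then replace the write to the shared `dic`, so the final insert loop is no longer needed.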

I got this error as well. I added some sleep time to each thread and the problem was resolved. I used time.sleep(0.1).
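
One way this could look, as a sketch only: the answer does not say where the sleep goes, so placing it between thread starts is an assumption, as is the helper name `start_staggered`. Written for Python 3.

```python
import threading
import time


def start_staggered(worker, items, delay=0.1):
    """Start one thread per item, pausing `delay` seconds between starts."""
    threads = []
    for item in items:
        t = threading.Thread(target=worker, args=(item,))
        t.start()
        threads.append(t)
        # Spread out the DNS lookups and connects instead of firing
        # them all at the same moment.
        time.sleep(delay)
    for t in threads:
        t.join()
```

With the code from the question this would be called as `start_staggered(th, symbolslist)`. Note that at 0.1 s per start, 3145 symbols take a bit over five minutes just to launch.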
