
Python requests.get keeps getting timed out for about a minute, then continues working normally

I have been trying to resolve this issue for about a week now. Either I am missing something really obvious, the problem is on the server side of the API, or the server is intentionally stalling me (I coded this in Python).

What I am trying to do:

  1. I am trying to get financial data (bter market depth for all markets). The problem is that the exchange's API only supports fetching data for one market at a time (around 75-85 markets in total, variable), so I decided to start a thread for each market.
  2. Each thread handles one market: it tries to get the data for that market and returns if successful; if not, it adds the market back to the queue to be handled by a new thread later.
  3. Do this until all markets are covered, and repeat indefinitely to keep the data up to date.

I coded this in Python, using the requests library. It works fine for a few iterations, but then the server stops responding. To overcome this, I added a timeout to requests.get. The request times out, but the server does not respond to new queries either, for around 1 minute. Then everything works smoothly for a few iterations again, then things stall, and the cycle repeats.

Here is the Python code.

import requests, json
import thread, threading
from time import sleep, clock


#Get queue
conn = requests.get('http://data.bter.com/api/1/pairs/')
mainQueue = json.loads(conn.content)
conn.close()


#Variable globals
marketCount = 0
queue = mainQueue[:]


#Static globals
lock = threading.Lock()
completeSize = len(queue)


def getOrderData(marketid):
    global queue, marketCount

    try:
        # Per-request timeout so a stalled server cannot hang this thread
        data = requests.get('http://data.bter.com/api/1/depth/' + marketid,
                            timeout=3)
    except requests.exceptions.RequestException:
        # Timed out or failed: put the market back so another thread retries it
        with lock:
            print "Timed out: %s" % marketid
            queue.append(marketid)
        return

    with lock:
        marketCount += 1
        data.close()
    return


while True:
    print "##################################"

    #Initialize data
    crT = clock()
    marketCount = 0
    queue = mainQueue[:]


    #Start retrieving all markets
    while marketCount != completeSize:
        while len(queue) == 0 and marketCount != completeSize:
            sleep(0.01)

        if marketCount != completeSize:
            marketid = queue.pop(0)
            thread.start_new_thread(getOrderData, (marketid,))

    #Print time spent
    print "Finished, total time:",clock()-crT
    sleep(1)

Here is how the program behaves during runtime.

Finished indicates that I got all the financial data once and started updating it again. As you can see, everything seems to work fine, and then it starts stalling and timing out. All of a sudden, things start working normally again. I also noticed that after I close a GET connection with data.close(), a TCP connection in the TIME_WAIT state remains in my TCP monitoring program for a long time. After a few iterations, there are tons of them, just sitting in the TIME_WAIT state.
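Those lingering TIME_WAIT sockets are a direct consequence of opening a brand-new TCP connection for every requests.get call. One way to avoid them is to reuse connections through a single shared requests.Session with an explicitly sized pool, so keep-alive sockets are returned to the pool instead of being abandoned. A minimal sketch (the pool sizes and the get_depth helper are illustrative, not part of the original code; the bter URL is kept only for context):

```python
import requests
from requests.adapters import HTTPAdapter

# One shared Session reuses TCP connections via HTTP keep-alive,
# so each request does not leave a fresh socket behind in TIME_WAIT.
session = requests.Session()

# Size the pool generously for many concurrent worker threads
# (the numbers here are guesses, not a documented requirement).
adapter = HTTPAdapter(pool_connections=10, pool_maxsize=100)
session.mount('http://', adapter)
session.mount('https://', adapter)

def get_depth(marketid):
    # The pooled connection is released back to the Session after
    # the body is read, instead of being closed into TIME_WAIT.
    resp = session.get('http://data.bter.com/api/1/depth/' + marketid,
                       timeout=3)
    resp.raise_for_status()
    return resp.json()
```

With keep-alive, a handful of pooled sockets can serve all ~80 markets each pass, instead of thousands of throwaway connections per minute.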

So, here are my questions:

  1. Is it possible that all the TCP connections remaining in the TIME_WAIT state are waiting for the server to send them some sort of signal to release them? If so, could the server be refusing to respond because I have too many (alive? active?) connections open at once?
  2. If not, why does this stalling period, during which all my GET requests keep timing out, happen? Could the server have a per-client query limit per minute, so that when I hit it, everything magically starts working again after about a minute?
  3. I have tons and tons of TCP connections waiting in the TIME_WAIT state, and they keep accumulating. (I start around 80 connections per second; if it takes 4 minutes for connections to be released completely, that would be 19,200 accumulated connections.) How do I resolve this, and is it a problem at all?
  4. I start a lot of threads. Is that a problem?
  5. Getting all the data linearly, one market after another, is not an option: it is too slow, and the data would be out of date. Is there any other way I can keep the entire market data set up to date (max 3 seconds old)?
  6. Anything else you want to tell me?
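Regarding question 5, one alternative to spawning an unbounded thread per market is a fixed-size worker pool from concurrent.futures (available on modern Python), which bounds thread count while still fetching markets in parallel. A sketch under stated assumptions: fetch_depth and the placeholder market list are hypothetical stand-ins so the control flow is testable offline; in the real program fetch_depth would perform the HTTP request.

```python
import concurrent.futures

# Placeholder list; the real one comes from /api/1/pairs/.
MARKETS = ['btc_usd', 'ltc_btc', 'doge_btc']

def fetch_depth(marketid):
    # Hypothetical worker: in the real program this would call the
    # depth endpoint; here it just returns a dummy depth structure.
    return marketid, {'asks': [], 'bids': []}

def refresh_all(markets, workers=20):
    # A bounded pool: at most `workers` threads, reused across requests,
    # instead of one short-lived thread per market per pass.
    results = {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
        for marketid, depth in pool.map(fetch_depth, markets):
            results[marketid] = depth
    return results
```

Looping refresh_all forever gives the same "repeat indefinitely" behavior as the original code, but with a stable number of threads and no hand-rolled requeue/busy-wait logic.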

I am aware that my code does not save the data yet; I am just trying to fetch it first. The code is rough, but since I am testing a short snippet I did not bother with comments (I changed it many times trying to find an approach that works).

Thanks in advance. I really hope I can overcome this.

You are executing requests very frequently, which is most likely not allowed by the server. The technique they are probably using is called throttling; have a look at http://www.django-rest-framework.org/api-guide/throttling , which has a nice explanation of it.
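If the server is indeed throttling, pacing requests on the client side avoids triggering the penalty window in the first place. Below is a minimal sketch of a thread-safe rate limiter that worker threads call before each request; the 10 requests/second figure is a pure guess, since bter's actual limit is not documented here.

```python
import threading
import time

class RateLimiter(object):
    """Hands out evenly spaced send slots across many threads."""

    def __init__(self, per_second):
        self.interval = 1.0 / per_second
        self.lock = threading.Lock()
        self.next_slot = time.time()

    def wait(self):
        # Reserve the next available send slot under the lock,
        # then sleep (outside the lock) until that slot arrives.
        with self.lock:
            slot = max(self.next_slot, time.time())
            self.next_slot = slot + self.interval
        delay = slot - time.time()
        if delay > 0:
            time.sleep(delay)

# Assumed limit; tune against the server's real policy.
limiter = RateLimiter(per_second=10)
```

Each worker thread would call limiter.wait() just before its requests.get, capping the aggregate request rate no matter how many threads are running.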
