
Celery: One subtask in group always times out

I'm experiencing a rather annoying behavior with Celery's group feature.

I periodically need to check the IPs that a bunch of hosts resolve to, to make sure those IPs haven't changed. To do that, I have a dictionary of the <hostname, IPs> pairs I need to verify. For instance:

REQUIRED_HOSTS = {
    'google.com': {'173.194.46.64', '173.194.46.70', '173.194.46.71'},
    'stackoverflow.com': {'198.252.206.16'}
}

So the only thing to do is periodically iterate over REQUIRED_HOSTS.keys(), resolve each name, and see whether any of the IPs it resolves to differs from what I have recorded. (Not much of a brainer here.)

To improve efficiency a bit, each name is resolved in parallel. I created a subtask for that (it resolves using dnspython):

@my_tasks.task
def resolve_hostname(hostname, resolver=None):
    """ This subtask resolves the 'hostname' to its IP addresses. It's
    intended to be used in the 'compare_required_ips' function to resolve
    names in parallel """
    if resolver is None:
        resolver = dns.resolver.Resolver()
        resolver.nameservers = ['8.8.8.8', '4.2.2.2'] + resolver.nameservers

    try:
        return (hostname,
                {hst.address for hst in resolver.query(hostname)})
    except Exception as e:
        logger.exception("Got %s when trying to resolve hostname=%s"
                         % (type(e), hostname))
        raise

Now, the method that queries all the hostnames and spawns the subtasks is the following:

@my_tasks.task
def compare_required_ips():
    """ This method verifies that the IPs haven't changed. """
    retval = []
    resolver = dns.resolver.Resolver()
    resolver.nameservers = ['8.8.8.8', '4.2.2.2'] + resolver.nameservers
    retrieved_hosts = dict.fromkeys(required_hosts.REQUIRED_HOSTS.keys())
    logger.info("Going to compare IPs for %s hostnames=%s"
                % (len(required_hosts.REQUIRED_HOSTS.keys()),
                   required_hosts.REQUIRED_HOSTS.keys()))
    ip_subtasks = group(
        [resolve_hostname.s(hostname, resolver=resolver)
         for hostname in required_hosts.REQUIRED_HOSTS.keys()]
    )()
    for hostname, ips in ip_subtasks.get(timeout=90):
        retrieved_hosts[hostname] = ips

    for hostname in required_hosts.REQUIRED_HOSTS:
        if (required_hosts.REQUIRED_HOSTS[hostname]
                != retrieved_hosts[hostname]):
            retval.append(hostname)
            logger.error(
                "IP resolution mismatch. hostname=%s resolve_target=%s"
                ", resolve_actual=%s (mismatch=%s)"
                % (hostname,
                   required_hosts.REQUIRED_HOSTS[hostname],
                   retrieved_hosts[hostname],
                   (required_hosts.REQUIRED_HOSTS[hostname]
                    ^ retrieved_hosts[hostname]))
            )
    return retval

Again, fairly easy: walk the REQUIRED_HOSTS keys (a.k.a. hostnames), spawn a subtask to resolve each of them, and then collect the results with a 90-second timeout (in the line for hostname, ips in ip_subtasks.get(timeout=90)).
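The comparison step at the end of compare_required_ips can be isolated into a pure function, which makes it testable without Celery or DNS. The following is a minimal sketch written for Python 3; the name find_mismatches is hypothetical and not part of the original code. It also treats a hostname that was never resolved (value None, as dict.fromkeys leaves it) as an empty set, since applying ^ between a set and None raises a TypeError.

```python
def find_mismatches(expected, resolved):
    """Return {hostname: symmetric difference} for hosts whose IP sets differ.

    expected: dict mapping hostname -> set of recorded IPs
    resolved: dict mapping hostname -> set of resolved IPs (or None if missing)
    """
    mismatches = {}
    for hostname, expected_ips in expected.items():
        # A missing or None entry is treated as "resolved to nothing".
        actual_ips = resolved.get(hostname) or set()
        if expected_ips != actual_ips:
            # IPs present on exactly one side (recorded-only or resolved-only).
            mismatches[hostname] = expected_ips ^ actual_ips
    return mismatches
```

An empty result means every hostname resolved to exactly the recorded set of IPs.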

Now, the nuisance is that all the subtasks except one finish successfully within that 90-second window. The parent task (compare_required_ips) then fails because of the timeout=90, and as soon as that happens the remaining subtask finishes successfully (immediately after the parent has failed). I have tried increasing and decreasing the timeout, and that subtask always takes exactly as long as whatever timeout I specified, making the main task report a failure.

I have also run the name resolution manually (not as Celery tasks, but using regular threading) and it resolves in milliseconds, every time, in every test I try. I don't think it's an issue with the dns.resolver.Resolver() class. Everything suggests that this class resolves blazingly fast, but the subtask, or the group, or something else in Celery doesn't know about it (only one of the subtasks, though).

I am using celery==3.1.9, celery-with-redis==3.0, and flower==0.6.0 to monitor.

Any help, hint, or thing to test would be much appreciated.

One problem might be a deadlock caused by launching synchronous subtasks. compare_required_ips is a Celery task, and inside it you wait for a group of resolve_hostname tasks to complete. That is inefficient, and it can deadlock when the waiting parent occupies a worker slot that one of its subtasks needs.
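The starvation mechanism behind such a deadlock can be reproduced without Celery. The sketch below is only an analogy, written for Python 3: it uses concurrent.futures.ThreadPoolExecutor as a stand-in for a Celery worker pool, and the names child, parent, and demo are hypothetical. With one worker slot, the parent blocks the only slot while waiting for its child, so the wait always runs into the timeout, and the child completes right after the parent gives up.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError


def child(hostname):
    # Stand-in for a fast subtask; no real DNS lookup is performed.
    return hostname.upper()


def parent(pool, timeout):
    # Submit the child to the SAME pool, then block waiting for it.
    future = pool.submit(child, "example.com")
    try:
        return "ok", future.result(timeout=timeout)
    except TimeoutError:
        # Hand the pending future back so the caller can inspect it later.
        return "timeout", future


def demo(workers, timeout=0.5):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        status, payload = pool.submit(parent, pool, timeout).result()
        if status == "timeout":
            # The parent has returned, freeing its slot; the child now runs.
            return status, payload.result(timeout=5)
        return status, payload
```

Calling demo(1) reports a timeout even though the child itself is instantaneous, while demo(2) succeeds because a second slot is free for the child. Whether this is what happens in the question depends on the worker's concurrency setting, which the question does not state.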

So you have to change this

ip_subtasks = group(
        [resolve_hostname.s(hostname, resolver=resolver)
         for hostname in required_hosts.REQUIRED_HOSTS.keys()]
    )()

to

ip_subtasks = group(
        [resolve_hostname.s(hostname, resolver=resolver)
         for hostname in required_hosts.REQUIRED_HOSTS.keys()]
    ).delay()

which launches all your tasks asynchronously, thereby avoiding the deadlock.

Also, you shouldn't call ip_subtasks.get() inside the compare_required_ips task (even if ip_subtasks takes only a nanosecond). You have to write a new function for that, or use Celery's task_success signal.
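The idea of collecting results in a separate function, instead of blocking inside the task that launched the subtasks, can be sketched with the standard library. This is an analogy in Python 3, not Celery code: ThreadPoolExecutor stands in for the worker pool, and resolve, launch, and run are hypothetical names. In Celery itself, the comparable built-in construct is a chord, which runs a callback task once every task in a group has finished.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def resolve(hostname):
    # Hypothetical stand-in for resolve_hostname; returns a fixed test address.
    return hostname, {"192.0.2.1"}


def launch(pool, hostnames, on_done):
    """Submit one child per hostname and return immediately (never blocks).

    on_done(results) is called once, after the last child has finished.
    """
    results = {}
    lock = threading.Lock()
    remaining = [len(hostnames)]

    def collect(future):
        hostname, ips = future.result()
        with lock:
            results[hostname] = ips
            remaining[0] -= 1
            finished = remaining[0] == 0
        if finished:
            on_done(results)

    for hostname in hostnames:
        pool.submit(resolve, hostname).add_done_callback(collect)


def run(hostnames, workers=1):
    done = threading.Event()
    collected = {}

    def on_done(results):
        collected.update(results)
        done.set()

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # The launcher occupies a slot only briefly, so one worker suffices.
        pool.submit(launch, pool, hostnames, on_done).result(timeout=5)
        done.wait(timeout=5)
    return collected
```

Because launch returns without waiting, it works even with a single worker slot, which is the situation where a blocking get() would stall.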
