Python multiprocessing.Pool workers hang when using pool.map

I have a script that connects to approximately 700 devices, executes a series of commands on each, then exits. I started using multiprocessing.Pool and Pool.map to reduce the script's run time by logging into multiple devices concurrently.

I've been running into a strange issue where the workers in the pool hang indefinitely, and I cannot figure out why. I am still somewhat new to Python, so any feedback on my approach is appreciated. Below is my code:

from functools import partial
import logging
import multiprocessing
import re
import socket
import time

import paramiko


def expect(channel, pattern, command, wait, debug):
    # Poll the channel until `pattern` shows up, then send `command`.
    timeout_counter = 0
    full_buffer = ''
    while timeout_counter < wait:
        # Read up to 8192 bytes from the channel.
        temp_buffer = channel.recv(8192)
        full_buffer += temp_buffer
        if pattern in temp_buffer:
            channel.send('%s\r' % command)
            # Callers append the result to a string, so return '' when not debugging.
            return full_buffer if debug else ''
        time.sleep(0.01)
        timeout_counter += 0.01
    # The pattern never appeared within `wait` seconds.
    raise Exception('Timed out waiting for %r' % pattern)


def backup_device(device, user, user_pass, failures):
    attempt_counter = 0
    max_attempts = 5
    if re.search(r'^\d+\.\d+\.\d+\.\d+$', device):
        # gethostbyaddr returns (hostname, aliaslist, ipaddrlist); keep the hostname.
        host_name = socket.gethostbyaddr(device)[0]
    else:
        host_name = device
    logging.info("Connecting to: %s" % host_name)
    command = 'copy config to backup'
    while attempt_counter < max_attempts:
        ssh_handle = paramiko.SSHClient()
        try:
            output_buffer = ''
            ssh_handle.set_missing_host_key_policy(paramiko.AutoAddPolicy())
            ssh_handle.connect(hostname=host_name, username=user, password=user_pass, timeout=20, look_for_keys=False)
            ssh_channel = ssh_handle.invoke_shell()
            output_buffer += expect(ssh_channel, pattern='#', command='%s\r' % command, wait=10, debug=True)
            output_buffer += expect(ssh_channel, pattern=']?', command='\n', wait=5, debug=True)
            output_buffer += expect(ssh_channel, pattern=']?', command='\n', wait=5, debug=True)
            output_buffer += expect(ssh_channel, pattern='#', command='exit', wait=5, debug=True)
            print(output_buffer)
        except Exception as inst:
            logging.debug(str(inst))
            attempt_counter += 1
        else:
            logging.info('%s - backed up successfully.' % host_name)
            break  # stop retrying once the backup succeeds
        finally:
            ssh_handle.close()  # release the connection on success and on failure
    if attempt_counter == max_attempts:
        logging.critical("Unable to log into device - %s" % host_name)
        failures.append(host_name)
    return

if __name__ == "__main__":
    logging.basicConfig(filename='herpderp.log', filemode='w', level=logging.DEBUG,
                        format='%(asctime)s - %(levelname)s:%(message)s')
    partial_method = partial(backup_device, user=username, user_pass=password, failures=failed_devices)
    pool = multiprocessing.Pool(processes=4, maxtasksperchild=1)
    pool_map = pool.map(func=partial_method, iterable=devices, chunksize=4)
    pool.close()
    pool.join()

Output from Multiprocessing Log:

Starting MainProcess
[DEBUG/MainProcess] created semlock with handle 47379818131456
[DEBUG/MainProcess] created semlock with handle 47379818135552
[DEBUG/MainProcess] created semlock with handle 47379818139648
[DEBUG/MainProcess] created semlock with handle 47379818143744
[DEBUG/MainProcess] added worker
[INFO/PoolWorker-1] child process calling self.run()
[DEBUG/MainProcess] added worker
[INFO/PoolWorker-2] child process calling self.run()
[DEBUG/MainProcess] added worker
[INFO/PoolWorker-3] child process calling self.run()
[DEBUG/MainProcess] added worker
[INFO/PoolWorker-4] child process calling self.run()

Truncated logging output:

2015-05-13 16:04:11,748 - INFO:Connecting to: HOSTNAME
2015-05-13 16:04:11,760 - DEBUG:starting thread (client mode): 0x1ebaf350L
2015-05-13 16:04:11,769 - INFO:Connected (version 1.99, client Cisco-1.25)
2015-05-13 16:04:11,770 - DEBUG:kex algos:[u'diffie-hellman-group1-sha1'] server key:[u'ssh-rsa'] client encrypt:[u'aes128-cbc', u'3des-cbc', u'aes192-cbc', u'aes256-cbc'] server encrypt:[u'aes128-cbc', u'3des-cbc', u'aes192-cbc', u'aes256-cbc'] client mac:[u'hmac-sha1', u'hmac-sha1-96', u'hmac-md5', u'hmac-md5-96'] server mac:[u'hmac-sha1', u'hmac-sha1-96', u'hmac-md5', u'hmac-md5-96'] client compress:[u'none'] server compress:[u'none'] client lang:[u''] server lang:[u''] kex follows?False
2015-05-13 16:04:11,770 - DEBUG:Ciphers agreed: local=aes128-cbc, remote=aes128-cbc
2015-05-13 16:04:11,770 - DEBUG:using kex diffie-hellman-group1-sha1; server key type ssh-rsa; cipher: local aes128-cbc, remote aes128-cbc; mac: local hmac-sha1, remote hmac-sha1; compression: local none, remote none
2015-05-13 16:04:12,038 - DEBUG:Switch to new keys ...
2015-05-13 16:04:12,064 - DEBUG:Adding ssh-rsa host key for HOSTNAME: KEY
2015-05-13 16:04:12,257 - DEBUG:userauth is OK
2015-05-13 16:04:12,258 - INFO:Auth banner: <INSERT BANNER HERE>
2015-05-13 16:04:12,314 - INFO:Authentication (password) successful!
2015-05-13 16:04:12,330 - DEBUG:[chan 0] Max packet in: 32768 bytes
2015-05-13 16:04:12,341 - DEBUG:[chan 0] Max packet out: 4096 bytes
2015-05-13 16:04:12,341 - DEBUG:Secsh channel 0 opened.
2015-05-13 16:04:12,356 - DEBUG:[chan 0] Sesch channel 0 request ok
2015-05-13 16:04:12,365 - DEBUG:[chan 0] Sesch channel 0 request ok

The above logging is what I see before the pool hangs.

The issue was that Channel.recv(nbytes) is a blocking call, so a worker hung whenever the device stopped sending data before the expected prompt arrived. Calling channel.settimeout(seconds) on the channel corrected this. The only other thing to note is that recv then raises socket.timeout when the timeout expires, so I needed to catch that exception.
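
A minimal sketch of that fix, assuming Python 3 and a channel object that behaves like paramiko's Channel (settimeout, recv and send). The function name expect_with_timeout and the poll_timeout parameter are illustrative and not part of the original script:

```python
import socket
import time


def expect_with_timeout(channel, pattern, command, wait, poll_timeout=1.0):
    """Wait up to `wait` seconds for `pattern`, then send `command`.

    `channel` is assumed to offer settimeout(), recv() and send() like a
    paramiko Channel. Returns everything read from the channel.
    """
    # With a timeout set, recv() raises socket.timeout instead of blocking forever.
    channel.settimeout(poll_timeout)
    deadline = time.monotonic() + wait
    full_buffer = ''
    while time.monotonic() < deadline:
        try:
            chunk = channel.recv(8192)
        except socket.timeout:
            continue  # nothing arrived within poll_timeout; re-check the deadline
        if not chunk:
            # An empty read means the remote side closed the channel.
            raise EOFError('channel closed while waiting for %r' % pattern)
        full_buffer += chunk.decode('utf-8', errors='replace')
        if pattern in full_buffer:
            channel.send('%s\r' % command)
            return full_buffer
    raise TimeoutError('timed out waiting for %r' % pattern)
```

Because the deadline is checked against a clock rather than a hand-incremented counter, a silent device costs at most roughly `wait` plus one `poll_timeout` before the worker gives up and the retry logic in backup_device can take over.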
