
Python script to collect all hostnames of IP addresses with only prime entities

I have a Python script that collects the hostnames of IP addresses whose byte entities are all prime. For example, 211.13.17.2 is a valid IP for my problem set, because every byte entity (in decimal representation) is a prime.

Code:

from itertools import product
import socket


# prime or not
def prime(n):
    if n > 1:
        p = 0
        for i in range(2, n-1):
            if divmod(n, i)[1] == 0:
                p = 1
                break
        if p == 0:
            return True


def get_host_name(b1, b2, b3, b4):
    addr = str(b1) + '.' + str(b2) + '.' + str(b3) + '.' + str(b4)
    try:
        return socket.gethostbyaddr(addr)
    except socket.herror:
        pass


# find host names whose ip addresses are all primes
byte = [b for b in range(0, 256) if prime(b)]
ips = list(product(byte, byte, byte, byte))
print 'Total ips = ', len(ips)

for ip in ips:
    if get_host_name(*ip):
        print get_host_name(*ip)

The problem is that my script is too slow. I need expert help to optimize this code. Please pinpoint all mistakes and ways to make it run faster.

For the prime numbers, you can use something like this:

import numpy as np
isprime = lambda x: np.all(np.mod(x, range(2, 1 + int(np.sqrt(x)))))
primes = np.array([ x for x in range(2, 255) if isprime(x) ])

and you can build a generator for the IP addresses with:

import itertools
ip_addresses = ('{}.{}.{}.{}'.format(*x) for x in itertools.product(primes, repeat=4))
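
As a rough illustration of how such a generator might be consumed, here is a minimal sketch that needs no numpy. The names `prime_addresses` and `collect_hosts` are hypothetical helpers, not part of the original code. The sketch calls the lookup function only once per address and reuses the result, whereas the loop in the question calls `get_host_name` twice for every address that resolves.

```python
import itertools

# Primes below 256, found by trial division (cheap for such a small range).
primes = [n for n in range(2, 256)
          if all(n % d for d in range(2, int(n ** 0.5) + 1))]


def prime_addresses(octets):
    """Lazily yield a dotted-quad string for every 4-tuple of the given octets."""
    for quad in itertools.product(octets, repeat=4):
        yield '{}.{}.{}.{}'.format(*quad)


def collect_hosts(addresses, lookup):
    """Call lookup once per address and keep only the truthy results."""
    for addr in addresses:
        result = lookup(addr)  # single call; the result is reused below
        if result:
            yield addr, result


# Peek at the first few addresses without building the full list in memory.
first_three = list(itertools.islice(prime_addresses(primes), 3))
print(first_three)
print(len(primes) ** 4)  # total number of candidate addresses
```

Because the addresses are generated lazily, memory use stays flat no matter how many combinations there are; only the lookups themselves remain expensive.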

Most likely, though, the code is slow in the socket part, and because of the number of combinations it needs to check. For that you can try parallelism with a pool of worker processes, something like this:

from multiprocessing import Pool
from socket import gethostbyaddr

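def gethost(addr):
    # Return the (hostname, aliases, addresses) tuple, or None if the lookup fails.
    try:
        return gethostbyaddr(addr)
    except Exception:  # a bare except would also swallow KeyboardInterrupt
        return None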

if __name__ == '__main__':

    p = Pool(3)
    print (p.map(gethost,['74.125.228.137',
                          '11.222.333.444',
                          '17.149.160.49',
                          '98.139.183.24']))
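
Since reverse DNS lookups spend most of their time waiting on the network, a thread pool is a possible alternative to worker processes. The following is only a sketch under that assumption and swaps in `concurrent.futures.ThreadPoolExecutor` for `multiprocessing.Pool`. The names `reverse_lookup` and `resolve_all`, and the `workers` and `batch_size` defaults, are illustrative choices rather than anything from the original post. Addresses are fed to the pool in batches so that a very large generator is never materialized at once.

```python
import itertools
import socket
from concurrent.futures import ThreadPoolExecutor


def reverse_lookup(addr):
    """Return (addr, hostname), or None when the reverse lookup fails."""
    try:
        return addr, socket.gethostbyaddr(addr)[0]
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return None


def resolve_all(addresses, lookup=reverse_lookup, workers=20, batch_size=1000):
    """Yield successful lookups, handing the pool one batch at a time."""
    addresses = iter(addresses)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while True:
            batch = list(itertools.islice(addresses, batch_size))
            if not batch:
                break
            # pool.map keeps the input order within each batch.
            for result in pool.map(lookup, batch):
                if result is not None:
                    yield result
```

A call such as `for addr, host in resolve_all(ip_addresses): print(addr, host)` would then stream results as each batch completes. The `lookup` parameter exists mainly so the function can be exercised with a stub instead of real DNS queries.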

Edit: using only the prime numbers less than 50 (50K+ combinations) and 20 worker processes, it takes almost 6 minutes on my machine and finds 16K+ results. So, with such a huge number of combinations, parallelism cannot help much.
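
To put those figures in perspective, here is a back-of-the-envelope extrapolation to the full range of primes below 256. It assumes the lookup rate stays the same as in the small run, which may well not hold in practice, so treat the result as an order of magnitude only.

```python
# Figures reported in the edit above (approximate): primes below 50,
# 20 worker processes, "almost 6 minutes".
small_primes = 15                    # number of primes below 50
sample_size = small_primes ** 4      # 50,625 addresses
sample_minutes = 6.0

full_primes = 54                     # number of primes below 256
full_size = full_primes ** 4         # 8,503,056 addresses

# Assumption: the lookup rate is unchanged for the full range.
estimated_hours = sample_minutes * full_size / sample_size / 60
print(round(estimated_hours, 1))
```

Under that assumption the full scan would take somewhere around 17 hours with the same 20 workers, which is why adding workers alone only goes so far.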
