Is there a better way to iterate over two lists to find the relation between items in Python?

I mocked up an IP list and a subnet dict as input:

# ip address list
ip_list = [
'192.168.1.151', '192.168.10.191', '192.168.6.127', 
'192.168.2.227', '192.168.2.5', '192.168.3.237', 
'192.168.6.188', '192.168.7.209', '192.168.9.10',
# Edited: add some /28, /16 case
'192.168.12.39', '192.168.12.58', '10.63.11.1', '10.63.102.69',
]

# subnet dict
netsets = {
'192.168.1.0/24': 'subnet-A',     # {subnet: subnet's name} 
'192.168.10.0/24': 'subnet-B', 
'192.168.2.0/24': 'subnet-C', 
'192.168.3.0/24': 'subnet-C',
'192.168.6.0/24': 'subnet-D', 
'192.168.7.0/24': 'subnet-D', 
'192.168.9.0/24': 'subnet-E',
# Edited: add some /28, /16 case
'192.168.12.32/28': 'subnet-F',
'192.168.12.48/28': 'subnet-G',
'10.63.0.0/16': 'subnet-I',
}

Each IP address in ip_list then needs to be mapped to the name of its subnet.

We assume that every IP address has a corresponding subnet in netsets.

The output should look like this:

192.168.1.151   subnet-A
192.168.10.191  subnet-B
192.168.6.127   subnet-D
192.168.2.227   subnet-C
192.168.2.5     subnet-C
192.168.3.237   subnet-C
192.168.6.188   subnet-D
192.168.7.209   subnet-D
192.168.9.10    subnet-E
# add some /28, /16 case
192.168.12.39   subnet-F
192.168.12.58   subnet-G
10.63.11.1      subnet-I
10.63.102.69    subnet-I

I use netaddr to do the CIDR calculation. Here is my code:

from netaddr import IPAddress, IPNetwork

def netaddr_test(ips, netsets):
    for ip in ips:
        for subnet, name in netsets.iteritems():
            if IPAddress(ip) in IPNetwork(subnet):
                print ip, '\t',  name
                break

netaddr_test(ip_list, netsets)

But this code is far too slow because it iterates too much: with n IPs and m subnets the time complexity is O(n*m), which is O(n**2) when both are of similar size.

Once we have tens of thousands of IPs to iterate over, this code takes too much time.

Is there a better way to solve this problem?

I can recommend the intervaltree module, which is optimized for fast interval searches. With it the task can be solved in O(m*log n) time. For example:

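from intervaltree import Interval, IntervalTree
from ipaddress import ip_network, ip_address

# Build the tree of subnets. Addresses are converted to integers, and
# intervaltree intervals are half-open [begin, end), so the end is the
# broadcast address + 1 to keep the broadcast address inside the subnet.
netstree = IntervalTree(
    Interval(
        int(ip_network(net).network_address),
        int(ip_network(net).broadcast_address) + 1,
        name,
    )
    for net, name in netsets.items()
)

# Now look up each IP in the tree
for i in ip_list:
    ip = ip_address(i)
    nets = netstree[int(ip)]
    if nets:   # set is not empty
        netdata = list(nets)[0]
        print(netdata.data)
        # e.g. prints 'subnet-E' for '192.168.9.10'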

# ip address list
ip_list = [
'192.168.1.151', '192.168.10.191', '192.168.6.127',
'192.168.2.227', '192.168.2.5', '192.168.3.237',
'192.168.6.188', '192.168.7.209', '192.168.9.10'
]

# subnet dict
netsets = {
'192.168.1.0/24': 'subnet-A',     # {subnet: subnet's name} 
'192.168.10.0/24': 'subnet-B',
'192.168.2.0/24': 'subnet-C',
'192.168.3.0/24': 'subnet-C',
'192.168.6.0/24': 'subnet-D',
'192.168.7.0/24': 'subnet-D',
'192.168.9.0/24': 'subnet-E',
}
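
# Index the subnets by their first three octets.
# Note: this only works when every subnet is a /24.
new_netsets = {}
for k, v in netsets.items():
    new_netsets['.'.join(k.split('.')[:3])] = v

# Look up each IP by its own first three octets
for IP in ip_list:
    newIP = '.'.join(IP.split('.')[:3])
    print IP, new_netsets[newIP]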

Hope this helps.
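The dictionary lookup above relies on every subnet being a /24. A minimal sketch of how the same idea could be extended to mixed prefix lengths (such as the /28 and /16 cases in the question) is shown below. It is written for Python 3, uses the standard ipaddress module instead of string splitting, and handles IPv4 only; the names build_prefix_index and lookup are hypothetical helpers, not part of the original answer.

```python
import ipaddress

def build_prefix_index(netsets):
    """Map each prefix length to {network address as int: subnet name}."""
    index = {}
    for cidr, name in netsets.items():
        net = ipaddress.ip_network(cidr)
        index.setdefault(net.prefixlen, {})[int(net.network_address)] = name
    return index

def lookup(index, ip):
    """Return the name of the most specific subnet containing ip, or None."""
    ip_int = int(ipaddress.ip_address(ip))
    # Try longer prefixes first so that the most specific subnet wins.
    for prefixlen in sorted(index, reverse=True):
        mask = (0xFFFFFFFF << (32 - prefixlen)) & 0xFFFFFFFF
        name = index[prefixlen].get(ip_int & mask)
        if name is not None:
            return name
    return None

# Small subset of the question's data, including the /28 and /16 cases
netsets = {
    '192.168.1.0/24': 'subnet-A',
    '192.168.12.32/28': 'subnet-F',
    '192.168.12.48/28': 'subnet-G',
    '10.63.0.0/16': 'subnet-I',
}
index = build_prefix_index(netsets)
for ip in ['192.168.1.151', '192.168.12.39', '192.168.12.58', '10.63.102.69']:
    print(ip, lookup(index, ip))
```

Each lookup costs one dictionary access per distinct prefix length, so the total work is roughly proportional to the number of IPs times the number of different prefix lengths rather than the number of subnets.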

I would suggest avoiding the creation of new instances inside the for loops. This will not reduce the complexity (the extra initialization pass actually adds a little work), but it will speed up netaddr_test, especially if it is called more than once. Example:

def _init(ips, netsets):
    """Initialize all objects"""
    new_ips = []
    new_subs = {}
    for ip in ips:
         new_ips.append(IPAddress(ip))

    for subnet, info in netsets.iteritems():

        new_subs[subnet] = {'name': info, 'subnet': IPNetwork(subnet)}

    return new_ips, new_subs

def netaddr_test(ips, netsets):
    for ip in ips:
        for stringnet, info in netsets.iteritems():
            if ip in info['subnet']:
                print ip, '\t',  info['name']
                break

ni, ns = _init(ip_list, netsets)
netaddr_test(ni, ns)

UPDATE: I tested the code above with:

ip_list = [
    '192.168.1.151', '192.168.10.191', '192.168.6.127', 
    '192.168.2.227', '192.168.2.5', '192.168.3.237', 
    '192.168.6.188', '192.168.7.209', '192.168.9.10'
] * 1000

Results:

# Original
$ time python /tmp/test.py > /dev/null

real    0m0.357s
user    0m0.345s
sys     0m0.012s

# Modified
$ time python /tmp/test2.py > /dev/null

real    0m0.126s
user    0m0.122s
sys     0m0.005s

Now, I have never used netaddr, so I am not sure how it handles subnets internally. In your case you can treat a subnet as a range of IPs, and each IPv4 address is a 32-bit unsigned integer (uint32), so you can convert everything to integers:

# IPs are now
ip_list_int = [3232235927, 3232238271, ...]

netsets_expanded = {
    '192.168.1.0/24': {'name': 'subnet-A', 'start': 3232235776, 'end': 3232236031},
    # ... one entry per subnet
}

netaddr can be used to convert your data into the above format. Once that is done, your netaddr_test becomes the following, which works with integer comparisons only:

def netaddr_test(ips, netsets):
    for ip in ips:
        for subnet, subinfo in netsets.iteritems():
            # 'start' and 'end' are both inclusive
            if subinfo['start'] <= ip <= subinfo['end']:
                print ip, '\t', subinfo['name']
                break
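As a rough illustration of the conversion step, here is a minimal Python 3 sketch that builds the integer form of both inputs. It uses the standard ipaddress module rather than netaddr (a substitution made here so the example has no third-party dependency), and the helper names expand and match are hypothetical.

```python
import ipaddress

def expand(netsets):
    """Precompute the integer start/end of every subnet (both inclusive)."""
    expanded = {}
    for cidr, name in netsets.items():
        net = ipaddress.ip_network(cidr)
        expanded[cidr] = {
            'name': name,
            'start': int(net.network_address),
            'end': int(net.broadcast_address),
        }
    return expanded

def match(ip_int, expanded):
    """Return the name of the first subnet whose range contains ip_int."""
    for info in expanded.values():
        if info['start'] <= ip_int <= info['end']:
            return info['name']
    return None

netsets = {'192.168.1.0/24': 'subnet-A', '192.168.10.0/24': 'subnet-B'}
netsets_expanded = expand(netsets)
ip_list_int = [int(ipaddress.ip_address(ip))
               for ip in ['192.168.1.151', '192.168.10.191']]

for ip_int in ip_list_int:
    # Convert back to dotted form only for display
    print(ipaddress.ip_address(ip_int), match(ip_int, netsets_expanded))
```

The inner loop is still linear in the number of subnets; the gain comes only from replacing object construction and membership tests with plain integer comparisons.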

In the general case, where you have N templates and M values to test for a match, you cannot do better than O(N*M). But if you can reformulate the task, then you can speed it up.

My suggestion is to group the templates so that you have a few top-level templates, and only if an IP matches one of them do you go down to its final templates. In your example this would be:

grouped_netsets = {
    "192.168.0.0/16":  {
        '192.168.1.0/24': 'subnet-A',     # {subnet: subnet's name} 
        '192.168.10.0/24': 'subnet-B', 
        '192.168.2.0/24': 'subnet-C', 
        '192.168.3.0/24': 'subnet-C',
        '192.168.6.0/24': 'subnet-D', 
        '192.168.7.0/24': 'subnet-D', 
        '192.168.9.0/24': 'subnet-E',
        }
    }   

from netaddr import IPAddress, IPNetwork

def netaddr_test(ips, grouped_netsets):
    for ip in ips:
        addr = IPAddress(ip)
        for group, netsets in grouped_netsets.iteritems():
            # Only scan a group's subnets if the IP falls inside the group
            if addr in IPNetwork(group):
                for subnet, name in netsets.iteritems():
                    if addr in IPNetwork(subnet):
                        print ip, '\t', name
                        break

So if ip_list contains anything that does not start with 192.168, it is dropped after a single check.

The only remaining question is how to write the function that groups the netsets into an optimal configuration.
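One simple way to build such a grouping is sketched below. It is only an illustration, not an optimal configuration: it assumes Python 3 with the standard ipaddress module (instead of netaddr) and groups every subnet under its enclosing /16, which is an arbitrary choice. The names group_netsets and find_subnet are hypothetical.

```python
import ipaddress

def group_netsets(netsets, group_prefix=16):
    """Group subnets under their enclosing /group_prefix network."""
    grouped = {}
    for cidr, name in netsets.items():
        net = ipaddress.ip_network(cidr)
        if net.prefixlen > group_prefix:
            group = net.supernet(new_prefix=group_prefix)
        else:
            # A subnet at least as wide as the group size is its own group
            group = net
        grouped.setdefault(group, {})[net] = name
    return grouped

def find_subnet(ip, grouped):
    """Check the group first, then only the subnets inside that group."""
    addr = ipaddress.ip_address(ip)
    for group, members in grouped.items():
        if addr in group:
            for net, name in members.items():
                if addr in net:
                    return name
    return None

netsets = {
    '192.168.1.0/24': 'subnet-A',
    '192.168.10.0/24': 'subnet-B',
    '192.168.12.32/28': 'subnet-F',
    '192.168.12.48/28': 'subnet-G',
    '10.63.0.0/16': 'subnet-I',
}
grouped = group_netsets(netsets)
for ip in ['192.168.10.191', '192.168.12.58', '10.63.11.1', '172.16.0.1']:
    print(ip, find_subnet(ip, grouped))
```

How much this helps depends on how evenly the subnets spread across groups; if they all share one /16, as most of the question's data does, the inner loop still scans nearly every subnet.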

I mocked up an IP list and a subnet dict as input:

# ip address list
ip_list = [
'192.168.1.151', '192.168.10.191', '192.168.6.127',
'192.168.2.227', '192.168.2.5', '192.168.3.237',
'192.168.6.188', '192.168.7.209', '192.168.9.10'
]

# subnet dict
netsets = {
'192.168.1.0/24': 'subnet-A',     # {subnet: subnet's name}
'192.168.10.0/24': 'subnet-B',
'192.168.2.0/24': 'subnet-C',
'192.168.3.0/24': 'subnet-C',
'192.168.6.0/24': 'subnet-D',
'192.168.7.0/24': 'subnet-D',
'192.168.9.0/24': 'subnet-E',
}

Each IP address in ip_list then needs to be mapped to the name of its subnet.

We assume that every IP address has a corresponding subnet in netsets.

The output should look like this:

192.168.1.151   subnet-A
192.168.10.191  subnet-B
192.168.6.127   subnet-D
192.168.2.227   subnet-C
192.168.2.5     subnet-C
192.168.3.237   subnet-C
192.168.6.188   subnet-D
192.168.7.209   subnet-D
192.168.9.10    subnet-E

[...] Is there a better way to solve this problem?

Here's a two-liner that does it:

for ip_addr in ip_list:
    print "{0}\t{1}".format(ip_addr,netsets[".".join(ip_addr.split('.')[0:-1])+".0/24"])

Assuming that the subnets don't overlap, you could convert each subnet to two integers: the beginning and the end of its range. These numbers are added to a list, which is then sorted. While doing this we also need to build a dictionary that can be used later to retrieve the subnet name from the start of the range.

def to_int(ip):
    # Pack the four octets of a dotted IPv4 address into one integer
    parts = [int(p) for p in ip.split('.')]

    return parts[0] << 24 | parts[1] << 16 | parts[2] << 8 | parts[3]

def build(netsets):
    ranges = []
    subnets = {}

    for net, name in netsets.iteritems():
        ip, size = net.split('/')
        start = to_int(ip)
        # Set all host bits to get the last address of the subnet
        end = start | 0xffffffff >> int(size)
        ranges.extend([start, end])
        subnets[start] = name

    ranges.sort()
    return ranges, subnets

When searching for an IP you convert it to a number again and run bisect_left on the list of ranges. If the result is an odd number, or the IP is equal to one of the numbers in the list, then the IP is within a subnet. You then use the start of the range to get the name of the subnet from the dictionary that was built earlier:

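import bisect

def find(ranges, subnets, ip):
    num = to_int(ip)
    pos = bisect.bisect_left(ranges, num)

    # An even position with an exact match is the first IP of a range
    if pos % 2 == 0 and pos < len(ranges) and ranges[pos] == num:
        pos += 1

    # An odd position means the IP lies between a start and its end
    if pos % 2:
        return subnets[ranges[pos - 1]]
    else:
        return None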

With the previous building blocks one can easily get the subnet for each IP with the following code:

ranges, subnets = build(netsets)
for ip in ip_list:
    print 'ip: {0}, subnet: {1}'.format(ip, find(ranges, subnets, ip))

Building the dictionary and the range list takes O(m log m) time, and going through the IP list takes O(n log m), where m is the number of subnets and n is the number of IPs. The solution works with subnets of different sizes and returns None (which is then printed) if the IP doesn't belong to any subnet.
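For readers on Python 3, the same binary-search idea can be sketched with the standard ipaddress and bisect modules. This is a variant rather than a port of the code above: it keeps one sorted list of range starts plus a parallel list of (start, end, name) records, and it still assumes that the subnets do not overlap. The names build_ranges and find_name are hypothetical.

```python
import bisect
import ipaddress

def build_ranges(netsets):
    """Return the sorted range starts and the matching (start, end, name) records."""
    records = []
    for cidr, name in netsets.items():
        net = ipaddress.ip_network(cidr)
        records.append((int(net.network_address), int(net.broadcast_address), name))
    records.sort()
    starts = [record[0] for record in records]
    return starts, records

def find_name(starts, records, ip):
    """Binary-search for the subnet containing ip; None if there is none."""
    num = int(ipaddress.ip_address(ip))
    # Index of the last subnet whose start is <= num
    pos = bisect.bisect_right(starts, num) - 1
    if pos >= 0 and num <= records[pos][1]:
        return records[pos][2]
    return None

netsets = {
    '192.168.1.0/24': 'subnet-A',
    '192.168.3.0/24': 'subnet-C',
    '192.168.12.32/28': 'subnet-F',
    '192.168.12.48/28': 'subnet-G',
    '10.63.0.0/16': 'subnet-I',
}
starts, records = build_ranges(netsets)
for ip in ['192.168.1.151', '192.168.12.39', '192.168.12.58', '10.63.11.1', '192.168.5.1']:
    print('ip: {0}, subnet: {1}'.format(ip, find_name(starts, records, ip)))
```

The complexity is the same as described above: sorting costs O(m log m) once, and each lookup is a single O(log m) bisect.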
