Given two files (IPs and subnet info), create a file that associates each IP with its subnet

Question

I've been struggling for a couple of days to find the right way to solve this problem, and I'm looking for some help.

I have two files and need to create a third that shows the relationship between them:

IP address file - ip.csv
Subnet file - subnet.csv

I need to determine which subnet each IP is in and write the result to a third file.

The ip.csv file will contain about 1.5 million IPs, and the subnet.csv file will contain around 140,000 subnets.

ip.csv file sample:

IP,Type
10.78.175.167,IPv4
10.20.3.56,IPv4

subnet.csv file sample:

Subnet,Netmask
10.176.122.136/30,255.255.255.252
10.20.3.0/24,255.255.254.0

Format of the file I need to create:

Subnet,IP
10.20.3.0/24,10.20.3.56

I've tried to make use of material from these pages:

subnettree module
ipaddress module
random page with CIDR help
IP range help

This is the code that I have tried. It works on small sets, but I'm having problems running it against the full files.

#!/usr/local/bin/python2.7
import csv
import ipaddress
import iptools
import re
import SubnetTree
import sys
from socket import inet_aton

testdir = '/home/test/testdir/'
iprelfile = testdir + 'relationship.csv'
testipsub = testdir + 'subnet.csv'
testipaddr = testdir + 'ip.csv'

o1 = open (iprelfile, "a")

# Subnet file
IPR = set()
o1.write('Subnet,IP\n')
with open(testipsub, 'rb') as master:
    reader = csv.reader(master)
    for row in reader:
        if 'Subnet' not in row[0]:
            # Convert string to unicode to be parsed with ipaddress module
            b = unicode(row[1])
            # Using ipaddress module to create list containing every IP in subnet
            n2 = ipaddress.ip_network(b)
            b1 = (list(n2.hosts()))
            # IP address file
            with open(testipaddr, 'rb') as ipaddy:
                readera = csv.reader(ipaddy)
                for rowa in readera:
                    if 'IP' not in rowa[0]:
                        bb = rowa[0]
                        for ij in b1:
                            # Convert to string for comparison
                            f = str(ij)
                            # If the IP address is in subnet range
                            if f == bb:
                                IPR.update([row[0] + ',' + bb + '\n'])


for ip in IPR:
    o1.write(ip + '\n')

# Closing the file
o1.close()

Answer 1

You could read all the subnets into memory and sort them by network address. That lets you use bisect to binary-search for the subnet of each IP. This only works if the subnets don't overlap; if they do, you'll probably need something like a segment tree.

import bisect
import csv
import ipaddress

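def sanitize(ip):
    # Normalize each octet by stripping leading zeros
    # (e.g. '010.020.003.056' -> '10.20.3.56'); any '/prefix' suffix is kept as-is.
    parts = ip.split('/', 1)
    parts[0] = '.'.join(str(int(x)) for x in parts[0].split('.'))

    return '/'.join(parts)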

with open('subnet.csv') as subnet_f:
    reader = csv.reader(subnet_f)
    next(reader)    # Skip column names

    # Create list of subnets sorted by network address and
    # list of network addresses in the same order
    subnets = sorted((ipaddress.IPv4Network(sanitize(row[0])) for row in reader),
                     key=lambda x: x.network_address)
    network_addrs = [subnet.network_address for subnet in subnets]

with open('ip.csv') as ip_f, open('output.csv', 'w', newline='') as out_f:
    reader = csv.reader(ip_f)
    next(reader)

    writer = csv.writer(out_f)
    writer.writerow(['Subnet', 'IP'])

    for row in reader:
        ip = ipaddress.IPv4Address(sanitize(row[0]))
        index = bisect.bisect(network_addrs, ip) - 1

        if index < 0 or subnets[index].broadcast_address < ip:
            continue    # IP not in range of any networks
        writer.writerow([subnets[index], ip])

Output:

Subnet,IP
10.20.3.0/24,10.20.3.56

The code above has a time complexity of O(n log m) for the lookups, where n is the number of IPs and m is the number of networks (plus O(m log m) to sort the subnets once). Note that it only runs on Python 3, since ipaddress is not included in Python 2.7. If you need to use Python 2.7, there are backports available.
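The bisect approach assumes the subnets don't overlap. If they do, one alternative (not the segment tree mentioned above, but a simpler swapped-in technique) is to group the networks by prefix length and probe from the longest prefix to the shortest, which returns the most specific match. The following is only a sketch under that assumption; the helper names `build_prefix_tables` and `most_specific_subnet` and the extra `10.20.0.0/16` network are made up for illustration.

```python
import ipaddress
from collections import defaultdict

def build_prefix_tables(cidrs):
    """Map prefix length -> {network address as int: IPv4Network}."""
    tables = defaultdict(dict)
    for cidr in cidrs:
        net = ipaddress.IPv4Network(cidr)
        tables[net.prefixlen][int(net.network_address)] = net
    # Longest prefixes first, so the most specific subnet is found first
    prefix_lengths = sorted(tables, reverse=True)
    return tables, prefix_lengths

def most_specific_subnet(ip_text, tables, prefix_lengths):
    """Return the longest-prefix subnet containing ip_text, or None."""
    ip = int(ipaddress.IPv4Address(ip_text))
    for prefixlen in prefix_lengths:
        # Keep only the top `prefixlen` bits of the address
        mask = (0xFFFFFFFF << (32 - prefixlen)) & 0xFFFFFFFF
        net = tables[prefixlen].get(ip & mask)
        if net is not None:
            return net
    return None

# Hypothetical data: the /16 overlaps the /24 from the question
tables, prefix_lengths = build_prefix_tables(
    ['10.20.0.0/16', '10.20.3.0/24', '10.176.122.136/30'])
print(most_specific_subnet('10.20.3.56', tables, prefix_lengths))
print(most_specific_subnet('10.20.9.1', tables, prefix_lengths))
print(most_specific_subnet('192.168.0.1', tables, prefix_lengths))
```

Each lookup costs at most one dictionary probe per distinct prefix length (33 at most for IPv4), so it should still scale to the file sizes in the question.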

Update: The first goal of an efficient solution is to find a way to process each individual IP efficiently. Looping through all the subnets for every IP is far too expensive. It's much better to create a sorted list of the first IP in each subnet. For the given data it would look like this:

[IPv4Address('10.20.3.0'), IPv4Address('10.176.122.136')]

This allows us to run a binary search to find the index of the last network address that is less than or equal to the individual IP. For example, when we search for IP 10.20.3.56, bisect.bisect gives us the index of the first entry greater than the IP, and we decrement it by one:

>>> l = [IPv4Address('10.20.3.0'), IPv4Address('10.176.122.136')]
>>> index = bisect.bisect(l, IPv4Address('10.20.3.56'))
>>> index
1
>>> l[index - 1]
IPv4Address('10.20.3.0')

Since we have stored the networks in another list in the same order, we can use the index to retrieve the corresponding subnet. Once we have the subnet, we still need to check that the individual IP is less than or equal to the last IP in the subnet. If the IP is within the subnet, write a row to the result; if not, move on to the next IP.
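The lookup steps described above can be pulled into a small helper to make the edge cases easier to see. This is a minimal sketch, assuming non-overlapping subnets; `build_index` and `find_subnet` are illustrative names, not part of the answer's code.

```python
import bisect
import ipaddress

def build_index(cidrs):
    """Return (subnets, network_addrs), both sorted by network address."""
    subnets = sorted((ipaddress.IPv4Network(c) for c in cidrs),
                     key=lambda net: net.network_address)
    network_addrs = [net.network_address for net in subnets]
    return subnets, network_addrs

def find_subnet(ip_text, subnets, network_addrs):
    """Return the subnet containing ip_text, or None if there is none."""
    ip = ipaddress.IPv4Address(ip_text)
    # bisect.bisect returns the index of the first network address greater
    # than ip, so the only candidate subnet sits one position earlier.
    index = bisect.bisect(network_addrs, ip) - 1
    if index < 0:
        return None  # ip is below the first network address
    candidate = subnets[index]
    if ip > candidate.broadcast_address:
        return None  # ip falls in the gap after the candidate subnet
    return candidate

subnets, network_addrs = build_index(['10.176.122.136/30', '10.20.3.0/24'])
print(find_subnet('10.20.3.56', subnets, network_addrs))     # inside the /24
print(find_subnet('10.20.3.0', subnets, network_addrs))      # equals a network address
print(find_subnet('10.78.175.167', subnets, network_addrs))  # between two subnets
print(find_subnet('10.0.0.1', subnets, network_addrs))       # below every subnet
```

Because bisect.bisect places the insertion point to the right of equal entries, an IP that is exactly a network address still resolves to that subnet.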

Answer 1: accepted, 2017-02-04 02:50:16
