Algorithm to extract network info from ifconfig (ubuntu)

I'm trying to parse info from ifconfig (ubuntu). Normally, I would split a chunk of data like this into words, and then search for substrings to get what I want. For example, given line = "inet addr:192.168.98.157 Bcast:192.168.98.255 Mask:255.255.255.0", and looking for the broadcast address, I would do:

for word in line.split():
    if word.startswith('Bcast'):
        print(word.split(':')[-1])

>>>192.168.98.255

However, I feel it's about time to start learning how to use regular expressions for tasks like this. Here is my code so far. I've hacked through a couple of patterns (inet addr, Bcast, Mask). Questions after the code...

# git clone git://gist.github.com/1586034.git gist-1586034
import re
import json

ifconfig = """
eth0      Link encap:Ethernet  HWaddr 08:00:27:3a:ab:47  
          inet addr:192.168.98.157  Bcast:192.168.98.255  Mask:255.255.255.0
          inet6 addr: fe80::a00:27ff:fe3a:ab47/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:189059 errors:0 dropped:0 overruns:0 frame:0
          TX packets:104380 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:74213981 (74.2 MB)  TX bytes:15350131 (15.3 MB)\n\n
lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:389611 errors:0 dropped:0 overruns:0 frame:0
          TX packets:389611 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:81962238 (81.9 MB)  TX bytes:81962238 (81.9 MB)
"""

for paragraph in ifconfig.split('\n\n'):
        
    info = {
        'eth_port': '',
        'ip_address': '',
        'broadcast_address': '',
        'mac_address': '',
        'net_mask': '',
        'up': False,
        'running': False,
        'broadcast': False,
        'multicast': False,
    }
    
    if 'BROADCAST' in paragraph:
        info['broadcast'] = True
        
    if 'MULTICAST' in paragraph:
        info['multicast'] = True
        
    if 'UP' in paragraph:
        info['up'] = True
        
    if 'RUNNING' in paragraph:
        info['running'] = True
        
    ip = re.search( r'inet addr:[^\s]+', paragraph )
    if ip:
        info['ip_address'] = ip.group().split(':')[-1]  
    
    bcast = re.search( r'Bcast:[^\s]+', paragraph )
    if bcast:
        info['broadcast_address'] = bcast.group().split(':')[-1]
    
    mask = re.search( r'Mask:[^\s]+', paragraph )
    if mask:
        info['net_mask'] = mask.group().split(':')[-1]

    print(paragraph)
    print(json.dumps(info, indent=4))

Here are my questions:

  1. Am I taking the best approach for the patterns I have already implemented? Can I grab the addresses without splitting on ':' and then choosing the last element of the array?

  2. I'm stuck on HWaddr. What would be a pattern to match this MAC address?

EDIT:

Ok, so here's how I ended up going about this. I started out trying to do this without regex, just manipulating strings and lists. But that proved to be a nightmare. For example, what separates HWaddr from its address is a space, whereas inet addr is separated from its address by a colon (:). It's a tough problem to scrape with differing separators like this: not only a problem to code but also a problem to read.

So, I did this with regex. I think this makes a strong case for when to use regular expressions.

# git clone git://gist.github.com/1586034.git gist-1586034

# USAGE: pipe ifconfig into script. ie "ifconfig | python pyifconfig.py"
# output is a list of json datastructures

import sys
import re
import json

ifconfig = sys.stdin.read()

print('STARTINPUT')
print(ifconfig)
print('ENDINPUT')

def extract(input):
    mo = re.search(r'^(?P<interface>eth\d+|eth\d+:\d+)\s+' +
                     r'Link encap:(?P<link_encap>\S+)\s+' +
                     r'(HWaddr\s+(?P<hardware_address>\S+))?' +
                     r'(\s+inet addr:(?P<ip_address>\S+))?' +
                     r'(\s+Bcast:(?P<broadcast_address>\S+)\s+)?' +
                     r'(Mask:(?P<net_mask>\S+)\s+)?',
                     input, re.MULTILINE )
    if mo:
        info = mo.groupdict('')
        info['running'] = False
        info['up'] = False
        info['multicast'] = False
        info['broadcast'] = False
        if 'RUNNING' in input:
            info['running'] = True
        if 'UP' in input:
            info['up'] = True
        if 'BROADCAST' in input:
            info['broadcast'] = True
        if 'MULTICAST' in input:
            info['multicast'] = True
        return info
    return {}


interfaces = [ extract(interface) for interface in ifconfig.split('\n\n') if interface.strip() ]
print(json.dumps(interfaces, indent=4))

Rather than reinventing the wheel:

Or if you want a portable-ish version that works on multiple platforms..

Am I taking the best approach for the patterns I have already implemented? Can I grab the addresses without splitting on ':' and then choosing the last of the array?

Your patterns are fine for what they are doing, although [^\s] is equivalent to \S.

You can grab the addresses without splitting on ':' by putting the address into a capturing group, like this:

    ip = re.search(r'inet addr:(\S+)', paragraph)
    if ip:
        info['ip_address'] = ip.group(1)

If you had more grouped portions of the regex, you could refer to them by the order they appear in your regex, starting at 1.

I'm stuck on HWaddr. What would be a pattern to match this MAC address?

Now that you know about grouping, you can get HWaddr the same way as the other addresses:
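
As a small illustration of group numbering, here is a sketch that pulls all three addresses out of the inet addr line from the question in one search. The variable names are only illustrative, and the pattern assumes the Bcast field is present (it is not on the loopback interface).

```python
import re

line = "inet addr:192.168.98.157  Bcast:192.168.98.255  Mask:255.255.255.0"

# Three capturing groups, numbered 1..3 in the order their opening
# parentheses appear in the pattern.
match = re.search(r'inet addr:(\S+)\s+Bcast:(\S+)\s+Mask:(\S+)', line)
if match:
    ip_address = match.group(1)
    broadcast_address = match.group(2)
    net_mask = match.group(3)
    print(ip_address, broadcast_address, net_mask)
```

match.groups() returns the same three values as a tuple, which is handy when unpacking them all at once.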

    mac = re.search(r'HWaddr\s+(\S+)', paragraph)
    if mac:
        info['mac_address'] = mac.group(1)

Note that with a more advanced regular expression you could actually do several of these steps all at once. For example, here is a regex that pulls out the interface name, IP address, and net mask in one step:

>>> re.findall(r'^(\S+).*?inet addr:(\S+).*?Mask:(\S+)', ifconfig, re.S | re.M)
[('eth0', '192.168.98.157', '255.255.255.0'), ('lo', '127.0.0.1', '255.0.0.0')]
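
One caveat with the one-step pattern: because `.*?` can cross line boundaries under re.S, an interface that has no inet addr line may get paired with the next interface's address. A hedged alternative is to split into paragraphs first and run several small patterns on each one. The sketch below assumes the old net-tools output format shown in the question; the eth1 block, its MAC address, and the names FIELD_PATTERNS and parse_interface are made up for illustration.

```python
import re

# Hypothetical sample: eth1 is up but has no IPv4 address assigned.
SAMPLE = """\
eth1      Link encap:Ethernet  HWaddr 08:00:27:aa:bb:cc
          UP BROADCAST MULTICAST  MTU:1500  Metric:1

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
"""

# The one-step pattern pairs eth1 with lo's address here.
naive = re.findall(r'^(\S+).*?inet addr:(\S+).*?Mask:(\S+)', SAMPLE, re.S | re.M)

# One small pattern per field, each with a single capturing group.
FIELD_PATTERNS = {
    'mac_address': re.compile(r'HWaddr\s+(\S+)'),
    'ip_address': re.compile(r'inet addr:(\S+)'),
    'broadcast_address': re.compile(r'Bcast:(\S+)'),
    'net_mask': re.compile(r'Mask:(\S+)'),
}

def parse_interface(paragraph):
    """Extract the known fields from one interface block; missing ones become ''."""
    info = {'interface': paragraph.split()[0]}
    for field, pattern in FIELD_PATTERNS.items():
        match = pattern.search(paragraph)
        info[field] = match.group(1) if match else ''
    return info

interfaces = [parse_interface(p) for p in SAMPLE.split('\n\n') if p.strip()]
print(naive)
print(interfaces)
```

Since each search is confined to one paragraph, a missing field stays empty instead of being borrowed from a neighbouring interface.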

With the new 3.0.0 version of psutil (https://github.com/giampaolo/psutil) you can avoid parsing ifconfig output and do this directly in Python: http://pythonhosted.org/psutil/#psutil.net_if_addrs

>>> import psutil
>>> psutil.net_if_addrs()
{'lo': [snic(family=<AddressFamily.AF_INET: 2>, address='127.0.0.1', netmask='255.0.0.0', broadcast='127.0.0.1'),
        snic(family=<AddressFamily.AF_INET6: 10>, address='::1', netmask='ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff', broadcast=None),
        snic(family=<AddressFamily.AF_LINK: 17>, address='00:00:00:00:00:00', netmask=None, broadcast='00:00:00:00:00:00')],
 'wlan0': [snic(family=<AddressFamily.AF_INET: 2>, address='192.168.1.3', netmask='255.255.255.0', broadcast='192.168.1.255'),
           snic(family=<AddressFamily.AF_INET6: 10>, address='fe80::c685:8ff:fe45:641%wlan0', netmask='ffff:ffff:ffff:ffff::', broadcast=None),
           snic(family=<AddressFamily.AF_LINK: 17>, address='c4:85:08:45:06:41', netmask=None, broadcast='ff:ff:ff:ff:ff:ff')]}
>>>
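
To get from that structure to something like the info dict in the question, a sketch along these lines might do. It takes net_if_addrs()-style data as an argument so it can run without psutil installed; the Snic namedtuple and the ipv4_summary name are hypothetical stand-ins, not part of psutil.

```python
import socket
from collections import namedtuple

# Hypothetical stand-in with the same fields as the snic tuples shown above.
Snic = namedtuple('Snic', 'family address netmask broadcast')

def ipv4_summary(addrs_by_interface):
    """Map interface name -> IPv4 fields, given net_if_addrs()-style data."""
    summary = {}
    for name, addrs in addrs_by_interface.items():
        for addr in addrs:
            if addr.family == socket.AF_INET:
                summary[name] = {
                    'ip_address': addr.address,
                    'net_mask': addr.netmask,
                    'broadcast_address': addr.broadcast,
                }
                break  # keep only the first IPv4 address per interface
    return summary

# Sample data copied from the wlan0 entry in the output above.
sample = {
    'wlan0': [
        Snic(socket.AF_INET, '192.168.1.3', '255.255.255.0', '192.168.1.255'),
        Snic(socket.AF_INET6, 'fe80::c685:8ff:fe45:641%wlan0', 'ffff:ffff:ffff:ffff::', None),
    ],
}
print(ipv4_summary(sample))
```

With psutil installed, the same function should accept the real data: ipv4_summary(psutil.net_if_addrs()).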

To be honest, regular expressions are not particularly better than simple string manipulation; if anything, they're always slower.

This said, you should start cleaning your input with a better split:

lines = [line.strip() for line in ifconfig.split("\n") if line.strip() != '']

This removes all whitespace around the lines, and discards empty ones; your regexes can now start with ^ and end with $, which will reduce the possibility of false positives.

Then you'd really have to look at grouping; the patterns you're using are just a glorified startswith, and certainly less optimized than startswith will ever be. A regex guru will come up with something better, but for example a simple pattern for the HWaddr line would be
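
For instance, once the lines are stripped, a fully anchored pattern for the inet addr line could look like the sketch below. The INET_LINE name is illustrative, and the Bcast part is made optional on the assumption that loopback-style lines omit it, as in the question's sample.

```python
import re

ifconfig = """
lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
"""

lines = [line.strip() for line in ifconfig.split("\n") if line.strip() != '']

# Anchored at both ends: the whole stripped line must have this shape,
# so the inet6 line and the Link line cannot match by accident.
INET_LINE = re.compile(r'^inet addr:(\S+)(?:\s+Bcast:(\S+))?\s+Mask:(\S+)$')

matches = []
for line in lines:
    m = INET_LINE.match(line)
    if m:
        # groups() yields (ip, broadcast or None, mask)
        matches.append(m.groups())

print(matches)
```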

>>> m = re.match(r'^([A-Za-z]*\d)\s+(Link)\s+(encap):([A-Za-z]*)\s+(HWaddr)\s+([A-Za-z0-9:]*)$',lines[0])
>>> m.groups()
('eth0', 'Link', 'encap', 'Ethernet', 'HWaddr', '08:00:27:3a:ab:47')

But really, the more I look at it, the more the simpler approach based on split() and split(':') makes sense for such a rigidly formatted input. Regexes make your code less readable, and are very expensive. As JWZ once said, "Some people, when confronted with a problem, think 'I know, I'll use regular expressions.' Now they have two problems."

Try something along these lines:
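
A split-only version might look like the following sketch. It handles the two separator styles (HWaddr followed by a space, the other fields followed by a colon) by looking at each word together with the word before it. The names KEYS and parse_with_split are illustrative, and the code assumes the old net-tools format shown in the question.

```python
# Fields that appear as "Key:value" in a single word.
KEYS = {'Bcast': 'broadcast_address', 'Mask': 'net_mask'}

def parse_with_split(paragraph):
    """Parse one interface block using only split() and partition()."""
    words = paragraph.split()
    info = {'interface': words[0]}
    # Walk over (previous word, current word) pairs.
    for previous, word in zip(words, words[1:]):
        key, _, value = word.partition(':')
        if previous == 'HWaddr':
            info['mac_address'] = word          # space-separated field
        elif previous == 'inet' and key == 'addr':
            info['ip_address'] = value          # skips "inet6 addr:"
        elif key in KEYS and value:
            info[KEYS[key]] = value             # colon-separated fields
    return info

paragraph = """eth0      Link encap:Ethernet  HWaddr 08:00:27:3a:ab:47
          inet addr:192.168.98.157  Bcast:192.168.98.255  Mask:255.255.255.0
          inet6 addr: fe80::a00:27ff:fe3a:ab47/64 Scope:Link"""

info = parse_with_split(paragraph)
print(info)
```

Checking the previous word is what keeps the IPv6 address and the MAC address, both of which contain colons, from being mistaken for key:value pairs.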

>>> import re  
>>> m = re.search(r'^(?P<interface>eth\d+|eth\d+:\d+|lo|ppp\d+)\s+' +
...              r'Link encap:(?P<link_encap>\S+)\s+' +
...              r'HWaddr\s(?P<hardware_address>[0-9a-f]{2}:[0-9a-f]{2}:' +
...              r'[0-9a-f]{2}:[0-9a-f]{2}:[0-9a-f]{2}:[0-9a-f]{2})\s', 
...              ifconfig,
...              re.MULTILINE
...    )
>>> m.groupdict()
{'hardware_address': '08:00:27:3a:ab:47',
 'interface': 'eth0',
 'link_encap': 'Ethernet'}
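
The six repeated [0-9a-f]{2} pieces can also be written with a repetition count. The sketch below is one possible compact form; it additionally accepts uppercase hex digits, which is an assumption beyond the pattern above, and the MAC name is illustrative.

```python
import re

# Five "xx:" pairs followed by a final "xx"; (?:...) groups without capturing.
MAC = re.compile(r'HWaddr\s+((?:[0-9a-fA-F]{2}:){5}[0-9a-fA-F]{2})\b')

line = "eth0      Link encap:Ethernet  HWaddr 08:00:27:3a:ab:47  "
m = MAC.search(line)
mac_address = m.group(1) if m else ''
print(mac_address)
```

Unlike a bare \S+, this rejects values after HWaddr that are not shaped like a MAC address.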
