
python3 arp-scan and mac parsing

I'm trying to parse the MAC addresses from arp-scan output. Here's an example:

import re
from subprocess import Popen, PIPE

def get_active_hosts():
    with Popen(['sudo', 'arp-scan', '-l', '-r', '5'], stdout = PIPE) as proc:
        mac_list = re.compile('\s+(([0-9A-Fa-f]{2}:){5}([0-9A-Fa-f]){2})\s+')
        mac_list = mac_list.findall(proc.stdout.read().decode('utf-8'))
    return mac_list
print(get_active_hosts())

But I got this output:

[('4a:c3:26:0e:85:d0', '85:', '0')]

What's going on? How can I capture only the MAC addresses, without this junk:

[('85:', '0')]

Thanks for any advice.

findall is returning every capturing group from each match it found. Groups are declared with a pair of parentheses. Your regular expression contains three groups:

(([0-9A-Fa-f]{2}:){5}([0-9A-Fa-f]){2})
([0-9A-Fa-f]{2}:)
([0-9A-Fa-f])

So now hopefully you understand why findall gives you a tuple of three strings for each match, and why they look the way they do.
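To see why the second and third items are '85:' and '0' rather than larger pieces of the address, it may help to inspect a single match directly. The sketch below runs the question's pattern against a made-up line (the IP address and vendor text are assumptions, not real arp-scan output). A group that sits under a repetition such as {5} or {2} keeps only the text from its last iteration.

```python
import re

# Hypothetical line in the style of arp-scan output (tab-separated).
sample = "192.168.1.1\t4a:c3:26:0e:85:d0\tExample Vendor"

# The pattern from the question, written as a raw string.
pattern = re.compile(r'\s+(([0-9A-Fa-f]{2}:){5}([0-9A-Fa-f]){2})\s+')

m = pattern.search(sample)
groups = m.groups()
# Group 1 is the whole address; groups 2 and 3 are repeated, so each
# holds only what its final iteration matched.
print(groups)  # ('4a:c3:26:0e:85:d0', '85:', '0')
```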

The solution is to make the extra groups (the ones you don't want) non-capturing by putting ?: right after the opening parenthesis:

mac_list = re.compile(r'\s+((?:[0-9A-Fa-f]{2}:){5}(?:[0-9A-Fa-f]){2})\s+')
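A minimal sketch of how this pattern could be tried without sudo or arp-scan, by keeping the parsing separate from the subprocess call. The names MAC_RE and parse_macs and the sample text are hypothetical, chosen only for illustration.

```python
import re

# Only the outer group captures, so findall returns plain strings.
MAC_RE = re.compile(r'\s+((?:[0-9A-Fa-f]{2}:){5}(?:[0-9A-Fa-f]){2})\s+')

def parse_macs(text):
    """Return every MAC address found in arp-scan style text."""
    return MAC_RE.findall(text)

# Made-up sample; real arp-scan output will differ.
sample = (
    "192.168.1.1\t4a:c3:26:0e:85:d0\tExample Vendor\n"
    "192.168.1.7\t00:11:22:33:44:55\tAnother Vendor\n"
)
macs = parse_macs(sample)
print(macs)  # ['4a:c3:26:0e:85:d0', '00:11:22:33:44:55']
```

In get_active_hosts, the decoded text from proc.stdout.read() would be passed to parse_macs in place of the sample.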

Let's look at the documentation for the findall method:

re.findall(pattern, string, flags=0)

Return all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left-to-right, and matches are returned in the order found. If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result.

Changed in version 3.7: Non-empty matches can now start just after a previous empty match.

Pay attention to the sentence about groups. You have more than one group in the pattern:

  • (([0-9A-Fa-f]{2}:){5}([0-9A-Fa-f]){2}) => '4a:c3:26:0e:85:d0'
  • ([0-9A-Fa-f]{2}:) => '85:'
  • ([0-9A-Fa-f]) => '0'

And, as the documentation says, you get a list of tuples of the captured groups.
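The rule quoted from the documentation can be checked on a much smaller input. This is only an illustrative sketch; the text and the digit patterns are made up and have nothing to do with arp-scan.

```python
import re

text = "a 1-2 b 3-4"

no_groups = re.findall(r'\d+-\d+', text)       # whole matches
one_group = re.findall(r'(\d+)-\d+', text)     # that group's text
two_groups = re.findall(r'(\d+)-(\d+)', text)  # one tuple per match

print(no_groups)   # ['1-2', '3-4']
print(one_group)   # ['1', '3']
print(two_groups)  # [('1', '2'), ('3', '4')]
```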

To get only the full MAC address, you need to use non-capturing parentheses in the regexp. The re module documentation says:

(?:...) A non-capturing version of regular parentheses. Matches whatever regular expression is inside the parentheses, but the substring matched by the group cannot be retrieved after performing a match or referenced later in the pattern.

So make every pair of parentheses other than the main one (the pair that captures the entire MAC address) non-capturing.
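If changing the pattern is not convenient, a different option (not the one the answers describe) is to leave the groups as they are and read only group 1 from each match object with re.finditer. A rough sketch, again using a made-up sample line rather than real arp-scan output:

```python
import re

# The question's original pattern, extra capturing groups included.
pattern = re.compile(r'\s+(([0-9A-Fa-f]{2}:){5}([0-9A-Fa-f]){2})\s+')

# Made-up sample line; not real arp-scan output.
sample = "192.168.1.1\t4a:c3:26:0e:85:d0\tExample Vendor"

# finditer yields match objects, so one group can be picked by number.
macs = [m.group(1) for m in pattern.finditer(sample)]
print(macs)  # ['4a:c3:26:0e:85:d0']
```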
