python3 arp-scan and mac parsing

I'm trying to parse the MAC addresses from arp-scan output. Here's an example:

import re
from subprocess import Popen, PIPE

def get_active_hosts():
    with Popen(['sudo', 'arp-scan', '-l', '-r', '5'], stdout = PIPE) as proc:
        mac_list = re.compile('\s+(([0-9A-Fa-f]{2}:){5}([0-9A-Fa-f]){2})\s+')
        mac_list = mac_list.findall(proc.stdout.read().decode('utf-8'))
    return mac_list
print(get_active_hosts())

But I got this output:

[('4a:c3:26:0e:85:d0', '85:', '0')]

What's going on? How can I capture only the MAC addresses, without this junk:

[('85:', '0')]

Thanks for any advice.

findall is returning all of the matching groups that it found. Groups are declared using a set of parentheses. Your regular expression contains three groups as follows:

(([0-9A-Fa-f]{2}:){5}([0-9A-Fa-f]){2})
([0-9A-Fa-f]{2}:)
([0-9A-Fa-f])

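So now hopefully you understand why findall gives you a tuple of three strings for each match, and why they look like they do: the first group spans the whole address, while the second and third groups are repeated and so keep only the text from their last repetition.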
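To see the three groups at work without running arp-scan, here is a minimal sketch. The sample line is a made-up stand-in for one line of arp-scan output (tab-separated IP, MAC, vendor), and the pattern is the one from the question, written as a raw string.

```python
import re

# Hypothetical sample line in arp-scan's tab-separated layout (IP, MAC, vendor).
sample = "192.168.1.1\t4a:c3:26:0e:85:d0\tExample Vendor\n"

# The pattern from the question, unchanged apart from the raw-string prefix.
pattern = re.compile(r'\s+(([0-9A-Fa-f]{2}:){5}([0-9A-Fa-f]){2})\s+')

match = pattern.search(sample)
# Group 1 spans the whole address. Groups 2 and 3 sit under a repetition,
# so each one keeps only the text from its final repetition.
print(match.group(1))  # 4a:c3:26:0e:85:d0
print(match.group(2))  # 85:
print(match.group(3))  # 0

# findall packs the same three groups into one tuple per match.
result = pattern.findall(sample)
print(result)  # [('4a:c3:26:0e:85:d0', '85:', '0')]
```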

The solution here is to declare these extra groups (the ones you don't want) to be non-capturing by putting ?: after the opening parenthesis as follows:

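mac_list = re.compile(r'\s+((?:[0-9A-Fa-f]{2}:){5}(?:[0-9A-Fa-f]){2})\s+')

With only one capturing group left, findall returns a plain list of strings, one MAC address per match. The r prefix makes the pattern a raw string, so \s is passed to the regex engine as written.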
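A quick way to check the non-capturing version is to run it over canned text instead of live scan results. This is only a sketch: parse_macs and the sample lines are hypothetical names and data chosen for illustration, not part of the original code.

```python
import re

# Only the outer parentheses capture; the inner ones use (?:...) and do not.
MAC_RE = re.compile(r'\s+((?:[0-9A-Fa-f]{2}:){5}(?:[0-9A-Fa-f]){2})\s+')

def parse_macs(text):
    """Return every MAC address found in arp-scan-style text."""
    return MAC_RE.findall(text)

# Made-up sample in arp-scan's tab-separated layout (IP, MAC, vendor).
sample_output = (
    "192.168.1.1\t4a:c3:26:0e:85:d0\tExample Vendor A\n"
    "192.168.1.7\t00:1B:44:11:3A:B7\tExample Vendor B\n"
)

macs = parse_macs(sample_output)
print(macs)  # ['4a:c3:26:0e:85:d0', '00:1B:44:11:3A:B7']
```

Keeping the parsing in its own function means the regex can be tested without sudo or a network, and the Popen call only has to hand over the decoded output.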

Let's look at the documentation on the findall method:

re.findall(pattern, string, flags=0)

Return all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left-to-right, and matches are returned in the order found. If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result.

Changed in version 3.7: Non-empty matches can now start just after a previous empty match.

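Pay attention to the sentence about groups: if the pattern contains more than one group, findall returns a list of tuples. Your pattern has three groups:

  • (([0-9A-Fa-f]{2}:){5}([0-9A-Fa-f]){2}) => '4a:c3:26:0e:85:d0'
  • ([0-9A-Fa-f]{2}:) => '85:'
  • ([0-9A-Fa-f]) => '0'

The second and third groups are repeated, and a repeated group keeps only its last match, which is why you see '85:' and '0'. So, as the documentation says, you get a list of tuples containing the captured groups.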
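The documented rule is easy to check on a small string. The sketch below uses a toy pattern (not the MAC regex) to show what findall returns with zero, one, and two capturing groups.

```python
import re

text = "a1 b2"

# No groups: a list of the full matches.
no_groups = re.findall(r'[a-z]\d', text)
print(no_groups)   # ['a1', 'b2']

# One group: a list of that group's text only.
one_group = re.findall(r'([a-z])\d', text)
print(one_group)   # ['a', 'b']

# Two groups: a list of tuples, one tuple per match.
two_groups = re.findall(r'([a-z])(\d)', text)
print(two_groups)  # [('a', '1'), ('b', '2')]
```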

To get only the full MAC address, you need to use non-capturing parentheses in the regexp. The re module documentation says:

(?:...) A non-capturing version of regular parentheses. Matches whatever regular expression is inside the parentheses, but the substring matched by the group cannot be retrieved after performing a match or referenced later in the pattern.

So, change every pair of parentheses except the outer one (the pair that captures the entire MAC address) to the non-capturing form.
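As an alternative that neither answer mentions, the original pattern can be left untouched if you iterate over match objects with finditer and pick out group 1 yourself. This is a sketch under the same assumptions as before: the sample line is invented and stands in for real arp-scan output.

```python
import re

# The original pattern with all three capturing groups still in place.
pattern = re.compile(r'\s+(([0-9A-Fa-f]{2}:){5}([0-9A-Fa-f]){2})\s+')

# Made-up sample line in arp-scan's tab-separated layout (IP, MAC, vendor).
sample = "192.168.1.1\t4a:c3:26:0e:85:d0\tExample Vendor\n"

# finditer yields match objects, so we choose which group to keep.
macs = [m.group(1) for m in pattern.finditer(sample)]
print(macs)  # ['4a:c3:26:0e:85:d0']
```

The non-capturing form is still the tidier fix when findall is all you need; finditer is mainly useful when you also want positions or other groups from each match.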
