
Regex to capture '/etc/services'

I want to capture some information from the /etc/services file on my UNIX machine, but I am capturing the wrong values, and I think I am overcomplicating it as well.

What I have now

import re

with open('/etc/services') as ports_file:
    lines = ports_file.readlines()
    for line in lines:
        print(re.findall(r'((\w*\-*\w+)+\W+(\d+)\/(tcp|udp))', line))

But it is yielding incorrect values like this:

[('dircproxy\t57000/tcp', 'dircproxy', '57000', 'tcp')]
[('tfido\t\t60177/tcp', 'tfido', '60177', 'tcp')]
[('fido\t\t60179/tcp', 'fido', '60179', 'tcp')]

I would want it like this:

[('dircproxy', '57000', 'tcp')]
[('tfido', '60177', 'tcp')]
[('fido', '60179', 'tcp')]

I think the (\w*\-*\w+)+ part is needed in my regex because some names are hyphenated, like this-should-capture

I'd suggest coming at this from a different perspective: instead of matching the field values, match the separators between them.

print(re.split(r'[\s/]+', line.split('#', 1)[0])[:3])

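The inner line.split('#', 1)[0] removes comments (anything after the first # on the line). re.split then breaks what is left on runs of whitespace or slashes, and [:3] keeps the name, port, and protocol.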
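As a rough sketch of how that one-liner behaves, here it is wrapped in a small helper and run on a few sample lines. The helper name parse_service_line and the sample lines are made up for illustration, and the extra strip() and length check are additions so that blank and comment-only lines come back as None rather than as partial lists.

```python
import re

def parse_service_line(line):
    """Return [name, port, protocol] for one /etc/services line, or None."""
    # Drop everything from the first '#' onward, then trim surrounding whitespace.
    content = line.split('#', 1)[0].strip()
    if not content:
        return None
    # Split on runs of whitespace and/or '/', keeping only the first three fields.
    fields = re.split(r'[\s/]+', content)[:3]
    return fields if len(fields) == 3 else None

# Hypothetical sample lines in the usual /etc/services layout.
sample_lines = [
    "dircproxy\t57000/tcp\t\t\t# Detachable IRC Proxy\n",
    "tfido\t\t60177/tcp\n",
    "# a comment-only line\n",
    "\n",
]
parsed = [parse_service_line(l) for l in sample_lines]
for item in parsed:
    print(item)
```

Note that this returns lists rather than tuples; wrap the result in tuple() if the exact shape from the question matters.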

I personally wouldn't use regex here. Look at the solution below and see whether it fits your needs (also note that you can iterate over the file object directly):

services = []
with open('/etc/services') as serv:
    for line in serv:
        l = line.split()
        if len(l) < 2:
            continue
        if '/tcp' in l[1] or '/udp' in l[1]:
            port, protocol = l[1].split('/')
            services.append((l[0], port, protocol))
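If a regex is still preferred, the extra first element in the question's output comes from the outermost pair of parentheses: re.findall returns one tuple entry per capture group, and that outer group captures the whole match. A repeated capture group such as (...)+ also keeps only its last repetition, which is a problem for hyphenated names. A minimal sketch that avoids both, assuming service names consist only of word characters and hyphens (the pattern name SERVICE_RE and the this-should-capture sample entry are made up for illustration):

```python
import re

# Exactly three capture groups: name, port, protocol.
# [\w-]+ matches hyphenated names in one piece, so no repeated group is needed.
SERVICE_RE = re.compile(r'([\w-]+)\s+(\d+)/(tcp|udp)')

# Hypothetical sample lines; the second one is not a real service entry.
sample_lines = [
    "dircproxy\t57000/tcp",
    "this-should-capture\t1234/udp",
]
matches = [SERVICE_RE.findall(line) for line in sample_lines]
for m in matches:
    print(m)
```

This prints one list per line containing a single (name, port, protocol) tuple, which is the shape asked for in the question.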


 