Regex to capture '/etc/services'

Question

I want to capture some information from the /etc/services file on my UNIX machine, but I capture the wrong values, and I think I am overcomplicating it as well.

What I have now

import re

with open('/etc/services') as ports_file:
    lines = ports_file.readlines()
    for line in lines:
        print(re.findall(r'((\w*\-*\w+)+\W+(\d+)\/(tcp|udp))', line))

But it is yielding incorrect values like this:

[('dircproxy\t57000/tcp', 'dircproxy', '57000', 'tcp')]
[('tfido\t\t60177/tcp', 'tfido', '60177', 'tcp')]
[('fido\t\t60179/tcp', 'fido', '60179', 'tcp')]

I would like it to look like this:

[('dircproxy', '57000', 'tcp')]
[('tfido', '60177', 'tcp')]
[('fido', '60179', 'tcp')]

I think (\w*\-*\w+)+ is needed in my regex because some service names are hyphenated, like this-should-capture.

Answer 1

I'd suggest coming at this from a different perspective: Instead of matching the field values, match the separators between them.

print(re.split(r'[\s/]+', line.split('#', 1)[0])[:3])

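line.split('#', 1)[0] runs first and strips comments (anything after the first # on the line). re.split then breaks what is left on runs of whitespace or slashes, and [:3] keeps the name, port, and protocol.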
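To make the split-on-separators idea concrete, here is a minimal sketch that wraps it in a helper. The function name parse_service_line, the extra strip() call, and the choice to return None for blank or comment-only lines are my own assumptions, not part of the one-liner above. The sample lines are made up for illustration.

```python
import re

def parse_service_line(line):
    """Return [name, port, protocol] for a services-style line, or None."""
    # Drop any trailing comment, then surrounding whitespace (assumed cleanup step).
    entry = line.split('#', 1)[0].strip()
    if not entry:
        return None  # blank or comment-only line
    # Split on runs of whitespace and/or slashes; keep the first three fields.
    fields = re.split(r'[\s/]+', entry)[:3]
    return fields if len(fields) == 3 else None

# Hypothetical sample lines in the /etc/services layout.
sample_lines = [
    "dircproxy\t57000/tcp\t\t# sample comment\n",
    "this-should-capture\t12345/udp\n",
    "# a comment-only line\n",
    "\n",
]
for sample in sample_lines:
    print(parse_service_line(sample))
```

Hyphenated names need no special handling here, because the pattern only describes the separators and never the field contents.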

Answer 2

I personally wouldn't use regex here. Look at the solution below and see if it fits your needs (also note that you can iterate over the file object directly):

services = []
with open('/etc/services') as serv:
    for line in serv:
        fields = line.split()  # split on any run of whitespace
        if len(fields) < 2:
            continue  # skip blank or incomplete lines
        if '/tcp' in fields[1] or '/udp' in fields[1]:
            port, protocol = fields[1].split('/')
            services.append((fields[0], port, protocol))
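If you want to try this approach without touching the real file, the same logic can be sketched as a function that takes any iterable of lines. This is a variation rather than the exact code above: the name parse_services is hypothetical, it uses str.partition instead of split('/'), and it also skips lines whose first field starts with #, which is an assumption about how you want comments handled.

```python
def parse_services(lines):
    """Collect (name, port, protocol) tuples from services-style lines."""
    services = []
    for line in lines:
        fields = line.split()
        # Skip blank lines, incomplete lines, and comment lines (assumed behaviour).
        if len(fields) < 2 or fields[0].startswith('#'):
            continue
        # partition never raises, even if the field has no slash at all.
        port, sep, protocol = fields[1].partition('/')
        if sep and protocol in ('tcp', 'udp'):
            services.append((fields[0], port, protocol))
    return services

# Sample lines in the layout shown in the question, plus two lines to skip.
sample = [
    "tfido\t\t60177/tcp\n",
    "fido\t\t60179/tcp\n",
    "# comment 1/tcp\n",
    "\n",
]
print(parse_services(sample))
# prints [('tfido', '60177', 'tcp'), ('fido', '60179', 'tcp')]
```

With the real file, you would pass the open file object instead of the list, since the function only iterates over its argument.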

2 answers

solution1
1 ACCEPTED 2017-10-12 19:57:59

solution2
0 2017-10-12 20:06:05
