
Combine regular expression in Python

I'm new to regular expressions.

I'm trying to get a list of the services that are up or down from the output of the svstat command.

Example output of svstat:

/etc/service/worker-test-1: up (pid 1234) 97381 seconds
/etc/service/worker-test-2: up (pid 4567) 92233 seconds
/etc/service/worker-test-3: up (pid 8910) 97381 seconds
/etc/service/worker-test-4: down 9 seconds, normally up
/etc/service/worker-test-5: down 9 seconds, normally up
/etc/service/worker-test-6: down 9 seconds, normally up

Currently I need two regexes to match the services that are up and the ones that are down.

Sample regex-1 for UP:

/etc/service/(?P<service_name>.+):\s(?P<status>up|down)\s\(pid\s(?P<pid>\d+)\)\s(?P<seconds>\d+)

Output for regex-1:

Match 1
status -> up
service_name -> worker-test-1
pid -> 1234
seconds -> 97381

Match 2
status -> up
service_name -> worker-test-2
pid -> 4567
seconds -> 92233

Match 3
status -> up
service_name -> worker-test-3
pid -> 8910
seconds -> 97381

Sample regex-2 for DOWN

/etc/service/(?P<service_name>.+):\s(?P<status>up|down)\s(?P<seconds>\d+)

Output for regex-2

Match 1
status -> down
service_name -> worker-test-4
seconds -> 9

Match 2
status -> down
service_name -> worker-test-5
seconds -> 9

Match 3
status -> down
service_name -> worker-test-6
seconds -> 9

The question is: how can I use a single regex to match both the up and the down lines?

By the way, I'm using http://pythex.org/ to create and test these regexes.

You could enclose the pid part in an optional non-capturing group:

/etc/service/(?P<service_name>.+):\s(?P<status>up|down)(?:\s\(pid\s(?P<pid>\d+)\))?\s(?P<seconds>\d+)

With this, pid is None when the service is down. See Regex101 demo.
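For completeness, here is a minimal sketch of how that combined pattern could be used from Python with the standard re module. The names SVSTAT_LINE, sample_output and parse_svstat are made up for this illustration, and the sample text is a shortened copy of the output in the question. Note that groupdict() returns the captured values as strings, so pid and seconds would still need int() if numbers are wanted.

```python
import re

# The combined pattern from above: the "(pid N)" part sits in an
# optional non-capturing group, so it may or may not be present.
SVSTAT_LINE = re.compile(
    r"/etc/service/(?P<service_name>.+):\s(?P<status>up|down)"
    r"(?:\s\(pid\s(?P<pid>\d+)\))?\s(?P<seconds>\d+)"
)

# Shortened sample taken from the question (hypothetical variable name).
sample_output = """\
/etc/service/worker-test-1: up (pid 1234) 97381 seconds
/etc/service/worker-test-4: down 9 seconds, normally up
"""


def parse_svstat(text):
    """Return one dict of named groups per matching svstat line."""
    return [m.groupdict() for m in SVSTAT_LINE.finditer(text)]


services = parse_svstat(sample_output)
for service in services:
    # pid is None for services that are down
    print(service["service_name"], service["status"], service["pid"])
```

Since the pid group did not take part in the match for the down line, its entry in the dict is None rather than an empty string.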

As promised, here is my lunch-break alternative. I do not want to talk you into fixed-token split parsing, but it might come in handy given the rest of the use case, which only the OP knows ;-)

#! /usr/bin/env python
from __future__ import print_function

d = """
/etc/service/worker-test-1: up (pid 1234) 97381 seconds
/etc/service/worker-test-2: up (pid 4567) 92233 seconds
/etc/service/worker-test-3: up (pid 8910) 97381 seconds
/etc/service/worker-test-4: down 9 seconds, normally up
/etc/service/worker-test-5: down 9 seconds, normally up
/etc/service/worker-test-6: down 9 seconds, normally up
"""


def service_state_parser_gen(text_lines):
    """Parse the lines from the service monitor by splitting
    on a well-known binary condition (either up or down)
    and parse the rest of the fields with a fixed-position
    split on sanitized data (in the up case).
    Yield a tuple of key and dictionary as the result, or
    (None, None) when neither up nor down is detected."""

    token_up = ': up '
    token_down = ': down '
    path_sep = '/'

    for line in text_lines.split('\n'):  # use the argument, not the global d
        if token_up in line:
            chunks = line.split(token_up)
            status = token_up.strip(': ')
            service = chunks[0].split(path_sep)[-1]
            _, pid, seconds, _ = chunks[1].replace(
                '(', '').replace(')', '').split()
            yield service, {'name': service,
                            'status': status,
                            'pid': int(pid),
                            'seconds': int(seconds)}
        elif token_down in line:
            chunks = line.split(token_down)
            status = token_down.strip(': ')
            service = chunks[0].split(path_sep)[-1]
            # a down service has no pid; only the seconds field is parsed
            seconds, _, _, _ = chunks[1].split()
            yield service, {'name': service,
                            'status': status,
                            'pid': None,
                            'seconds': int(seconds)}
        else:
            yield None, None


def main():
    """Sample driver for parser generator function."""

    services = {}
    for key, status_map in service_state_parser_gen(d):
        if key is None:
            print("Non-Status line ignored.")
        else:
            services[key] = status_map

    print(services)

if __name__ == '__main__':
    main()

When run on the given sample input, it produces:

Non-Status line ignored.
Non-Status line ignored.
{'worker-test-1': {'status': 'up', 'seconds': 97381, 'pid': 1234, 'name': 'worker-test-1'}, 'worker-test-3': {'status': 'up', 'seconds': 97381, 'pid': 8910, 'name': 'worker-test-3'}, 'worker-test-2': {'status': 'up', 'seconds': 92233, 'pid': 4567, 'name': 'worker-test-2'}, 'worker-test-5': {'status': 'down', 'seconds': 9, 'pid': None, 'name': 'worker-test-5'}, 'worker-test-4': {'status': 'down', 'seconds': 9, 'pid': None, 'name': 'worker-test-4'}, 'worker-test-6': {'status': 'down', 'seconds': 9, 'pid': None, 'name': 'worker-test-6'}}

So the information that would otherwise be stored in named-group matches is stored, already type-converted, as values under matching keys in a dict. If a service is down there is of course no process ID, so pid is mapped to None, which makes it easy to code robustly against it. (If one stored all down services in a separate structure, it would be implicit that accessing a pid field is not advisable.)

Hope it helps. PS: Yes, the argument name text_lines of the showcase function is not ideal for what it contains (a single multi-line string), but you should get the parsing idea.

I don't know whether you are forced to use regex at all, but if you aren't, you can do something like this:

if "down" in linetext:
    print( "is down" )
else:
    print( "is up" )

Easier to read and faster as well.
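One caveat with a plain substring test: it also reports "down" when the service name itself contains that word (a hypothetical service called shutdown-job, for example), and it treats every other line as up. A slightly stricter sketch, assuming the status word is always the first word after the first ": " as in the sample output, could look like this. The helper name service_status is made up for the illustration.

```python
def service_status(line):
    """Return 'up' or 'down' for one svstat line, or None if unrecognised.

    Assumes the format shown in the question:
    "<path>: <status> ...", with no ": " inside the service path.
    """
    _, separator, rest = line.partition(": ")
    if not separator:
        return None  # no ": " found, so this is not a status line
    words = rest.split()
    first_word = words[0] if words else ""
    return first_word if first_word in ("up", "down") else None


print(service_status("/etc/service/worker-test-4: down 9 seconds, normally up"))
print(service_status("/etc/service/shutdown-job: up (pid 1) 5 seconds"))
```

This still avoids regular expressions, but only looks at the status field instead of the whole line.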
