
Python - how to match specific words / digits from multiple lines in a text file and store them in separate lists

I have a .txt file that looks like this:

ip sla logging traps
ip sla 2553
ethernet jitter mpid 6553 domain Gravity vlan 2553 num-frames 100 interval 100
tag CSCO5839
frequency 300
ip sla schedule 2553 life forever start-time now
ip sla 3553
ethernet jitter mpid 7553 domain Gravity vlan 3553 num-frames 100 interval 100
tag CSCO5839
frequency 300
ip sla schedule 3553 life forever start-time now

I would like a way to match the ip sla, mpid, vlan and tag values and store them in a list or a vector:

Desired output:

ipsla[0]=2553 and ipsla[1]=3553
mpid[0]=6553 and mpid[1]=7553
vlan[0]=2553 and vlan[1]=3553
tag[0]=CSCO5839

I have just started learning Python. I know that I have to parse each line of the file, match the desired values using re.match(), and then store the results in an array or a list.

My poor code so far:

#!/usr/bin/env python

import re

myFile = open("regex.txt","r")

for line in myFile:
    if re.match("", line):  # I should have a condition here, but I got lost
        pass  # now I have to store that value in an array or a list

My questions so far:
- What expression should I use to find what I need?
- How can I then store those values, two per list, in lists with the same name, e.g. ipsla[0], ipsla[1]?

I would also appreciate some explanation, if possible.

Thank you.

Explanation of the regex you want and how it works

All of the other answers work, but if you really want to understand the problem, I think this is a good approach. First, think about what you're trying to find.

For the first three, you want to match patterns of the form name, space, digits. The regex for this is name \d+, where \d means a digit character and + means one or more.
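To see \d and \d+ in action, here is a small check against part of the sample ethernet line (the variable names are just for illustration):

```python
import re

line = "ethernet jitter mpid 6553 domain Gravity vlan 2553 num-frames 100"

# \d matches exactly one digit: the first one in the line
first_digit = re.search(r"\d", line).group()
# "vlan", a space, then one or more digits
vlan_match = re.search(r"vlan \d+", line).group()

print(first_digit)  # 6
print(vlan_match)   # vlan 2553
```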

So for ip sla you would want a regex like:

ip sla \d+

However, you're only interested in the number, so you can use parentheses to capture the number as its own group:

ip sla (\d+)
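The difference between the whole match and the group can be checked directly: group(0) is the entire match and group(1) is whatever the first pair of parentheses captured. A line like ip sla logging traps gives no match at all, because no digits follow ip sla:

```python
import re

m = re.search(r"ip sla (\d+)", "ip sla 2553")
whole = m.group(0)   # the entire match
number = m.group(1)  # only the part inside the parentheses

# no digits after "ip sla ", so search returns None
no_match = re.search(r"ip sla (\d+)", "ip sla logging traps")

print(whole)     # ip sla 2553
print(number)    # 2553
print(no_match)  # None
```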

Finally, in an ordinary Python string a backslash (\) escapes the character that follows it, so for Python to pass your backslash through to the regex you need to write two of them (a raw string such as r"ip sla (\d+)" avoids the doubling). So in Python code you would have:

pattern = "ip sla (\\d+)"
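As an aside, a raw string (prefix r) keeps backslashes exactly as typed, so it spells the same pattern without the doubling. A quick sketch comparing the two:

```python
import re

escaped = "ip sla (\\d+)"  # ordinary string: \\ becomes a single backslash
raw = r"ip sla (\d+)"      # raw string: the backslash is kept as typed

print(escaped == raw)                    # True: both hold the characters ip sla (\d+)
print(len("\\d"))                        # 2: one backslash followed by d
print(re.findall(raw, "ip sla 2553"))    # ['2553']
```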

The re module also has a function, re.findall, that searches the whole text at once, so you don't need to split your file into lines.

re.findall(pattern, string) returns a list of all the matches of the pattern. If the pattern contains exactly one group (as here, where the one group is (\d+)), each item in the list is just the text captured by that group.

That means that

re.findall("ip sla (\\d+)", fileText)

will return a list of your ip sla numbers (as strings), where fileText holds the contents of the file.

That can also be applied to mpid and vlan.
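A small sketch of how the number of groups changes what re.findall returns, using a few lines from the sample file (the two-group pattern is only there for comparison):

```python
import re

text = "ip sla 2553\nip sla schedule 2553 life forever\nip sla 3553"

no_group = re.findall(r"ip sla \d+", text)         # no group: whole matches
one_group = re.findall(r"ip sla (\d+)", text)      # one group: just the group
two_groups = re.findall(r"(ip) sla (\d+)", text)   # two or more groups: tuples

print(no_group)    # ['ip sla 2553', 'ip sla 3553']
print(one_group)   # ['2553', '3553']
print(two_groups)  # [('ip', '2553'), ('ip', '3553')]
```

Note that the ip sla schedule line is skipped, because a digit must follow ip sla directly.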

For tag, however, you want to match letters and digits. In regex these (together with the underscore) are known as word characters, and \w matches one word character. Once again we use the + modifier to match one or more of them, which leaves us with

pattern = "tag (\\w+)"
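A quick check of \w+ against \S+ (non-whitespace, used in another answer on this page). The second tag, my_tag-2, is hypothetical and only there to show where the two differ: \w+ stops at the hyphen, while \S+ runs until the next whitespace.

```python
import re

# "tag my_tag-2" is a made-up line, not from the question's file
text = "tag CSCO5839\nfrequency 300\ntag my_tag-2"

word_tags = re.findall(r"tag (\w+)", text)      # letters, digits, underscore
nonspace_tags = re.findall(r"tag (\S+)", text)  # anything except whitespace

print(word_tags)      # ['CSCO5839', 'my_tag']
print(nonspace_tags)  # ['CSCO5839', 'my_tag-2']
```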

Strategy:

1. Read file into string

2. Construct and execute regex searches for each pattern

3. Iterate through each list of results

4. Append each result to the correct array

Code:

import re

# read the whole file into one string; "with" closes the file automatically
with open("regex.txt", "r") as myFile:
    myFileData = myFile.read()

# create lists for each thing you're looking for
ipsla = []
mpid = []
vlan = []
tag = []

# finds a pattern like "ip sla <one or more digits 0-9>"
results = re.findall("ip sla (\\d+)", myFileData)
for result in results:
    # add the number (as an int) to your ipsla list
    ipsla.append(int(result))

# rinse & repeat :)
results = re.findall("mpid (\\d+)", myFileData)
for result in results:
    mpid.append(int(result))

results = re.findall("vlan (\\d+)", myFileData)
for result in results:
    vlan.append(int(result))

results = re.findall("tag (\\w+)", myFileData)
for result in results:
    if result not in tag:  # keep each tag only once
        tag.append(result)

print(ipsla)
print(mpid)
print(vlan)
print(tag)
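The line-by-line plan from the question also works, with one catch: re.match() only tries to match at the very start of the string, so it never finds mpid or vlan, which sit in the middle of the ethernet line. re.search() scans the whole line. A minimal sketch, using a hard-coded list of two sample lines in place of the file (in real use you would loop over the open file instead):

```python
import re

# Two lines copied from the sample file, standing in for the real file
lines = [
    "ip sla 2553",
    "ethernet jitter mpid 6553 domain Gravity vlan 2553 num-frames 100 interval 100",
]

mpid = []
for line in lines:
    # re.search looks for the pattern anywhere in the line
    m = re.search(r"mpid (\d+)", line)
    if m:
        mpid.append(int(m.group(1)))

# re.match only tries position 0, so it misses "mpid" in the middle of a line
print(re.match(r"mpid (\d+)", lines[1]))  # None
print(mpid)                                # [6553]
```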

Using regex with named groups:

import re
from collections import defaultdict

results = defaultdict(list)
matcher = re.compile(r'ip sla (?P<ipsla>\d+)|mpid (?P<mpid>\d+)'
                     r'|vlan (?P<vlan>\d+)|tag (?P<tag>\w+)')

with open("regex.txt") as f:
    for line in f:
        for match in re.finditer(matcher, line):
            results[match.lastgroup].append(match.group(match.lastgroup))

print(results)

Now your results are available in the defaultdict named results:

defaultdict(<class 'list'>, {
    'ipsla': ['2553', '3553'],
    'mpid': ['6553', '7553'],
    'vlan': ['2553', '3553'],
    'tag': ['CSCO5839', 'CSCO5839']
})

If necessary, you can afterwards pull the individual lists out into separate variables:

ipsla = results['ipsla']
vlan = results['vlan']
mpid = results['mpid']
tag = results['tag']
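What match.lastgroup contributes here: because every alternative in the pattern contains exactly one named group, lastgroup is the name of whichever alternative matched, which is what makes it usable as a dictionary key. A small sketch on one line of the sample:

```python
import re

matcher = re.compile(r"ip sla (?P<ipsla>\d+)|mpid (?P<mpid>\d+)"
                     r"|vlan (?P<vlan>\d+)|tag (?P<tag>\w+)")

line = "ethernet jitter mpid 6553 domain Gravity vlan 2553 num-frames 100 interval 100"
found = []
for match in matcher.finditer(line):
    # lastgroup is the name of the named group that actually matched
    found.append((match.lastgroup, match.group(match.lastgroup)))

print(found)  # [('mpid', '6553'), ('vlan', '2553')]
```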

You can try it like this:

>>> import re
>>> ipsla, mpid, vlan, tag = [], [], [], []
>>> my_string = open('your_file', 'r').read()  # read the file contents into a string
>>> for x in re.findall(r"ip sla \d+|mpid \d+|vlan \d+|tag \S+", my_string):
...     if x.startswith("ip sla"):
...         ipsla.append(x.split()[2])
...     elif x.startswith("mpid"):
...         mpid.append(x.split()[1])
...     elif x.startswith("vlan"):
...         vlan.append(x.split()[1])
...     elif x.startswith("tag"):
...         tag.append(x.split()[1])
...
>>> ipsla
['2553', '3553']
>>> mpid
['6553', '7553']
>>> vlan
['2553', '3553']
>>> tag
['CSCO5839', 'CSCO5839']
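The tag pattern matches once per block, so if the file repeats a tag, tag.append adds it again. To keep each tag only once, as in the desired output in the question, one option (an addition, not part of this answer's code) is dict.fromkeys, which drops duplicates while keeping the original order:

```python
tag = ['CSCO5839', 'CSCO5839']

# dict keys are unique and keep insertion order (Python 3.7+)
unique_tags = list(dict.fromkeys(tag))

print(unique_tags)  # ['CSCO5839']
```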

I suggest this code:

import re

sla = [] # Declare the list variables
mpid = []
vlan = []

with open("regex.txt","r") as myFile:              # Open the file
    p = re.compile(r'(sla|mpid|vlan)\s+(\d+)')     # Create regex
    for m in p.finditer(myFile.read()):            # Find the match in the current input
        if m.group(1) == 'sla':
            sla.append(m.group(2))                 # Add a match to the corresponding list
        if m.group(1) == 'mpid':
            mpid.append(m.group(2))
        if m.group(1) == 'vlan':
            vlan.append(m.group(2))

See the Python demo:

import re
s = """ip sla logging traps
ip sla 2553
ethernet jitter mpid 6553 domain Gravity vlan 2553 num-frames 100 interval 100
tag CSCO5839
frequency 300
ip sla schedule 2553 life forever start-time now
ip sla 3553
ethernet jitter mpid 7553 domain Gravity vlan 3553 num-frames 100 interval 100
tag CSCO5839
frequency 300
ip sla schedule 3553 life forever start-time now"""
sla = [] # Declare the list variables
mpid = []
vlan = []

p = re.compile(r'(sla|mpid|vlan)\s+(\d+)') # Create regex
for m in p.finditer(s):                  # Find the match in the current input
    if m.group(1) == 'sla':
        sla.append(m.group(2))              # Add a match to the corresponding list
    if m.group(1) == 'mpid':
        mpid.append(m.group(2))
    if m.group(1) == 'vlan':
        vlan.append(m.group(2))

print(sla)
print(mpid)
print(vlan)

Output:

['2553', '3553'] # sla
['6553', '7553'] # mpid
['2553', '3553'] # vlan
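The chain of if statements can also be replaced by a dictionary that maps each keyword to its list, so group(1) picks the list directly. A sketch of that variant on a trimmed copy of the sample (the name lists is made up for this example):

```python
import re

s = """ip sla 2553
ethernet jitter mpid 6553 domain Gravity vlan 2553 num-frames 100 interval 100
ip sla 3553
ethernet jitter mpid 7553 domain Gravity vlan 3553 num-frames 100 interval 100"""

# One list per keyword; the keyword captured by group 1 selects the list
lists = {"sla": [], "mpid": [], "vlan": []}
p = re.compile(r"(sla|mpid|vlan)\s+(\d+)")
for m in p.finditer(s):
    lists[m.group(1)].append(m.group(2))

print(lists)
```

Adding another numeric keyword then only needs a new alternative in the pattern and a new key in the dictionary.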
