I have a .txt file, and it's like this one:
ip sla logging traps
ip sla 2553
ethernet jitter mpid 6553 domain Gravity vlan 2553 num-frames 100 interval 100
tag CSCO5839
frequency 300
ip sla schedule 2553 life forever start-time now
ip sla 3553
ethernet jitter mpid 7553 domain Gravity vlan 3553 num-frames 100 interval 100
tag CSCO5839
frequency 300
ip sla schedule 3553 life forever start-time now
I would like a way to match the ip sla, mpid, vlan and tag values and store them in a list or a vector:
Desired output:
ipsla[0]=2553
and ipsla[1]=3553
mpid[0]=6553
and mpid[1]=7553
vlan[0]=2553
and vlan[1]=3553
tag[0]=CSCO5839
Now, I have just started learning Python, and I know that I have to parse each line of the file, then match the desired result using re.match(), and then store the results in an array or a list.
My poor code so far:
#!/usr/bin/env python
import re
myFile = open("regex.txt","r")
for line in myFile:
    if re.match("",line): #I should have a condition there but I got lost
        #now I have to store that variable in an array / or a list
        pass
My questions so far:
- What expression should I use so that I can find what I need?
- How can I then store those values, two per name, in a list with the same name? E.g. ipsla[0], ipsla[1]?
I would also like some explanations if it is possible.
Thank you.
The other answers all work, but if you're looking to really understand the problem, I think this might be a good approach. First, you need to think about what you're trying to find.
For the first three, you want to match patterns of the form name, a space, then digits. The regex for this is name \d+, where \d means a digit character and + means one or more.
So for ip sla you would want a regex like:
ip sla \d+
However, you're only interested in the number, so you can use parentheses to capture the number as its own group:
ip sla (\d+)
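A quick way to see what the group changes is to run both patterns through re.findall on single lines from the question's file. This is only a small check; the patterns are written as raw strings (r"...") so the backslash reaches the regex engine unchanged:

```python
import re

line = "ip sla 2553"

# Without a group, findall returns the whole matched text
whole = re.findall(r"ip sla \d+", line)

# With one group, findall returns only what the group captured
number_only = re.findall(r"ip sla (\d+)", line)

# "ip sla logging traps" has no digits after "ip sla ", so nothing matches
no_number = re.findall(r"ip sla (\d+)", "ip sla logging traps")

print(whole)        # ['ip sla 2553']
print(number_only)  # ['2553']
print(no_number)    # []
```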
Finally, in a regular Python string a backslash (\) starts an escape sequence together with the character that follows it, so to pass a literal backslash through to the regex engine you write two of them. In Python code you would have:
pattern = "ip sla (\\d+)"
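To check that the doubled backslash really produces the intended pattern, you can compare it with a raw string, in which Python does not process escape sequences. A minimal sketch:

```python
import re

doubled = "ip sla (\\d+)"   # backslash escaped by doubling it
raw = r"ip sla (\d+)"       # raw string: backslash kept exactly as written

# Both literals produce the same pattern text
print(doubled == raw)                   # True
print(re.findall(raw, "ip sla 3553"))   # ['3553']
```

Many people prefer the raw-string form for regexes because it reads the same as the regex itself.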
If you look at the docs, the re module has a function, re.findall, that makes it unnecessary to split your file into lines.
re.findall(pattern, string)
returns a list of all non-overlapping matches of the pattern in the string, or, if the pattern contains exactly one group (as in our case, where the one group is \d+), a list of the text captured by that group.
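The return type of re.findall depends on how many groups the pattern has. A short illustration on part of one line from the file (the two-group pattern is only an example, not taken from the question):

```python
import re

text = "mpid 6553 domain Gravity vlan 2553"

# No groups: list of whole matches
no_groups = re.findall(r"vlan \d+", text)
# One group: list of that group's text
one_group = re.findall(r"vlan (\d+)", text)
# Two or more groups: list of tuples, one tuple per match
two_groups = re.findall(r"(mpid|vlan) (\d+)", text)

print(no_groups)   # ['vlan 2553']
print(one_group)   # ['2553']
print(two_groups)  # [('mpid', '6553'), ('vlan', '2553')]
```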
That means that
re.findall("ip sla (\\d+)", fileText)
will return a list of your ip sla numbers (as strings).
That can also be applied to mpid and vlan.
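This also explains why re.match, as used in the question's code, would not find mpid or vlan: re.match only tries to match at the beginning of the string, and those fields appear in the middle of the ethernet jitter line. re.search scans the whole line instead. A small comparison on one line from the file:

```python
import re

line = "ethernet jitter mpid 6553 domain Gravity vlan 2553 num-frames 100 interval 100"

# re.match only tries at position 0, where the line starts with "ethernet"
anchored = re.match(r"mpid (\d+)", line)
print(anchored)     # None

# re.search looks for the first match anywhere in the line
m = re.search(r"mpid (\d+)", line)
print(m.group(1))   # 6553
```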
However, for tag you want to match letters and numbers. In regex these are known as word characters (letters, digits and the underscore), and to match one we can use \w. Once again we will use the + quantifier to match one or more word characters. This leaves us with
pattern = "tag (\\w+)"
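As a quick check of what \w covers: it accepts letters and digits, so the whole tag is captured, but it stops at characters such as a hyphen. The second line below is just an illustration taken from elsewhere in the file:

```python
import re

# The tag is letters followed by digits, so \w+ captures all of it
tags = re.findall(r"tag (\w+)", "tag CSCO5839")

# A hyphen is not a word character, so \w+ splits "num-frames" in two
words = re.findall(r"\w+", "num-frames 100")

print(tags)   # ['CSCO5839']
print(words)  # ['num', 'frames', '100']
```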
Strategy:
1. Read file into string
2. Construct and execute regex searches for each pattern
3. Iterate through each list of results
4. Append each result to the correct array
Code:
import re
myFile = open("regex.txt","r") # open file for reading
myFileData = myFile.read() # read file into a string
myFile.close() # close file now that we're done
# create lists for each thing you're looking for
ipsla = []
mpid = []
vlan = []
tag = []
# finds a pattern like "ip sla <one or more digits 0-9>"
results = re.findall("ip sla (\\d+)", myFileData)
for result in results:
    # add the number (as an int) to your ipsla list
    ipsla.append(int(result))
# rinse & repeat :)
results = re.findall("mpid (\\d+)", myFileData)
for result in results:
    mpid.append(int(result))
results = re.findall("vlan (\\d+)", myFileData)
for result in results:
    vlan.append(int(result))
results = re.findall("tag (\\w+)", myFileData)
for result in results:
    # only keep each tag once
    if result not in tag:
        tag.append(result)
print(ipsla)
print(mpid)
print(vlan)
print(tag)
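Since the four search-and-append blocks differ only in the pattern and the target list, they could be folded into one loop over a dictionary of patterns. This is a sketch: the patterns dict and the extract function are hypothetical names, and it searches a string (an abbreviated copy of the question's file) so it runs on its own. With the real file you would call extract(open("regex.txt").read()).

```python
import re

# Hypothetical mapping from list name to pattern (same patterns as above)
patterns = {
    "ipsla": r"ip sla (\d+)",
    "mpid": r"mpid (\d+)",
    "vlan": r"vlan (\d+)",
    "tag": r"tag (\w+)",
}

def extract(text):
    """Return a dict with one list of values per pattern name."""
    found = {}
    for name, pattern in patterns.items():
        values = re.findall(pattern, text)
        if name == "tag":
            # keep the first occurrence of each tag, preserving order
            found[name] = list(dict.fromkeys(values))
        else:
            # numeric fields are converted to int, as in the code above
            found[name] = [int(v) for v in values]
    return found

# Abbreviated copy of the question's file
sample = """ip sla 2553
ethernet jitter mpid 6553 domain Gravity vlan 2553 num-frames 100 interval 100
tag CSCO5839
ip sla 3553
ethernet jitter mpid 7553 domain Gravity vlan 3553 num-frames 100 interval 100
tag CSCO5839"""

result = extract(sample)
print(result["ipsla"])  # [2553, 3553]
print(result["tag"])    # ['CSCO5839']
```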
Using regex with named groups:
import re
from collections import defaultdict
results = defaultdict(list)
matcher = re.compile(r'ip sla (?P<ipsla>\d+)|mpid (?P<mpid>\d+)'
                     r'|vlan (?P<vlan>\d+)|tag (?P<tag>\w+)')
with open("regex.txt") as f:
    for line in f:
        for match in matcher.finditer(line):
            results[match.lastgroup].append(match.group(match.lastgroup))
print(results)
Now your results are available in the defaultdict results:
defaultdict(<class 'list'>, {
'mpid': ['6553', '7553'],
'vlan': ['2553', '3553'],
'ipsla': ['2553', '3553'],
'tag': ['CSCO5839', 'CSCO5839']
})
You can get the individual lists into separate variables, if necessary, afterwards with:
ipsla = results['ipsla']
vlan = results['vlan']
mpid = results['mpid']
tag = results['tag']
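If you want the values as integers and each tag only once, as in the question's desired output, the lists can be post-processed. This sketch also runs finditer over the whole text instead of line by line, which works because none of the patterns can match across a line break; the text is an abbreviated copy of the question's file, and dict.fromkeys is used to drop repeated tags while keeping their order (Python 3.7+):

```python
import re
from collections import defaultdict

# Same pattern as in the answer above: one named group per alternative
matcher = re.compile(r'ip sla (?P<ipsla>\d+)|mpid (?P<mpid>\d+)'
                     r'|vlan (?P<vlan>\d+)|tag (?P<tag>\w+)')

# Abbreviated copy of the question's file
text = """ip sla logging traps
ip sla 2553
ethernet jitter mpid 6553 domain Gravity vlan 2553 num-frames 100 interval 100
tag CSCO5839
ip sla 3553
ethernet jitter mpid 7553 domain Gravity vlan 3553 num-frames 100 interval 100
tag CSCO5839"""

results = defaultdict(list)
for match in matcher.finditer(text):
    # Only one alternative matches at a time, so lastgroup names that field
    results[match.lastgroup].append(match.group(match.lastgroup))

# Convert the numeric fields to int and drop repeated tags
ipsla = [int(v) for v in results['ipsla']]
mpid = [int(v) for v in results['mpid']]
vlan = [int(v) for v in results['vlan']]
tag = list(dict.fromkeys(results['tag']))

print(ipsla, mpid, vlan, tag)  # [2553, 3553] [6553, 7553] [2553, 3553] ['CSCO5839']
```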
You can try it like this:
>>> import re
>>> ipsla, mpid, vlan, tag = [], [], [], []
>>> my_string = open('your_file', 'r').read()
>>> for x in re.findall(r"ip sla \d+|mpid \d+|vlan \d+|tag \S+", my_string):
...     if x.startswith("ip sla"):
...         ipsla.append(x.split()[2])
...     if x.startswith("mpid"):
...         mpid.append(x.split()[1])
...     if x.startswith("vlan"):
...         vlan.append(x.split()[1])
...     if x.startswith("tag") and x.split()[1] not in tag:
...         tag.append(x.split()[1])
...
>>> ipsla
['2553', '3553']
>>> mpid
['6553', '7553']
>>> vlan
['2553', '3553']
>>> tag
['CSCO5839']
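The chain of startswith checks can also be replaced by splitting each match at its last space, since every value here is a single token without spaces. A sketch of that variant, where lists is a hypothetical dictionary keyed by the field name and the text is an abbreviated copy of the question's file:

```python
import re

text = """ip sla 2553
ethernet jitter mpid 6553 domain Gravity vlan 2553 num-frames 100 interval 100
tag CSCO5839"""

# Hypothetical container: one list per field, keyed by the text before the value
lists = {"ip sla": [], "mpid": [], "vlan": [], "tag": []}
for x in re.findall(r"ip sla \d+|mpid \d+|vlan \d+|tag \S+", text):
    # "ip sla 2553".rsplit(" ", 1) -> ["ip sla", "2553"]
    key, value = x.rsplit(" ", 1)
    lists[key].append(value)

print(lists)  # {'ip sla': ['2553'], 'mpid': ['6553'], 'vlan': ['2553'], 'tag': ['CSCO5839']}
```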
I suggest this code:
import re
sla = [] # Declare the list variables
mpid = []
vlan = []
with open("regex.txt","r") as myFile: # Open the file
    p = re.compile(r'(sla|mpid|vlan)\s+(\d+)') # Create regex
    for m in p.finditer(myFile.read()): # Find the match in the current input
        if m.group(1) == 'sla':
            sla.append(m.group(2)) # Add a match to the corresponding list
        if m.group(1) == 'mpid':
            mpid.append(m.group(2))
        if m.group(1) == 'vlan':
            vlan.append(m.group(2))
See the Python demo:
import re
s = """ip sla logging traps
ip sla 2553
ethernet jitter mpid 6553 domain Gravity vlan 2553 num-frames 100 interval 100
tag CSCO5839
frequency 300
ip sla schedule 2553 life forever start-time now
ip sla 3553
ethernet jitter mpid 7553 domain Gravity vlan 3553 num-frames 100 interval 100
tag CSCO5839
frequency 300
ip sla schedule 3553 life forever start-time now"""
sla = [] # Declare the list variables
mpid = []
vlan = []
p = re.compile(r'(sla|mpid|vlan)\s+(\d+)') # Create regex
for m in p.finditer(s): # Find the match in the current input
    if m.group(1) == 'sla':
        sla.append(m.group(2)) # Add a match to the corresponding list
    if m.group(1) == 'mpid':
        mpid.append(m.group(2))
    if m.group(1) == 'vlan':
        vlan.append(m.group(2))
print(sla)
print(mpid)
print(vlan)
Output:
['2553', '3553'] # sla
['6553', '7553'] # mpid
['2553', '3553'] # vlan
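This answer does not collect the tag. One way to add it, sketched under the assumption that a tag is always a single word, is a separate pattern: widening the shared value group to \w+ would also match text like "sla logging" in "ip sla logging traps". The sample below is an abbreviated copy of the question's file, and values is a hypothetical dictionary used instead of three separate lists:

```python
import re

s = """ip sla logging traps
ip sla 2553
ethernet jitter mpid 6553 domain Gravity vlan 2553 num-frames 100 interval 100
tag CSCO5839
frequency 300
ip sla schedule 2553 life forever start-time now"""

# Numeric fields: same pattern as above; with two groups findall yields (name, number) tuples
values = {"sla": [], "mpid": [], "vlan": []}
for name, number in re.findall(r'(sla|mpid|vlan)\s+(\d+)', s):
    values[name].append(number)

# Tags need \w+ rather than \d+, so they get their own pattern
tags = re.findall(r'tag\s+(\w+)', s)

print(values)  # {'sla': ['2553'], 'mpid': ['6553'], 'vlan': ['2553']}
print(tags)    # ['CSCO5839']
```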