Python - how to match specific words / digits from multiple lines in a text file and store them in separate lists
I have a .txt file like this one:
ip sla logging traps
ip sla 2553
ethernet jitter mpid 6553 domain Gravity vlan 2553 num-frames 100 interval 100
tag CSCO5839
frequency 300
ip sla schedule 2553 life forever start-time now
ip sla 3553
ethernet jitter mpid 7553 domain Gravity vlan 3553 num-frames 100 interval 100
tag CSCO5839
frequency 300
ip sla schedule 3553 life forever start-time now
I want a way to match ip sla / mpid / vlan and tag, and store their values in a list or vector:
Desired output:
ipsla[0]=2553 and ipsla[1]=3553
mpid[0]=6553 and mpid[1]=7553
vlan[0]=2553 and vlan[1]=3553
tag[0]=CSCO5839
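I'm only just starting to learn Python. I know I have to parse each line of the file, use re.match() to match what I need, and then store the results in an array or a list.
My poor code so far: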
#!/usr/bin/env python
import re
myFile = open("regex.txt","r")
for line in myFile:
    if re.match("",line): #I should have a condition there but I got lost
        #now I have to store that variable in an array / or a list
        pass
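My questions so far are:
- What expression should I use to find what I need?
- How do I then store these values in separate vectors/lists with the same name? For example: ipsla[0], ipsla[1]?
If possible, I'd also like some explanation.
Thanks.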
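One detail about the line-by-line loop in the question: re.match only tries to match at the very beginning of the string, so it can find "ip sla 2553" but never "mpid 6553", which sits in the middle of its line. re.search scans the whole line instead. A minimal sketch of the line-by-line approach, assuming the file contents shown above (inlined as a string so the example runs without regex.txt):

```python
import re

# The question's file contents, inlined so the sketch is self-contained
text = """ip sla logging traps
ip sla 2553
ethernet jitter mpid 6553 domain Gravity vlan 2553 num-frames 100 interval 100
tag CSCO5839
frequency 300
ip sla schedule 2553 life forever start-time now
ip sla 3553
ethernet jitter mpid 7553 domain Gravity vlan 3553 num-frames 100 interval 100
tag CSCO5839
frequency 300
ip sla schedule 3553 life forever start-time now"""

line = "ethernet jitter mpid 6553 domain Gravity vlan 2553 num-frames 100 interval 100"
print(re.match(r"mpid (\d+)", line))            # None: match() is anchored at the start
print(re.search(r"mpid (\d+)", line).group(1))  # 6553: search() scans the whole line

ipsla, mpid = [], []
for line in text.splitlines():
    m = re.search(r"ip sla (\d+)", line)  # "ip sla schedule ..." does not match
    if m:
        ipsla.append(m.group(1))
    m = re.search(r"mpid (\d+)", line)
    if m:
        mpid.append(m.group(1))

print(ipsla)  # ['2553', '3553']
print(mpid)   # ['6553', '7553']
```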
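All the other answers work, but if you want to really understand the problem, I think this is a good way to approach it. First, think about what you're looking for.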
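For the first three (ip sla, mpid and vlan), you want to match a pattern of the form name space digits. The regex for that is name \d+, where \d means a digit character and + means one or more.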
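To see what name \d+ does on its own, here is a quick check against one line of the sample file. The patterns are written as raw strings (r"...") so the backslash reaches the regex engine unchanged; the pattern matches the literal name, one space, and then every digit that follows:

```python
import re

line = "ethernet jitter mpid 6553 domain Gravity vlan 2553 num-frames 100 interval 100"

# \d matches one digit; + repeats it one or more times, taking as many as it can
print(re.search(r"mpid \d+", line).group())  # mpid 6553
print(re.search(r"vlan \d+", line).group())  # vlan 2553
# "domain" is followed by letters, not digits, so this pattern finds nothing
print(re.search(r"domain \d+", line))        # None
```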
So for ip sla you need a regex such as:
ip sla \d+
However, you're only interested in the number, so you can put parentheses around the digits to make them their own group:
ip sla (\d+)
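The difference between the whole match and the group is easy to see with re.search: group(0) is everything the pattern matched, group(1) is only the text inside the parentheses. A small sketch using lines from the sample file:

```python
import re

m = re.search(r"ip sla (\d+)", "ip sla 2553")
print(m.group(0))  # ip sla 2553  (the whole match)
print(m.group(1))  # 2553         (only the parenthesized digits)

# "ip sla logging traps" has no digits after "ip sla ", so there is no match at all
print(re.search(r"ip sla (\d+)", "ip sla logging traps"))  # None
```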
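Finally, in a Python string literal a backslash (\) escapes the character that follows it, so for a literal backslash to reach the regex we need to write two backslashes. In Python code, then, you need: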
pattern = "ip sla (\\d+)"
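An alternative to doubling the backslashes is a raw string literal, r"...", in which Python leaves backslashes alone. The two spellings below produce exactly the same pattern string:

```python
import re

escaped = "ip sla (\\d+)"  # "\\" in a normal string literal is one backslash
raw = r"ip sla (\d+)"      # a raw string keeps the backslash as written

print(escaped == raw)                  # True: both hold the text ip sla (\d+)
print(re.findall(raw, "ip sla 2553"))  # ['2553']
```

Raw strings are the usual choice for regexes because they stay readable as patterns grow.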
If you look at the docs, the re module has a re.findall function, which means you don't have to split the file into lines.
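re.findall(pattern, string) returns a list of all the substrings that match the pattern or, if the pattern contains exactly one group (in our case (\d+)), a list of what that group captured.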
That means
re.findall("ip sla (\\d+)", fileText)
returns a list of numbers (as strings), which are your ip sla values.
The same works for mpid and vlan.
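The shape of what re.findall returns depends on how many groups the pattern has, which is worth checking before looping over the result. A short sketch on two lines taken from the sample file:

```python
import re

text = "ip sla 2553\nethernet jitter mpid 6553 domain Gravity vlan 2553"

# No groups: findall returns the full matched strings
print(re.findall(r"ip sla \d+", text))         # ['ip sla 2553']
# Exactly one group: it returns only what that group captured
print(re.findall(r"ip sla (\d+)", text))       # ['2553']
# Two or more groups: it returns one tuple per match
print(re.findall(r"(mpid|vlan) (\d+)", text))  # [('mpid', '6553'), ('vlan', '2553')]
```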
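For tag, however, you want to match letters and digits. In regex these are called word characters, and \w matches one of them (it also matches the underscore). Again we add the + modifier to match one or more, which leaves us with: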
pattern = "tag (\\w+)"
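Here is how tag (\w+) behaves on the sample data and on a couple of edge cases. The values "lab_01" and "CSCO-5839" are hypothetical and do not appear in the question's file; they only show where \w+ stops. For comparison, \S+ (any run of non-whitespace) keeps the hyphen:

```python
import re

text = "tag CSCO5839\nfrequency 300\ntag CSCO5839"
print(re.findall(r"tag (\w+)", text))              # ['CSCO5839', 'CSCO5839']

# \w includes the underscore, so this hypothetical tag is captured in full
print(re.findall(r"tag (\w+)", "tag lab_01"))      # ['lab_01']
# \w stops at characters such as "-"
print(re.findall(r"tag (\w+)", "tag CSCO-5839"))   # ['CSCO']
# \S+ matches up to the next whitespace, hyphen included
print(re.findall(r"tag (\S+)", "tag CSCO-5839"))   # ['CSCO-5839']
```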
Strategy:
1. Read file into string
2. Construct and execute regex searches for each pattern
3. Iterate through each list of results
4. Append each result to the correct array
Code:
import re

# open the file for reading; "with" closes it automatically when the block ends
with open("regex.txt", "r") as myFile:
    myFileData = myFile.read()  # read the whole file into a string

# create lists for each thing you're looking for
ipsla = []
mpid = []
vlan = []
tag = []

# finds a pattern like "ip sla <one or more digits 0-9>"
results = re.findall("ip sla (\\d+)", myFileData)
for result in results:
    # add the number (as an int) to your ipsla list
    ipsla.append(int(result))

# rinse & repeat :)
results = re.findall("mpid (\\d+)", myFileData)
for result in results:
    mpid.append(int(result))

results = re.findall("vlan (\\d+)", myFileData)
for result in results:
    vlan.append(int(result))

results = re.findall("tag (\\w+)", myFileData)
for result in results:
    if result not in tag:  # skip tags we've already stored
        tag.append(result)

print(ipsla)
print(mpid)
print(vlan)
print(tag)
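The four search-and-append blocks above differ only in their pattern, so one option is to keep the patterns in a dictionary and loop over it. A minimal sketch, assuming the same file contents (inlined as a string here) and leaving every value as a string; the names patterns and found are illustrative only:

```python
import re

# The question's file contents, inlined so the example runs without regex.txt
data = """ip sla logging traps
ip sla 2553
ethernet jitter mpid 6553 domain Gravity vlan 2553 num-frames 100 interval 100
tag CSCO5839
frequency 300
ip sla schedule 2553 life forever start-time now
ip sla 3553
ethernet jitter mpid 7553 domain Gravity vlan 3553 num-frames 100 interval 100
tag CSCO5839
frequency 300
ip sla schedule 3553 life forever start-time now"""

# Hypothetical mapping: result name -> pattern with exactly one capturing group
patterns = {
    "ipsla": r"ip sla (\d+)",
    "mpid": r"mpid (\d+)",
    "vlan": r"vlan (\d+)",
    "tag": r"tag (\w+)",
}

# With one group, re.findall returns the captured text of every match
found = {name: re.findall(pat, data) for name, pat in patterns.items()}

print(found["ipsla"])  # ['2553', '3553']
print(found["tag"])    # ['CSCO5839', 'CSCO5839']
```

Adding a new field then means adding one dictionary entry rather than another copy of the loop.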
Use a regex with named groups:
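import re
from collections import defaultdict

results = defaultdict(list)  # maps group name -> list of captured values
# one alternative per field; each captures its value in a named group
matcher = re.compile(r'ip sla (?P<ipsla>\d+)|mpid (?P<mpid>\d+)'
                     r'|vlan (?P<vlan>\d+)|tag (?P<tag>\w+)')

with open("regex.txt") as f:
    for line in f:
        for match in matcher.finditer(line):
            # lastgroup is the name of the group that took part in this match
            results[match.lastgroup].append(match.group(match.lastgroup))

print(results)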
Your results are now available in the results dictionary (a defaultdict):
defaultdict(<class 'list'>, {
'mpid': ['6553', '7553'],
'vlan': ['2553', '3553'],
'ipsla': ['2553', '3553'],
'tag': ['CSCO5839', 'CSCO5839']
})
You can then pull the individual lists out into separate variables with:
ipsla = results['ipsla']
vlan = results['vlan']
mpid = results['mpid']
tag = results['tag']
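The tag list holds CSCO5839 twice because it appears on two lines of the file. If only unique values are wanted, dict.fromkeys drops duplicates while keeping first-seen order (dicts preserve insertion order from Python 3.7 on). The list below is a stand-in copied from the output shown above:

```python
# Stand-in for tag = results['tag'], copied from the printed output
tag = ['CSCO5839', 'CSCO5839']

# dict.fromkeys keeps only the first occurrence of each value, in order
unique_tag = list(dict.fromkeys(tag))
print(unique_tag)  # ['CSCO5839']

# Order is preserved for mixed input too
print(list(dict.fromkeys(['b', 'a', 'b'])))  # ['b', 'a']
```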
You could try it like this:
>>> import re
>>> ipsla, mpid, vlan, tag = [], [], [], []
>>> my_string = open('your_file', 'r').read()
>>> for x in re.findall(r"ip sla \d+|mpid \d+|vlan \d+|tag \S+", my_string):
...     if x.startswith("ip sla"):
...         ipsla.append(x.split()[2])
...     elif x.startswith("mpid"):
...         mpid.append(x.split()[1])
...     elif x.startswith("vlan"):
...         vlan.append(x.split()[1])
...     elif x.startswith("tag"):
...         tag.append(x.split()[1])
...
>>> ipsla
['2553', '3553']
>>> mpid
['6553', '7553']
>>> vlan
['2553', '3553']
>>> tag
['CSCO5839', 'CSCO5839']
I suggest this code:
import re

sla = []   # Declare the list variables
mpid = []
vlan = []

with open("regex.txt", "r") as myFile:            # Open the file
    p = re.compile(r'(sla|mpid|vlan)\s+(\d+)')     # Create regex
    for m in p.finditer(myFile.read()):            # Find the matches in the file contents
        if m.group(1) == 'sla':
            sla.append(m.group(2))                 # Add a match to the corresponding list
        elif m.group(1) == 'mpid':
            mpid.append(m.group(2))
        elif m.group(1) == 'vlan':
            vlan.append(m.group(2))
See the Python demo:
import re
s = """ip sla logging traps
ip sla 2553
ethernet jitter mpid 6553 domain Gravity vlan 2553 num-frames 100 interval 100
tag CSCO5839
frequency 300
ip sla schedule 2553 life forever start-time now
ip sla 3553
ethernet jitter mpid 7553 domain Gravity vlan 3553 num-frames 100 interval 100
tag CSCO5839
frequency 300
ip sla schedule 3553 life forever start-time now"""

sla = []   # Declare the list variables
mpid = []
vlan = []

p = re.compile(r'(sla|mpid|vlan)\s+(\d+)')   # Create regex
for m in p.finditer(s):                      # Find the matches in the input
    if m.group(1) == 'sla':
        sla.append(m.group(2))               # Add a match to the corresponding list
    elif m.group(1) == 'mpid':
        mpid.append(m.group(2))
    elif m.group(1) == 'vlan':
        vlan.append(m.group(2))

print(sla)
print(mpid)
print(vlan)
Output:
['2553', '3553'] # sla
['6553', '7553'] # mpid
['2553', '3553'] # vlan
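The if/elif chain can also be replaced by a dictionary that maps each keyword captured in group 1 to its list. Because the dictionary holds references to the same list objects, appending through it fills sla, mpid and vlan directly. A sketch, where targets is a hypothetical name and the sample text is a four-line subset of the question's file:

```python
import re

# Subset of the question's file, inlined so the sketch runs on its own
text = """ip sla 2553
ethernet jitter mpid 6553 domain Gravity vlan 2553 num-frames 100 interval 100
ip sla 3553
ethernet jitter mpid 7553 domain Gravity vlan 3553 num-frames 100 interval 100"""

sla, mpid, vlan = [], [], []
# Hypothetical lookup table: keyword from group 1 -> list it belongs in
targets = {"sla": sla, "mpid": mpid, "vlan": vlan}

p = re.compile(r'(sla|mpid|vlan)\s+(\d+)')
for m in p.finditer(text):
    targets[m.group(1)].append(m.group(2))  # append to the list named by the keyword

print(sla)   # ['2553', '3553']
print(mpid)  # ['6553', '7553']
print(vlan)  # ['2553', '3553']
```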
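Disclaimer: the technical posts on this site are published under the CC BY-SA 4.0 license. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.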