
Python - how to match specific words / digits from multiple lines in a text file and store them in separate lists

I have a .txt file like this one:

ip sla logging traps
ip sla 2553
ethernet jitter mpid 6553 domain Gravity vlan 2553 num-frames 100 interval 100
tag CSCO5839
frequency 300
ip sla schedule 2553 life forever start-time now
ip sla 3553
ethernet jitter mpid 7553 domain Gravity vlan 3553 num-frames 100 interval 100
tag CSCO5839
frequency 300
ip sla schedule 3553 life forever start-time now

I'd like a way to match ip sla / mpid / vlan and tag, and store their values in lists or vectors.

Desired output:

ipsla[0]=2553 ipsla[1]=3553
mpid[0]=6553 mpid[1]=7553
vlan[0]=2553 vlan[1]=3553
tag[0]=CSCO5839

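I've only just started learning Python. I know I have to parse each line of the file, use re.match() to match what I need, and then store the results in an array or a list.

My terrible code so far: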

#!/usr/bin/env python

import re

myFile = open("regex.txt","r")

for line in myFile:
    if re.match("",line): #I should have a condition there but I got lost
        #now I have to store that variable in an array / or a list

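My questions so far are:
- What expression should I use to find what I need?
- How do I then store those values in separate vectors/lists under the same name? For example: ipsla[0] ipsla[1]

I'd also appreciate some explanation, if possible.

Thanks.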

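The regex you need and an explanation of how it works

All of the above answers work, but if you want to really understand the problem, I think this could be a good approach. First, you need to think about what you are looking for.

For the first three, you want to match a pattern of the form name, space, digits. The regex for that is name \d+, where \d stands for a digit character and + means one or more.

So for ip sla, you need a regex like: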

ip sla \d+

However, you are only interested in the number, so you can use parentheses to make the digits their own group:

ip sla (\d+)

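Finally, in Python the backslash ( \ ) is used to escape the character that follows it, so for Python to see a literal backslash we write two of them. In Python code you therefore need: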

pattern = "ip sla (\\d+)"
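
As a quick check of the escaping point, here is a minimal sketch comparing the double-backslash spelling with a raw string (the r"..." prefix, which the answer above does not use but which is a common alternative). It also pulls the captured digits out with re.search. The sample line comes from the question's file; the variable names are only for illustration.

```python
import re

escaped = "ip sla (\\d+)"   # backslash doubled inside a normal string
raw = r"ip sla (\d+)"       # raw string: backslashes are left alone

# Both spellings produce exactly the same pattern text.
same_pattern = escaped == raw

match = re.search(raw, "ip sla 2553")
whole = match.group(0)    # the entire matched text
digits = match.group(1)   # only what the parentheses captured

print(same_pattern, whole, digits)
```

Group 0 is always the whole match, while group 1 is the first parenthesised group, which is why the parentheses are enough to isolate the number.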

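If you look at the docs, the re module has a re.findall function, which saves you from having to split the file into lines.

re.findall(pattern, string) returns a list of the strings that match the pattern or, if the pattern contains exactly one group (as ours does with (\d+)), a list of what that group captured.

That means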

re.findall("ip sla (\\d+)", fileText)

will return a list of numbers (as strings), which are your ip sla values.

The same applies to mpid and vlan.
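
To make the behaviour of re.findall concrete, here is a small sketch run against an abbreviated copy of the question's file. The name file_text is just an illustrative stand-in for the file's contents, and the patterns are written as raw strings for brevity.

```python
import re

file_text = """ip sla logging traps
ip sla 2553
ethernet jitter mpid 6553 domain Gravity vlan 2553 num-frames 100 interval 100
ip sla schedule 2553 life forever start-time now
"""

# No group in the pattern: findall returns the full text of every match.
full_matches = re.findall(r"ip sla \d+", file_text)

# One group in the pattern: findall returns only what the group captured.
ipsla = re.findall(r"ip sla (\d+)", file_text)
mpid = re.findall(r"mpid (\d+)", file_text)
vlan = re.findall(r"vlan (\d+)", file_text)

print(full_matches)
print(ipsla, mpid, vlan)
```

Note that "ip sla logging traps" and "ip sla schedule 2553 ..." are skipped, because the pattern requires digits immediately after "ip sla ".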

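For the tag, however, you want to match both letters and digits. In regex these are called word characters, and we can match them with \w. Again we use the + modifier to match one or more word characters. That leaves us with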

pattern = "tag (\\w+)"
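
A short sketch of the tag pattern in use. Two things are assumptions added for illustration: the extra tag value edge_01 (not in the question's file), which shows that \w also matches the underscore, and the use of dict.fromkeys to drop repeated tags while keeping their first-seen order.

```python
import re

# Sample text; "edge_01" is a made-up tag used only to show that \w
# covers letters, digits and the underscore.
file_text = "tag CSCO5839\nfrequency 300\ntag CSCO5839\ntag edge_01\n"

tags = re.findall(r"tag (\w+)", file_text)

# Drop duplicates but keep first-seen order (dicts preserve insertion
# order in Python 3.7+).
unique_tags = list(dict.fromkeys(tags))

print(tags)
print(unique_tags)
```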

Strategy:

1. Read file into string

2. Construct and execute regex searches for each pattern

3. Iterate through each list of results

4. Append each result to the correct array

Code:

import re

myFile = open("regex.txt","r")  # open file for reading
myFileData = myFile.read()      # read file into a string
myFile.close()                  # close file now that we're done

# create lists for each thing you're looking for
ipsla = []
mpid = []
vlan = []
tag = []

# finds a pattern like "ip sla <one or more digits 0-9>"
results = re.findall("ip sla (\\d+)", myFileData)
for result in results:
        # add the number (as an int) to your ipsla list
        ipsla.append(int(result))

# rinse & repeat :)
results = re.findall("mpid (\\d+)", myFileData)
for result in results:
        mpid.append(int(result))

results = re.findall("vlan (\\d+)", myFileData)
for result in results:
        vlan.append(int(result))

results = re.findall("tag (\\w+)", myFileData)
for result in results:
        if result not in tag:
                tag.append(result)

# print() with parentheses works on Python 3 (and on Python 2 for a single argument)
print(ipsla)
print(mpid)
print(vlan)
print(tag)
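
The question asked for output of the form ipsla[0]=2553 ipsla[1]=3553 rather than a plain list. One possible way to get it is sketched below; format_indexed is a hypothetical helper name, not something from the answer above.

```python
def format_indexed(name, values):
    """Render a list as 'name[0]=v0 name[1]=v1 ...' on one line."""
    return " ".join(f"{name}[{i}]={v}" for i, v in enumerate(values))

ipsla_line = format_indexed("ipsla", [2553, 3553])
tag_line = format_indexed("tag", ["CSCO5839"])

print(ipsla_line)
print(tag_line)
```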

Using a regex with named groups:

import re
from collections import defaultdict

results = defaultdict(list)
matcher = re.compile(r'ip sla (?P<ipsla>\d+)|mpid (?P<mpid>\d+)'
                     r'|vlan (?P<vlan>\d+)|tag (?P<tag>\w+)')

with open("regex.txt") as f:
    for line in f:
        for match in re.finditer(matcher, line):
            results[match.lastgroup].append(match.group(match.lastgroup))

print(results)

Your results will now be available in the ( defaultdict ) dictionary results:

defaultdict(<class 'list'>, {
    'mpid': ['6553', '7553'],
    'vlan': ['2553', '3553'],
    'ipsla': ['2553', '3553'],
    'tag': ['CSCO5839', 'CSCO5839']
})

You can subsequently put the individual lists into separate variables with:

ipsla = results['ipsla']
vlan = results['vlan']
mpid = results['mpid']
tag = results['tag']
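
To see why match.lastgroup is enough to route each value, here is a minimal sketch on a single line from the question's file. Each alternative in the pattern contains exactly one named group, so the name of the last group that matched identifies which alternative fired. The variable names line and pairs are only for illustration.

```python
import re

matcher = re.compile(r'ip sla (?P<ipsla>\d+)|mpid (?P<mpid>\d+)'
                     r'|vlan (?P<vlan>\d+)|tag (?P<tag>\w+)')

line = ("ethernet jitter mpid 6553 domain Gravity vlan 2553 "
        "num-frames 100 interval 100")

# For every match, record which named group matched and what it captured.
pairs = [(m.lastgroup, m.group(m.lastgroup)) for m in matcher.finditer(line)]

print(pairs)
```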

You can try it like this:

>>> import re
>>> ipsla, mpid, vlan, tag = [], [], [], []
>>> with open('your_file', 'r') as f:
...     my_string = f.read()  # re.findall needs a string, not a file object
...
>>> for x in re.findall(r"ip sla \d+|mpid \d+|vlan \d+|tag \S+", my_string):
...     if x.startswith("ip sla"):
...         ipsla.append(x.split()[2])
...     if x.startswith("mpid"):
...         mpid.append(x.split()[1])
...     if x.startswith("vlan"):
...         vlan.append(x.split()[1])
...     if x.startswith("tag"):
...         tag.append(x.split()[1])
...
>>> ipsla
['2553', '3553']
>>> mpid
['6553', '7553']
>>> vlan
['2553', '3553']
>>> tag
['CSCO5839', 'CSCO5839']

I suggest this code:

import re

sla = [] # Declare the list variables
mpid = []
vlan = []

with open("regex.txt","r") as myFile:              # Open the file
    p = re.compile(r'(sla|mpid|vlan)\s+(\d+)')     # Create regex
    for m in p.finditer(myFile.read()):            # Find the match in the current input
        if m.group(1) == 'sla':
            sla.append(m.group(2))                 # Add a match to the corresponding list
        if m.group(1) == 'mpid':
            mpid.append(m.group(2))
        if m.group(1) == 'vlan':
            vlan.append(m.group(2))

See the Python demo:

import re
s = """ip sla logging traps
ip sla 2553
ethernet jitter mpid 6553 domain Gravity vlan 2553 num-frames 100 interval 100
tag CSCO5839
frequency 300
ip sla schedule 2553 life forever start-time now
ip sla 3553
ethernet jitter mpid 7553 domain Gravity vlan 3553 num-frames 100 interval 100
tag CSCO5839
frequency 300
ip sla schedule 3553 life forever start-time now"""
sla = [] # Declare the list variables
mpid = []
vlan = []

p = re.compile(r'(sla|mpid|vlan)\s+(\d+)') # Create regex
for m in p.finditer(s):                  # Find the match in the current input
    if m.group(1) == 'sla':
        sla.append(m.group(2))              # Add a match to the corresponding list
    if m.group(1) == 'mpid':
        mpid.append(m.group(2))
    if m.group(1) == 'vlan':
        vlan.append(m.group(2))

print(sla)
print(mpid)
print(vlan)

Output:

['2553', '3553'] # sla
['6553', '7553'] # mpid
['2553', '3553'] # vlan

