Python - How to match specific words/numbers across multiple lines of a text file and store them in separate lists

Question

I have a .txt file like this one:

ip sla logging traps
ip sla 2553
ethernet jitter mpid 6553 domain Gravity vlan 2553 num-frames 100 interval 100
tag CSCO5839
frequency 300
ip sla schedule 2553 life forever start-time now
ip sla 3553
ethernet jitter mpid 7553 domain Gravity vlan 3553 num-frames 100 interval 100
tag CSCO5839
frequency 300
ip sla schedule 3553 life forever start-time now

I would like a way to match ip sla / mpid / vlan and tag, and store their values in lists or vectors:

Desired output:

ipsla[0]=2553 and ipsla[1]=3553
mpid[0]=6553 and mpid[1]=7553
vlan[0]=2553 and vlan[1]=3553
tag[0]=CSCO5839

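I have only just started learning Python. I know I have to parse each line of the file, use re.match() to match what I need, and then store the results in an array or a list.

My poor code so far: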

#!/usr/bin/env python

import re

myFile = open("regex.txt","r")

for line in myFile:
    if re.match("",line): #I should have a condition there but I got lost
        #now I have to store that variable in an array / or a list

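My questions so far:
- What expression should I use to find what I need?
- How do I then store these values separately in vectors/lists that share a name? For example: ipsla[0], ipsla[1]?

If possible, I would also appreciate some explanation.

Thanks.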

Answer 1

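The regular expressions you need and an explanation of how they work

The other answers all work, but if you want to really understand the problem, I think this may be a good way to approach it. First, think about what you are looking for.

For the first three, you want to match a pattern of the form name, space, digits. The regex for that is name \d+ , where \d means a digit character and + means one or more.

So for ip sla, you need a regex such as: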

ip sla \d+

However, you are only interested in the number, so you can use parentheses to make the digits their own group:

ip sla (\d+)

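Finally, in a normal Python string literal the backslash ( \ ) escapes the character that follows it, so for the regex engine to receive a backslash you write two of them (a raw string such as r"ip sla (\d+)" is an equivalent alternative). In Python code you therefore need: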

pattern = "ip sla (\\d+)"
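As a small illustration that is not part of the original answer, the sketch below checks that the doubled-backslash literal and a raw string literal describe the same pattern, and shows the difference between the whole match and the captured group. The variable names are chosen only for this example, and the sample line comes from the question.

```python
import re

escaped = "ip sla (\\d+)"   # backslash doubled inside a normal string literal
raw = r"ip sla (\d+)"       # raw string literal: backslash kept as written

# Both literals contain the same characters, so they are the same regex.
same_pattern = escaped == raw

match = re.search(raw, "ip sla 2553")
whole = match.group(0)      # the entire matched text
number = match.group(1)     # only what the parenthesized group captured

print(same_pattern, whole, number)
```

Raw strings are a common choice for regexes because the pattern in the code then reads exactly like the pattern in the explanation.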

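If you look at the docs, the re module has a re.findall function that saves you from splitting the file into lines.

re.findall(pattern, string) returns a list of the strings that match the pattern or, if the pattern contains exactly one group (as ours does, with \d+ ), a list of what that group captured.

That means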

re.findall("ip sla (\\d+)", fileText)

returns a list of numbers (as strings), which are your ip sla values.

The same approach applies to mpid and vlan.
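A minimal sketch of that, using two lines copied from the question's sample file. The name file_text is only a stand-in for the contents of the real file, and the last call is added here to contrast the no-group case described above.

```python
import re

# Two lines copied from the question's sample file.
file_text = (
    "ethernet jitter mpid 6553 domain Gravity vlan 2553 num-frames 100 interval 100\n"
    "ethernet jitter mpid 7553 domain Gravity vlan 3553 num-frames 100 interval 100\n"
)

mpid = re.findall(r"mpid (\d+)", file_text)    # one group: list of captured digits
vlan = re.findall(r"vlan (\d+)", file_text)
no_group = re.findall(r"vlan \d+", file_text)  # no group: list of whole matches

print(mpid)      # ['6553', '7553']
print(vlan)      # ['2553', '3553']
print(no_group)  # ['vlan 2553', 'vlan 3553']
```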

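For the tag, however, you want to match both letters and digits. In regex these are called word characters (letters, digits, and the underscore), and \w matches one of them. We again use the + modifier to match one or more word characters. That leaves us with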

pattern = "tag (\\w+)"
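Because the sample file repeats the same tag, re.findall returns it twice. The sketch below shows one possible way to keep only the first occurrence of each tag; dict.fromkeys is used here as an assumption about what is convenient, and it relies on dictionaries preserving insertion order (Python 3.7 and later).

```python
import re

# The tag lines from the question's sample file, with one line in between.
file_text = "tag CSCO5839\nfrequency 300\ntag CSCO5839\n"

all_tags = re.findall(r"tag (\w+)", file_text)
unique_tags = list(dict.fromkeys(all_tags))  # drops repeats, keeps first-seen order

print(all_tags)     # ['CSCO5839', 'CSCO5839']
print(unique_tags)  # ['CSCO5839']
```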

Strategy:

1. Read file into string

2. Construct and execute regex searches for each pattern

3. Iterate through each list of results

4. Append each result to the correct array

Code:

import re

with open("regex.txt", "r") as myFile:  # open file for reading; closed automatically
    myFileData = myFile.read()          # read file into a string

# create lists for each thing you're looking for
ipsla = []
mpid = []
vlan = []
tag = []

# finds a pattern like "ip sla <one or more digits 0-9>"
results = re.findall("ip sla (\\d+)", myFileData)
for result in results:
    # add the number (as an int) to your ipsla list
    ipsla.append(int(result))

# rinse & repeat :)
results = re.findall("mpid (\\d+)", myFileData)
for result in results:
    mpid.append(int(result))

results = re.findall("vlan (\\d+)", myFileData)
for result in results:
    vlan.append(int(result))

# tags repeat in the file, so only keep each one once
results = re.findall("tag (\\w+)", myFileData)
for result in results:
    if result not in tag:
        tag.append(result)

print(ipsla)
print(mpid)
print(vlan)
print(tag)
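The four steps of the strategy can also be written more compactly by keeping the patterns in a dictionary. This is only a sketch: PATTERNS and extract_values are hypothetical names introduced for illustration, the text is passed in as a string rather than read from regex.txt, and the values are left as strings instead of being converted to int.

```python
import re

# Hypothetical mapping: each list name and the pattern that feeds it.
PATTERNS = {
    "ipsla": r"ip sla (\d+)",
    "mpid": r"mpid (\d+)",
    "vlan": r"vlan (\d+)",
    "tag": r"tag (\w+)",
}

def extract_values(text):
    """Return a dict of lists, one list per key in PATTERNS."""
    return {name: re.findall(pattern, text) for name, pattern in PATTERNS.items()}

# First block of the question's sample file.
sample = """ip sla logging traps
ip sla 2553
ethernet jitter mpid 6553 domain Gravity vlan 2553 num-frames 100 interval 100
tag CSCO5839
frequency 300
ip sla schedule 2553 life forever start-time now
"""

values = extract_values(sample)
print(values)
```

Note that "ip sla logging traps" and "ip sla schedule 2553 ..." contribute nothing to the ipsla list, because the text right after "ip sla " is not a digit.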

Answer 2

Use a regular expression with named groups:

import re
from collections import defaultdict

results = defaultdict(list)
matcher = re.compile(r'ip sla (?P<ipsla>\d+)|mpid (?P<mpid>\d+)'
                     r'|vlan (?P<vlan>\d+)|tag (?P<tag>\w+)')

with open("regex.txt") as f:
    for line in f:
        for match in matcher.finditer(line):
            results[match.lastgroup].append(match.group(match.lastgroup))

print(results)

Your results are now available in the ( defaultdict ) dictionary results:

defaultdict(<class 'list'>, {
    'mpid': ['6553', '7553'],
    'vlan': ['2553', '3553'],
    'ipsla': ['2553', '3553'],
    'tag': ['CSCO5839', 'CSCO5839']
})

Afterwards, you can put the individual lists into separate variables like this:

ipsla = results['ipsla']
vlan = results['vlan']
mpid = results['mpid']
tag = results['tag']
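To make the role of match.lastgroup in this answer clearer, here is a reduced sketch with only two of the named alternatives. Each match comes from exactly one alternative, so only one named group takes part in it and lastgroup reports that group's name. The variable pairs is introduced only for this illustration.

```python
import re

matcher = re.compile(r"mpid (?P<mpid>\d+)|vlan (?P<vlan>\d+)")
line = "ethernet jitter mpid 6553 domain Gravity vlan 2553 num-frames 100 interval 100"

# For every match, record which named group matched and what it captured.
pairs = [(m.lastgroup, m.group(m.lastgroup)) for m in matcher.finditer(line)]

print(pairs)  # [('mpid', '6553'), ('vlan', '2553')]
```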

Answer 3

You can try it like this:

>>> import re
>>> ipsla, mpid, vlan, tag = [], [], [], []
>>> my_string = open('your_file', 'r').read()
>>> for x in re.findall(r"ip sla \d+|mpid \d+|vlan \d+|tag \S+", my_string):
...     if x.startswith("ip sla"):
...         ipsla.append(x.split()[2])
...     if x.startswith("mpid"):
...         mpid.append(x.split()[1])
...     if x.startswith("vlan"):
...         vlan.append(x.split()[1])
...     if x.startswith("tag") and x.split()[1] not in tag:
...         tag.append(x.split()[1])
...
>>> ipsla
['2553', '3553']
>>> mpid
['6553', '7553']
>>> vlan
['2553', '3553']
>>> tag
['CSCO5839']

Answer 4

I suggest this code:

import re

sla = [] # Declare the list variables
mpid = []
vlan = []

with open("regex.txt","r") as myFile:              # Open the file
    p = re.compile(r'(sla|mpid|vlan)\s+(\d+)')     # Create regex
    for m in p.finditer(myFile.read()):            # Find the match in the current input
        if m.group(1) == 'sla':
            sla.append(m.group(2))                 # Add a match to the corresponding list
        if m.group(1) == 'mpid':
            mpid.append(m.group(2))
        if m.group(1) == 'vlan':
            vlan.append(m.group(2))

See the Python demo:

import re
s = """ip sla logging traps
ip sla 2553
ethernet jitter mpid 6553 domain Gravity vlan 2553 num-frames 100 interval 100
tag CSCO5839
frequency 300
ip sla schedule 2553 life forever start-time now
ip sla 3553
ethernet jitter mpid 7553 domain Gravity vlan 3553 num-frames 100 interval 100
tag CSCO5839
frequency 300
ip sla schedule 3553 life forever start-time now"""
sla = [] # Declare the list variables
mpid = []
vlan = []

p = re.compile(r'(sla|mpid|vlan)\s+(\d+)') # Create regex
for m in p.finditer(s):                  # Find the match in the current input
    if m.group(1) == 'sla':
        sla.append(m.group(2))              # Add a match to the corresponding list
    if m.group(1) == 'mpid':
        mpid.append(m.group(2))
    if m.group(1) == 'vlan':
        vlan.append(m.group(2))

print(sla)
print(mpid)
print(vlan)

Output:

['2553', '3553'] # sla
['6553', '7553'] # mpid
['2553', '3553'] # vlan
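As a variation on the same regex, the chain of if statements can be replaced by a dictionary keyed on the first group. This is a sketch rather than the answerer's code: the dictionary name lists and the shortened sample text are assumptions made for illustration.

```python
import re

# Shortened sample: two lines from the question's file.
text = (
    "ip sla 2553\n"
    "ethernet jitter mpid 6553 domain Gravity vlan 2553 num-frames 100 interval 100\n"
)

# One list per keyword that the first group can capture.
lists = {"sla": [], "mpid": [], "vlan": []}

pattern = re.compile(r"(sla|mpid|vlan)\s+(\d+)")
for m in pattern.finditer(text):
    lists[m.group(1)].append(m.group(2))  # group 1 picks the list, group 2 is the value

print(lists)
```

Adding another keyword then only means extending the alternation and adding one dictionary key.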

4 answers

Answer 1 3 2015-02-20 22:04:29

Answer 2 2 2015-02-20 21:45:09

Answer 3 1 2015-02-20 21:39:03

Answer 4 0 2015-02-20 21:41:55
