
Python/Regex - extracting data with split

My sample text is as follows:

data = """
    NAME: "Chassis", DESCR: "Nexus5548 Chassis"
    PID: N5K-C5548UP       , VID: V01 , SN: SSI1F8A204LK

    NAME: "Module 1", DESCR: "O2 32X10GE/Modular Universal Platform Supervisor"
    PID: N5K-C5548UP       , VID: V01 , SN: FOC1FS7Q2P

    NAME: "Module 2", DESCR: "O2 16X10GE Ethernet Module"
    PID: N55-M16P          , VID: V01 , SN: FOC15840LYH

    NAME: "Fan 1", DESCR: "Chassis fan module"
    PID: N5548P-FAN        , VID: N/A , SN: N/A

    NAME: "Fan 2", DESCR: "Chassis fan module"
    PID: N5548P-FAN        , VID: N/A , SN: N/A

    NAME: "Power supply 1", DESCR: "AC power supply"
    PID: N55-PAC-750W      , VID: V02 , SN: ART18790WA

    NAME: "Power supply 2", DESCR: "AC power supply"
    PID: N55-PAC-750W      , VID: V02 , SN: ART182126V2

    NAME: "Module 3", DESCR: "O2 Daughter Card with L3 ASIC"
    PID: N55-D160L3-V2     , VID: V01 , SN: FOC14952NU2
"""

What I'm trying to achieve is to get the description, PID and serial number of each of these parts into a class.

First I thought I'd put everything onto one line and then split it so that the two lines beginning with NAME: and PID: end up on the same line. Once each part is on a single line, I can extract the data from it.

My latest attempts thus far:

data = ''.join(sample.splitlines())
nd = re.split(r"(\NAME:)", data)

This puts NAME: on its own line and the rest of the entry on another. This one is close, but I would then need to remove all the lines that contain only NAME: before I could iterate.

data = ''.join(sample.splitlines())
nd = re.split(r"(SN:\s[\w\-]+)", data)

This is messy; the previous attempt was closer.

Does anyone know how I can get each part of the data onto one line, or a better way of doing this?

Thanks

The following:

import re

data = """
    NAME: "Chassis", DESCR: "Nexus5548 Chassis"
    PID: N5K-C5548UP       , VID: V01 , SN: SSI1F8A204LK

    NAME: "Module 1", DESCR: "O2 32X10GE/Modular Universal Platform Supervisor"
    PID: N5K-C5548UP       , VID: V01 , SN: FOC1FS7Q2P

    NAME: "Module 2", DESCR: "O2 16X10GE Ethernet Module"
    PID: N55-M16P          , VID: V01 , SN: FOC15840LYH

    NAME: "Fan 1", DESCR: "Chassis fan module"
    PID: N5548P-FAN        , VID: N/A , SN: N/A

    NAME: "Fan 2", DESCR: "Chassis fan module"
    PID: N5548P-FAN        , VID: N/A , SN: N/A

    NAME: "Power supply 1", DESCR: "AC power supply"
    PID: N55-PAC-750W      , VID: V02 , SN: ART18790WA

    NAME: "Power supply 2", DESCR: "AC power supply"
    PID: N55-PAC-750W      , VID: V02 , SN: ART182126V2

    NAME: "Module 3", DESCR: "O2 Daughter Card with L3 ASIC"
    PID: N55-D160L3-V2     , VID: V01 , SN: FOC14952NU2
"""

matches = re.findall(r'NAME: \"(.*)\",\s*'
                     r'DESCR: \"(.*)\"\s*'
                     r'PID: (\S+)\s*,\s*'
                     r'VID: (\S+)\s*,\s*'
                     r'SN: (\S+)',
                     data,
                     re.MULTILINE)

print(matches)

will print:

[('Chassis', 'Nexus5548 Chassis', 'N5K-C5548UP', 'V01', 'SSI1F8A204LK'), ('Module 1', 'O2 32X10GE/Modular Universal Platform Supervisor', 'N5K-C5548UP', 'V01', 'FOC1FS7Q2P'), ('Module 2', 'O2 16X10GE Ethernet Module', 'N55-M16P', 'V01', 'FOC15840LYH'), ('Fan 1', 'Chassis fan module', 'N5548P-FAN', 'N/A', 'N/A'), ('Fan 2', 'Chassis fan module', 'N5548P-FAN', 'N/A', 'N/A'), ('Power supply 1', 'AC power supply', 'N55-PAC-750W', 'V02', 'ART18790WA'), ('Power supply 2', 'AC power supply', 'N55-PAC-750W', 'V02', 'ART182126V2'), ('Module 3', 'O2 Daughter Card with L3 ASIC', 'N55-D160L3-V2', 'V01', 'FOC14952NU2')]

i.e. a tuple of NAME, DESCR, PID, VID, SN for each entry.
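Since the question asks for the values to end up in a class, the tuples can be unpacked into one. The following is only a minimal sketch: the names `Part` and `parse_parts` are made up for illustration, it assumes Python 3.7+ for `dataclasses`, and it reuses the same pattern as above on a shortened copy of the sample text.

```python
import re
from dataclasses import dataclass


@dataclass
class Part:
    # Hypothetical container; field order matches the regex groups below.
    name: str
    descr: str
    pid: str
    vid: str
    sn: str


# Same pattern as in the answer above, compiled once for reuse.
PATTERN = re.compile(
    r'NAME: "(.*)",\s*'
    r'DESCR: "(.*)"\s*'
    r'PID: (\S+)\s*,\s*'
    r'VID: (\S+)\s*,\s*'
    r'SN: (\S+)'
)


def parse_parts(text):
    """Return one Part per NAME/PID entry found in text."""
    return [Part(*fields) for fields in PATTERN.findall(text)]


# Shortened copy of the sample inventory text from the question.
sample = '''
    NAME: "Chassis", DESCR: "Nexus5548 Chassis"
    PID: N5K-C5548UP       , VID: V01 , SN: SSI1F8A204LK

    NAME: "Fan 1", DESCR: "Chassis fan module"
    PID: N5548P-FAN        , VID: N/A , SN: N/A
'''

parts = parse_parts(sample)
for part in parts:
    print(part.descr, part.pid, part.sn)
```

Each `Part` then exposes the description, PID and serial number as attributes (`part.descr`, `part.pid`, `part.sn`) rather than as tuple positions.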

Use the Python split() function. Called without arguments, it returns a list containing every part of the string separated by whitespace. You can first iterate over the lines with split("\n"), which splits the string at line breaks. Code:

for index, line in enumerate(data.split("\n")):
    # data starts with a newline, so the PID lines sit at indexes 2, 5, 8, ...
    if (index - 2) % 3 == 0:
        fields = line.split()
        PID = fields[1]
        serial_number = fields[7]
        # here add some code to save the PID and SN wherever you want...

The code above iterates over each line and acts on every third line (starting from the second line), which is achieved by the if (index - 2) % 3 == 0: condition. It then splits that line on whitespace, and you can pick out the desired PID and serial number by index.

Just pay attention to the condition comparing the line number, because I am not sure if index - 2 is accurate. Maybe index - 1 will be the right condition. You must adjust it yourself :)
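One way to avoid guessing the right offset is to check what each line starts with instead of counting lines. This is only a sketch of that variation, not the approach described above: the helper name `pid_and_serial` is made up, and it assumes every PID line has a space after `PID:` and ends with the serial number, as in the sample text.

```python
def pid_and_serial(text):
    """Collect (PID, serial number) pairs from lines that start with 'PID:'."""
    results = []
    for line in text.split("\n"):
        tokens = line.split()
        # Skip blank lines and NAME lines; only PID lines carry the values.
        if tokens and tokens[0] == "PID:":
            # tokens look like ['PID:', pid, ',', 'VID:', vid, ',', 'SN:', sn]
            results.append((tokens[1], tokens[-1]))
    return results


# Shortened copy of the sample inventory text from the question.
sample = '''
    NAME: "Module 2", DESCR: "O2 16X10GE Ethernet Module"
    PID: N55-M16P          , VID: V01 , SN: FOC15840LYH

    NAME: "Fan 1", DESCR: "Chassis fan module"
    PID: N5548P-FAN        , VID: N/A , SN: N/A
'''

pairs = pid_and_serial(sample)
print(pairs)
```

Because the check looks at the line content, it keeps working if the text gains or loses a leading blank line, which is exactly the case where a fixed index offset would need adjusting.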
