
Python/Regex - extracting data with split

My sample text is as follows:

data = """
    NAME: "Chassis", DESCR: "Nexus5548 Chassis"
    PID: N5K-C5548UP       , VID: V01 , SN: SSI1F8A204LK

    NAME: "Module 1", DESCR: "O2 32X10GE/Modular Universal Platform Supervisor"
    PID: N5K-C5548UP       , VID: V01 , SN: FOC1FS7Q2P

    NAME: "Module 2", DESCR: "O2 16X10GE Ethernet Module"
    PID: N55-M16P          , VID: V01 , SN: FOC15840LYH

    NAME: "Fan 1", DESCR: "Chassis fan module"
    PID: N5548P-FAN        , VID: N/A , SN: N/A

    NAME: "Fan 2", DESCR: "Chassis fan module"
    PID: N5548P-FAN        , VID: N/A , SN: N/A

    NAME: "Power supply 1", DESCR: "AC power supply"
    PID: N55-PAC-750W      , VID: V02 , SN: ART18790WA

    NAME: "Power supply 2", DESCR: "AC power supply"
    PID: N55-PAC-750W      , VID: V02 , SN: ART182126V2

    NAME: "Module 3", DESCR: "O2 Daughter Card with L3 ASIC"
    PID: N55-D160L3-V2     , VID: V01 , SN: FOC14952NU2
"""

What I'm trying to achieve is to get the description, PID, and serial number of each of these parts into a class.

At first I thought I'd put everything onto one line and then split it so that the two lines beginning with NAME: and PID: end up on the same line. Once each part is on a single line, I can extract the data from it.

My latest attempts thus far:

data = ''.join(sample.splitlines())
nd = re.split(r"(NAME:)", data)

This puts NAME: on its own line and the rest of the record on another. This one is close, but I would then need to remove all the lines that contain only NAME: before I can iterate.

data = ''.join(sample.splitlines())
nd = re.split(r"(SN:\s[\w\-]+)", data)

This is messy; the previous attempt was closer.

Does anyone know how I can get each part's data onto one line, or a better way of doing this?

Thanks

The following:

import re

data = """
    NAME: "Chassis", DESCR: "Nexus5548 Chassis"
    PID: N5K-C5548UP       , VID: V01 , SN: SSI1F8A204LK

    NAME: "Module 1", DESCR: "O2 32X10GE/Modular Universal Platform Supervisor"
    PID: N5K-C5548UP       , VID: V01 , SN: FOC1FS7Q2P

    NAME: "Module 2", DESCR: "O2 16X10GE Ethernet Module"
    PID: N55-M16P          , VID: V01 , SN: FOC15840LYH

    NAME: "Fan 1", DESCR: "Chassis fan module"
    PID: N5548P-FAN        , VID: N/A , SN: N/A

    NAME: "Fan 2", DESCR: "Chassis fan module"
    PID: N5548P-FAN        , VID: N/A , SN: N/A

    NAME: "Power supply 1", DESCR: "AC power supply"
    PID: N55-PAC-750W      , VID: V02 , SN: ART18790WA

    NAME: "Power supply 2", DESCR: "AC power supply"
    PID: N55-PAC-750W      , VID: V02 , SN: ART182126V2

    NAME: "Module 3", DESCR: "O2 Daughter Card with L3 ASIC"
    PID: N55-D160L3-V2     , VID: V01 , SN: FOC14952NU2
"""

matches = re.findall(r'NAME: \"(.*)\",\s*'
                     r'DESCR: \"(.*)\"\s*'
                     r'PID: (\S+)\s*,\s*'
                     r'VID: (\S+)\s*,\s*'
                     r'SN: (\S+)',
                     data,
                     re.MULTILINE)

print(matches)

will print:

[('Chassis', 'Nexus5548 Chassis', 'N5K-C5548UP', 'V01', 'SSI1F8A204LK'), ('Module 1', 'O2 32X10GE/Modular Universal Platform Supervisor', 'N5K-C5548UP', 'V01', 'FOC1FS7Q2P'), ('Module 2', 'O2 16X10GE Ethernet Module', 'N55-M16P', 'V01', 'FOC15840LYH'), ('Fan 1', 'Chassis fan module', 'N5548P-FAN', 'N/A', 'N/A'), ('Fan 2', 'Chassis fan module', 'N5548P-FAN', 'N/A', 'N/A'), ('Power supply 1', 'AC power supply', 'N55-PAC-750W', 'V02', 'ART18790WA'), ('Power supply 2', 'AC power supply', 'N55-PAC-750W', 'V02', 'ART182126V2'), ('Module 3', 'O2 Daughter Card with L3 ASIC', 'N55-D160L3-V2', 'V01', 'FOC14952NU2')]

i.e. a tuple of NAME, DESCR, PID, VID, SN for each entry.
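Since the question asks for the values to end up in a class, the tuples can be unpacked straight into one. The following is only a sketch: the `Part` dataclass, its field names, and the `parse_parts` helper are hypothetical names chosen for illustration, and the pattern is the same one used above, so it assumes every entry has exactly this two-line layout.

```python
import re
from dataclasses import dataclass


@dataclass
class Part:
    """One inventory entry (hypothetical class, not from the question)."""
    name: str
    descr: str
    pid: str
    vid: str
    sn: str


# Same pattern as in the answer above, compiled once for reuse.
PART_RE = re.compile(
    r'NAME: "(.*)",\s*'
    r'DESCR: "(.*)"\s*'
    r'PID: (\S+)\s*,\s*'
    r'VID: (\S+)\s*,\s*'
    r'SN: (\S+)'
)


def parse_parts(text):
    """Return a Part for every NAME/PID pair of lines found in text."""
    return [Part(*fields) for fields in PART_RE.findall(text)]


# Shortened copy of the sample data, just to keep the example small.
sample = '''
    NAME: "Chassis", DESCR: "Nexus5548 Chassis"
    PID: N5K-C5548UP       , VID: V01 , SN: SSI1F8A204LK

    NAME: "Fan 1", DESCR: "Chassis fan module"
    PID: N5548P-FAN        , VID: N/A , SN: N/A
'''

parts = parse_parts(sample)
for part in parts:
    print(part.descr, part.pid, part.sn)
```

Each field is then available by attribute, e.g. `part.pid`, instead of by tuple position.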

Use Python's split() method. Splitting the text with split("\n") gives one string per line, and calling split() with no arguments on a line breaks it into whitespace-separated tokens. Code:

for index, line in enumerate(data.split("\n")):
    # PID lines sit at indexes 2, 5, 8, ... because data starts with a newline
    if (index - 2) % 3 == 0:
        tokens = line.split()
        PID = tokens[1]
        serial_number = tokens[7]
        # here add some code to save the PID and SN wherever you want...

The code above iterates over the lines and acts on every third one, starting from index 2; that is what the condition if (index - 2) % 3 == 0: selects. It then splits that line on whitespace, and the PID and serial number sit at fixed positions in the resulting list.

Pay attention to the condition on the line number: index - 2 assumes the text begins with a newline, as the triple-quoted sample does. If your text has no leading blank line, index - 1 is the right offset, so adjust it to your input :)
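If the line offset is a worry, one alternative is to pick out the PID lines by their prefix instead of by position. This is a rough sketch, not part of the approach above: `parse_pid_lines` is a made-up helper name, and it assumes every line of interest starts with "PID:" and uses comma-separated `KEY: value` fields.

```python
def parse_pid_lines(text):
    """Collect (PID, SN) pairs from every line that starts with 'PID:'."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("PID:"):
            continue  # skip NAME lines and blank lines
        fields = {}
        for chunk in line.split(","):
            # "VID: V01" -> key "VID", value "V01"
            key, _, value = chunk.partition(":")
            fields[key.strip()] = value.strip()
        records.append((fields["PID"], fields["SN"]))
    return records


# Shortened copy of the sample data, just to keep the example small.
sample = '''
    NAME: "Chassis", DESCR: "Nexus5548 Chassis"
    PID: N5K-C5548UP       , VID: V01 , SN: SSI1F8A204LK

    NAME: "Fan 1", DESCR: "Chassis fan module"
    PID: N5548P-FAN        , VID: N/A , SN: N/A
'''

records = parse_pid_lines(sample)
for pid, serial_number in records:
    print(pid, serial_number)
```

This version does not care whether the text starts with a blank line or how many blank lines separate the entries.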
