
Using Perl to read non-delimited text file

I have a file with a consistent layout, but in a format that I haven't tried to read into a Perl program before.

The format of the file is this:

 Net Number Assignments


Number  xxx.xxx.xxx.xxx
Netmask in /## Form     30
Type    IP
Status  InUse
Description     mpirpd-cjdn
Notes   mgmt
Entry-Id        000000000026450
Submitter       John Doe
Create-date     2009-07-01-13:55:24
Contact-Data    INTERNAL/555-555-5555
Contact-Id      CON-000028508




Net Number Assignments


Number  xxx.xxx.xxx.xxx
Netmask in /## Form     32
Type    IP
Status  InUse
Description     switch Lo0 -- switch unnamed
Notes   Reverved for Lan Management Loop Backs and links
Entry-Id        000000000032710
Submitter       John Doe
Create-date     2015-11-25-10:59:27
Last-modified-by        John Doe
Modified-date   2015-11-25-11:30:06
Contact-Data    INTERNAL/555-555-5555
Contact-Id      CON-000028508




Net Number Assignments


Number  xxx.xxx.xxx.xxx
Netmask in /## Form     32
Type    IP
Status  InUse
Description     mplsfe9-hub
Area    mpls
Entry-Id        000000000024150
Submitter       Russ Reilly
Create-date     2007-05-02-18:26:20
Last-modified-by        John Doe
Modified-date   2013-05-06-19:09:37
Contact Name    ITG  INTERNAL
Contact Phone   555-555-5555
Contact E-mail  me@home.com

Not all of the fields are always used (example: Contact Name and Contact Phone could be missing in the next record).

I don't necessarily need the field headings as they are consistently in the same location for each record.

I am sure this has been done before and probably has a simple solution, so I am asking the question before I reinvent the wheel.

I would recommend an array of hashes as the ideal data structure for the file you've presented.

We set the input record separator ($/) to '' to enable paragraph mode, which treats two or more consecutive empty lines as a single empty line, so each block of text separated by blank lines is read as one record. Then, within each record, we split each line on two or more whitespace characters, which preserves your keys that contain single spaces. The split is limited to 2 fields total to prevent additional fields from being created for values that contain two or more consecutive spaces (e.g., ITG  INTERNAL).

use strict;
use warnings;

use Data::Dump;

local $/ = '';
my @data;

while (<DATA>) {
    chomp;
    next if $_ eq 'Net Number Assignments';
    my %record;

    for my $line (split(/\n/)) {
        my ($key, $value) = split(/\s\s+/, $line, 2);
        $record{$key} = $value;
    }

    push(@data, \%record);
}

dd(\@data);

__DATA__
Net Number Assignments


Number  xxx.xxx.xxx.xxx
Netmask in /## Form     30
Type    IP
Status  InUse
Description     mpirpd-cjdn
Notes   mgmt
Entry-Id        000000000026450
Submitter       John Doe
Create-date     2009-07-01-13:55:24
Contact-Data    INTERNAL/555-555-5555
Contact-Id      CON-000028508




Net Number Assignments


Number  xxx.xxx.xxx.xxx
Netmask in /## Form     32
Type    IP
Status  InUse
Description     switch Lo0 -- switch unnamed
Notes   Reverved for Lan Management Loop Backs and links
Entry-Id        000000000032710
Submitter       John Doe
Create-date     2015-11-25-10:59:27
Last-modified-by        John Doe
Modified-date   2015-11-25-11:30:06
Contact-Data    INTERNAL/555-555-5555
Contact-Id      CON-000028508




Net Number Assignments


Number  xxx.xxx.xxx.xxx
Netmask in /## Form     32
Type    IP
Status  InUse
Description     mplsfe9-hub
Area    mpls
Entry-Id        000000000024150
Submitter       Russ Reilly
Create-date     2007-05-02-18:26:20
Last-modified-by        John Doe
Modified-date   2013-05-06-19:09:37
Contact Name    ITG  INTERNAL
Contact Phone   555-555-5555
Contact E-mail  me@home.com

Output:

[
  {
    "Contact-Data"        => "INTERNAL/555-555-5555",
    "Contact-Id"          => "CON-000028508",
    "Create-date"         => "2009-07-01-13:55:24",
    "Description"         => "mpirpd-cjdn",
    "Entry-Id"            => "000000000026450",
    "Netmask in /## Form" => 30,
    "Notes"               => "mgmt",
    "Number"              => "xxx.xxx.xxx.xxx",
    "Status"              => "InUse",
    "Submitter"           => "John Doe",
    "Type"                => "IP",
  },
  {
    "Contact-Data"        => "INTERNAL/555-555-5555",
    "Contact-Id"          => "CON-000028508",
    "Create-date"         => "2015-11-25-10:59:27",
    "Description"         => "switch Lo0 -- switch unnamed",
    "Entry-Id"            => "000000000032710",
    "Last-modified-by"    => "John Doe",
    "Modified-date"       => "2015-11-25-11:30:06",
    "Netmask in /## Form" => 32,
    "Notes"               => "Reverved for Lan Management Loop Backs and links",
    "Number"              => "xxx.xxx.xxx.xxx",
    "Status"              => "InUse",
    "Submitter"           => "John Doe",
    "Type"                => "IP",
  },
  {
    "Area"                => "mpls",
    "Contact E-mail"      => "me\@home.com",
    "Contact Name"        => "ITG  INTERNAL",
    "Contact Phone"       => "555-555-5555",
    "Create-date"         => "2007-05-02-18:26:20",
    "Description"         => "mplsfe9-hub",
    "Entry-Id"            => "000000000024150",
    "Last-modified-by"    => "John Doe",
    "Modified-date"       => "2013-05-06-19:09:37",
    "Netmask in /## Form" => 32,
    "Number"              => "xxx.xxx.xxx.xxx",
    "Status"              => "InUse",
    "Submitter"           => "Russ Reilly",
    "Type"                => "IP",
  },
]
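
Because not every record carries every field, code that consumes this array of hashes has to cope with missing keys. The following is a minimal sketch of one way to do that, not part of the answer above: the two records are cut-down hypothetical samples shaped like the output shown, and the column list is an assumption you would replace with the keys you actually need.

```perl
use strict;
use warnings;

# Hypothetical records shaped like the parser output above
my @data = (
    { 'Number' => 'xxx.xxx.xxx.xxx', 'Netmask in /## Form' => 30, 'Notes' => 'mgmt' },
    { 'Number' => 'xxx.xxx.xxx.xxx', 'Netmask in /## Form' => 32, 'Area'  => 'mpls' },
);

# Columns to report; this list is an assumption, pick whichever keys you need
my @columns = ('Number', 'Netmask in /## Form', 'Notes', 'Area');

my @rows;
for my $record (@data) {
    # The // (defined-or) operator, available since Perl 5.10,
    # substitutes an empty string for a key the record does not have
    push @rows, join("\t", map { $record->{$_} // '' } @columns);
}

print "$_\n" for @rows;
```

Each row comes out tab-separated with an empty column wherever a record lacks that field, so the columns stay aligned even when the input records differ.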

This is conceptually simple but somewhat tedious. The canonical version of this type of parsing solution looks like this:

#!/usr/bin/perl
use strict;
use warnings;

my $all = {};  # A hash to hold all number entries indexed by IP
my $cur = {};  # A hash to hold the current entry we are parsing
while (<>)
{
    chomp;
    if (my ($ip) = /^Number\s+(.*)/)
    {
        # If we have a current entry, save it in the $all hash
        $all->{$cur->{number}} = $cur if ($cur->{number});

        $cur = {};
        $cur->{number} = $ip;
    }
    elsif (my ($mask) = /^Netmask in \/## Form\s+(\d+)/)
    {
        $cur->{mask} = $mask;
    }
    # Add further elsif branches here to handle the remaining input line
    # types, saving what you want in $cur
}
# This is to save the last entry
$all->{$cur->{number}} = $cur if ($cur->{number});

# Your code to process the accumulated entries goes here
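
If most of the fields are wanted, the per-field branches can be collapsed into one generic pattern. The following is a minimal sketch, not part of the answer above. It assumes every data line is a key, then two or more whitespace characters, then a value, and that each record begins with a Number line. parse_records is a hypothetical helper name, and records are kept in an array rather than a hash indexed by IP because the sample addresses are all masked to the same value.

```perl
use strict;
use warnings;

# parse_records is a hypothetical helper: it takes a list of lines and
# returns a list of hash references, one per record.
sub parse_records {
    my @lines = @_;
    my (@all, $cur);
    for my $line (@lines) {
        chomp $line;
        # Key, then two or more whitespace characters, then the value;
        # lines that do not fit (headings, blank lines) are skipped
        my ($key, $value) = $line =~ /^(\S.*?)\s{2,}(.*)$/ or next;
        if ($key eq 'Number') {
            # A Number line starts a new record
            $cur = {};
            push @all, $cur;
        }
        # Ignore key/value lines that appear before the first Number line
        $cur->{$key} = $value if $cur;
    }
    return @all;
}

my @records = parse_records(
    "Net Number Assignments\n",
    "\n",
    "Number  xxx.xxx.xxx.xxx\n",
    "Netmask in /## Form     30\n",
    "Contact Name    ITG  INTERNAL\n",
    "\n",
    "Number  xxx.xxx.xxx.xxx\n",
    "Type    IP\n",
);

print scalar(@records), " records parsed\n";
```

To run this against a real file, the lines could be read with <> and passed to the helper in place of the inline sample.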


 