簡體   English   中英

Perl - 排序復雜的數據結構

[英]Perl - sorting complex data structure

我有一個舊的 perl 項目,一個事件日志的文本解析器,並收到了按事件 ID 對輸出進行排序和刪除重復事件的請求。 因此解析器讀取一個文本文件並將每個事件放入一個數組中。 數組上的每個字段都包含一個帶有多個鍵 -> 值對的散列。 一個鍵稱為序列,它包含事件的編號。 我現在想根據每個數組字段的序列值對數組進行排序。 其次,我想從數組中刪除重復的相同序列號。

下面是我如何創建數組和散列的一些代碼,以便您了解數據結構:

open (my $mel, "<", $in_filename) or die "\nFile '$in_filename' does not exist or is not readable.\n";

my $i=0;
my $eventcount = 0;

while (<$mel>) {

        # Separate events by "Date/Time" :
        if (/^$/) {
            next;
        }
        if (/^Date\/Time:\s(.*)$/) {
            if ($eventcount >0) {
                $i++;
            }
            $eventcount++; # eventcount initialized with ‘0’
        }

        # Gathering information of the MEL event :
        if (/^Date\/Time:\s(.*)$/) {$MEL[$i]{date} = $1; next;}
        if (/^Sequence number:\s(\d+)$/) {$MEL[$i]{sequence} = $1; next;}
        if (/^Event type:\s([0-9|a-f|A-F]{1,6})$/) {$MEL[$i]{type} = lc $1; next;}
        if (/^Event category:\s(\w+)$/) {$MEL[$i]{category} = $1; next;}
        if (/^Priority:\s(\w+)/) {$MEL[$i]{priority} = $1; next;}
        if (/^Description:\s(.*)$/) {$MEL[$i]{description} = $1; next;}
        if (/^Event specific codes:\s(.*)$/) {$MEL[$i]{code} = $1; next;}
        if (/^Component location:\s(.*)$/) {$MEL[$i]{location} = $1; next;}
        if (/^Logged by:\s.*(.)$/) {$MEL[$i]{logged_by} = $1; next;}
        if (/^4[dD]\s45\s4[cC]\s48\s(\d\d)/) {$MEL[$i]{version} = hex $1;}

}

文本文件中的事件示例:

Date/Time: 2/3/20, 12:18:20 PM
Sequence number: 200 <==============
Event type: 5023
Event category: Command
Priority: Informational
Event needs attention: false
Event send alert: false
Event visibility: true
Description: Controller return status/function call for requested operation
Event specific codes: b8/1/0
Component type: Controller
Component location: Shelf 99, Bay A
Logged by: Controller in bay A

所以基本上我想根據散列中鍵的值對包含散列引用的數組進行排序。

其次,當鍵的值也存在於不同的數組字段中時,我想從數組中刪除一個字段。

我希望有人明白我需要什么:-)

這可能嗎?

您可以使用自定義排序塊對數組進行排序:

my @sorted = sort { $a->{sequence} <=> $b->{sequence} } @MEL;

但是使用散列的散列而不是散列的數組要容易得多。

#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };

my $in_filename = ... ;
open my $mel, '<', $in_filename or die $!;

my %event;

my ($current, $id);
while (<$mel>) {

    next if /^$/;

    if (m{^Date/Time:\s(.*)$}) {
        if (defined $id) {
            $event{$id} = $current;
        }
        $current = { date => $1 };
    } elsif (/^Sequence number:\s(\d+)$/) {
        $id = $1;
    } elsif (/^Event type:\s([0-9|a-f|A-F]{1,6})$/) {
        $current->{type} = lc $1;
    } elsif (/^Event category:\s(\w+)$/) {
        $current->{category} = $1;
    } elsif (/^Priority:\s(\w+)/) {
        $current->{priority} = $1;
    } elsif (/^Description:\s(.*)$/) {
        $current->{description} = $1;
    } elsif (/^Event specific codes:\s(.*)$/) {
        $current->{code} = $1;
    } elsif (/^Component location:\s(.*)$/) {
        $current->{location} = $1;
    } elsif (/^Logged by:\s.*(.)$/) {
        $current->{logged_by} = $1;
    } elsif (/^4[dD]\s45\s4[cC]\s48\s(\d\d)/) {
        $current->{version} = hex $1;
    }
}

for my $e (sort { $a <=> $b } keys %event) {
    say 'Sequence number:', $e;
    for my $k (sort keys %{ $event{$e} }) {
        say "$k: $event{$e}{$k}";
    }
}

通過構建一個大型正則表達式來匹配大部分細節,可以進一步簡化它:

my $regex = qr/
               Event\ type:\s(?<type>[0-9|a-f|A-F]{1,6})$
              |Event\ category:\s(?<category>\w+)$
              |Priority:\s(?<priority>\w+)
              |Description:\s(?<description>.*)$
              |Event\ specific\ codes:\s(?<code>.*)$
              |Component\ location:\s(?<location>.*)$
              |Logged\ by:\s.*(?<logged>.)$
              |4[dD]\s45\s4[cC]\s48\s(?<version>\d\d)
/x;

while (<$mel>) {
    next if /^$/;

    if (m{^Date/Time:\s(.*)$}) {
        if (defined $id) {
            $current->{type} = lc $current->{type}
                if exists $current->{type};
            $current->{version} = hex $current->{version}
                if exists $current->{version};
            $event{$id} = $current;
        }
        $current = { date => $1 };
    } elsif (/^Sequence number:\s(\d+)$/) {
        $id = $1;
    } elsif (/^$regex/) {
        $current->{ (keys %+)[0] } = (values %+)[0];
    } else {
        warn "Skipping: $_";
    }
}

問題描述不完整。 不清楚這些記錄是否同質(所有類型都相同)。

好吧,如果上面的假設是正確的,那么任務就很簡單了。

將文件拆分成記錄,然后用事件號作為鍵填充哈希,並記錄為值,跳過重復項。

然后對鍵哈希進行排序並輸出記錄。

use strict;
use warnings;
use feature 'say';

my %events;
my %seen;
my $data = do { local $/; <DATA> };

$data =~ s!\n(Date/Time)!\n\n$1!g;

my @data = split '\n\n', $data;

for my $record (@data) {
    my $event = get_event_n( $record );

    next if $seen{$event};

    $seen{$event}   = 1;
    $events{$event} = $record;
}

say '----- Sorted Events -----';

for my $event (sort keys %events) {
    say $events{$event};
    say '-' x 45;                 # record separator as visual indicator
}

sub get_event_n {
    my $record = shift;
    my $sequence;

    $record =~ /Sequence number:\s+(\d+)/;
    $sequence = $1;

    return $sequence;
}

__DATA__
Date/Time: 2/3/20, 12:19:20 PM
Sequence number: 230
Event type: 5023
Event category: Command
Priority: Informational
Event needs attention: false
Event send alert: false
Event visibility: true
Description: Controller return status/function call for requested operation
Event specific codes: b8/1/0
Component type: Controller
Component location: Shelf 99, Bay A
Logged by: Controller in bay A
Date/Time: 2/3/20, 12:18:20 PM
Sequence number: 200
Event type: 5023
Event category: Command
Priority: Informational
Event needs attention: false
Event send alert: false
Event visibility: true
Description: Controller return status/function call for requested operation
Event specific codes: b8/1/0
Component type: Controller
Component location: Shelf 99, Bay A
Logged by: Controller in bay A
Date/Time: 2/3/20, 12:18:25 PM
Sequence number: 205
Event type: 5023
Event category: Command
Priority: Informational
Event needs attention: false
Event send alert: false
Event visibility: true
Description: Controller return status/function call for requested operation
Event specific codes: b8/1/0
Component type: Controller
Component location: Shelf 99, Bay B
Logged by: Controller in bay B
Date/Time: 2/3/20, 12:18:28 PM
Sequence number: 209
Event type: 5023
Event category: Command
Priority: Informational
Event needs attention: false
Event send alert: false
Event visibility: true
Description: Controller return status/function call for requested operation
Event specific codes: b8/1/0
Component type: Controller
Component location: Shelf 92, Bay B
Logged by: Controller in bay B
Date/Time: 2/3/20, 12:18:25 PM
Sequence number: 205
Event type: 5023
Event category: Command
Priority: Informational
Event needs attention: false
Event send alert: false
Event visibility: true
Description: Controller return status/function call for requested operation
Event specific codes: b8/1/0
Component type: Controller
Component location: Shelf 99, Bay B
Logged by: Controller in bay B

我的回答基於@choroba 的好正則表達式,但我認為這個更簡單:

my $key = 'sequence';  #or other fields
my $keep = 'first';    #or 'last' record with identical $key

my $regex = qr{
   Date/Time:              \s* (?<date>.*)
  |Sequence\ number:       \s* (?<sequence>\d+)
  |Event\ type:            \s* (?<type>[0-9|a-f|A-F]{1,6})
  |Event\ category:        \s* (?<category>\w+)
  |Priority:               \s* (?<priority>\w+)
  |Description:            \s* (?<description>.*)
  |Event\ specific\ codes: \s* (?<code>.*)
  |Component\ location:    \s* (?<location>.*)
  |Logged\ by:             \s* (?<logged_by>.*)
  |4[dD]\s45\s4[cC]\s48\s(?<version>\d\d)
}x;

my @event=();
while (<>) {
  m{^Date/Time:} and push @event, {};
  m{^$regex}     and @{$event[-1]}{keys %+} = values %+;
}

#special treatment for type and version: hex and lc
exists $$_{type}    and $$_{type}    = hex $$_{type}    for @event;
exists $$_{version} and $$_{version} = lc  $$_{version} for @event;

#mark for deletion
my %exists; $exists{$$_{$key}}++ and $$_{delete}=1
   for $keep eq 'first' ? @event
     : $keep eq 'last'  ? reverse(@event)
     : die "keep must be first or last";

#delete those marked
@event = grep !$$_{delete}, @event;

#sort by $key
@event = sort { $$a{$key} <=> $$b{$key} } @event;

我猜那個類型應該是hex ,版本應該是lc ed,而不是像問題中的相反。

像這樣運行:

perl script.pl input_file

暫無
暫無

聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.

 
粵ICP備18138465號  © 2020-2024 STACKOOM.COM